xen-devel.lists.xenproject.org archive mirror
* [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction
@ 2013-09-17  8:33 Elena Ufimtseva
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-17  8:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Elena Ufimtseva, george.dunlap, dario.faggioli, lccycc123, msw,
	boris.ostrovsky, anddavid.vrabel

This patchset introduces vNUMA for PV domU guest.

It enables a PV guest to discover the NUMA topology provided by Xen
and to initialize that topology at boot. A XENMEM subop hypercall
is used to retrieve the information from Xen: the number of NUMA
nodes, the memory regions (start and end pfn) constructed from the
domU e820 map, the distance table and the cpu-to-node map.
xen_numa_init is called to set up the NUMA-related structures.
To enable this mechanism, the kernel must be built as a PV guest
with CONFIG_NUMA=y, and Xen must support the vNUMA functionality
(patchset http://lists.xenproject.org/archives/html/xen-devel/2013-09/msg01337.html).
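
In short, the guest-side flow added by patch 1/2 looks roughly like this
(condensed from the patch below; allocation and error handling omitted):

	struct vnuma_topology_info numa_topo = { .domid = DOMID_SELF };

	/* guest-allocated buffers for the three arrays Xen fills in */
	set_xen_guest_handle(numa_topo.vmemarea, varea);
	set_xen_guest_handle(numa_topo.vdistance, vdistance);
	set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);

	if (HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo) < 0)
		return -EINVAL;		/* fall back to dummy NUMA init */

	/* register memory ranges and node distances with the x86 NUMA code */
	for (i = 0; i < numa_topo.nr_nodes; i++)
		numa_add_memblk(i, varea[i].start, varea[i].end);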
 
Elena Ufimtseva (2):
  vNUMA topology support for PV domu guest
  Enables NUMA for PV vNUMA-enabled domu guest.

 arch/x86/include/asm/xen/vnuma.h |   12 +++++
 arch/x86/mm/numa.c               |    5 ++
 arch/x86/xen/Makefile            |    2 +-
 arch/x86/xen/setup.c             |    6 ++-
 arch/x86/xen/vnuma.c             |   94 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/memory.h   |   27 +++++++++++
 6 files changed, 144 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/vnuma.h
 create mode 100644 arch/x86/xen/vnuma.c

-- 
1.7.10.4


* [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17  8:33 [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Elena Ufimtseva
@ 2013-09-17  8:34 ` Elena Ufimtseva
  2013-09-17 14:10   ` David Vrabel
                     ` (2 more replies)
  2013-09-17  8:34 ` [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest Elena Ufimtseva
  2013-09-18 16:16 ` [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Dario Faggioli
  2 siblings, 3 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-17  8:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Elena Ufimtseva, george.dunlap, dario.faggioli, lccycc123, msw,
	boris.ostrovsky, anddavid.vrabel

Request NUMA topology info from Xen by issuing a subop
hypercall. Initialize the NUMA nodes, the number of CPUs, the
distance table and the NUMA node memory ranges during boot.
The vNUMA topology is defined by the user in the VM config file.
Memory ranges are represented by struct vnuma_topology_info,
where the start and end of each memory area are given as guest
pfns, constructed and aligned according to the domain's e820 map.
If the received structure has errors, initialization falls back
to dummy NUMA init.
Requires Xen with the patches from the vnuma patchset applied.

Changes since v1:
- moved the test for xen_pv_domain() into xen_numa_init;
- replaced memory block search/allocation by single memblock_alloc;
- moved xen_numa_init to vnuma.c from enlighten.c;
- moved memblock structure to public interface memory.h;
- specified signedness of vnuma topology structure members;
- removed excessive debug output;

TODO:
- consider common interface for Dom0, HVM and PV guests to provide
vNUMA topology;
- dynamic NUMA balancing at the time of this patch (kernel 3.11,
commit 6e4664525b1db28f8c4e1130957f70a94c19213e, with boot parameter
numa_balancing=true, which is the default) crashes a NUMA-enabled
guest. Investigate further.

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
---
 arch/x86/include/asm/xen/vnuma.h |   12 +++++
 arch/x86/mm/numa.c               |    5 +++
 arch/x86/xen/Makefile            |    2 +-
 arch/x86/xen/vnuma.c             |   92 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/memory.h   |   27 +++++++++++
 5 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/xen/vnuma.h
 create mode 100644 arch/x86/xen/vnuma.c

diff --git a/arch/x86/include/asm/xen/vnuma.h b/arch/x86/include/asm/xen/vnuma.h
new file mode 100644
index 0000000..1bf4cae
--- /dev/null
+++ b/arch/x86/include/asm/xen/vnuma.h
@@ -0,0 +1,12 @@
+#ifndef _ASM_X86_VNUMA_H
+#define _ASM_X86_VNUMA_H
+
+#ifdef CONFIG_XEN
+int xen_vnuma_support(void);
+int xen_numa_init(void);
+#else
+int xen_vnuma_support(void) { return 0; };
+int xen_numa_init(void) {};
+#endif
+
+#endif /* _ASM_X86_VNUMA_H */
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 8bf93ba..a95fadf 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -19,6 +19,7 @@
 #include <asm/amd_nb.h>
 
 #include "numa_internal.h"
+#include "asm/xen/vnuma.h"
 
 int __initdata numa_off;
 nodemask_t numa_nodes_parsed __initdata;
@@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
 void __init x86_numa_init(void)
 {
 	if (!numa_off) {
+#ifdef CONFIG_XEN
+		if (!numa_init(xen_numa_init))
+			return;
+#endif
 #ifdef CONFIG_X86_NUMAQ
 		if (!numa_init(numaq_numa_init))
 			return;
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 96ab2c0..de9deab 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,7 +13,7 @@ CFLAGS_mmu.o			:= $(nostackp)
 obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
 			time.o xen-asm.o xen-asm_$(BITS).o \
 			grant-table.o suspend.o platform-pci-unplug.o \
-			p2m.o
+			p2m.o vnuma.o
 
 obj-$(CONFIG_EVENT_TRACING) += trace.o
 
diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
new file mode 100644
index 0000000..3c6c73f
--- /dev/null
+++ b/arch/x86/xen/vnuma.c
@@ -0,0 +1,92 @@
+#include <linux/err.h>
+#include <linux/memblock.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/memory.h>
+#include <asm/xen/interface.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/vnuma.h>
+#ifdef CONFIG_NUMA
+/* Xen PV NUMA topology initialization */
+static unsigned int xen_vnuma_init = 0;
+int xen_vnuma_support()
+{
+	return xen_vnuma_init;
+}
+int __init xen_numa_init(void)
+{
+	int rc;
+	unsigned int i, j, cpu, idx, pcpus;
+	u64 phys, physd, physc;
+	unsigned int *vdistance, *cpu_to_node;
+	unsigned long mem_size, dist_size, cpu_to_node_size;
+	struct vnuma_memarea *varea;
+
+	struct vnuma_topology_info numa_topo = {
+		.domid = DOMID_SELF
+	};
+	rc = -EINVAL;
+	if (!xen_pv_domain())
+		return rc;
+	pcpus = num_possible_cpus();
+	mem_size =  pcpus * sizeof(struct vnuma_memarea);
+	dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
+	cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
+	phys = memblock_alloc(mem_size, PAGE_SIZE);
+	physd = memblock_alloc(dist_size, PAGE_SIZE);
+	physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
+	if (!phys || !physc || !physd)
+		goto vnumaout;
+	varea = __va(phys);
+	vdistance  = __va(physd);
+	cpu_to_node  = __va(physc);
+	set_xen_guest_handle(numa_topo.vmemarea, varea);
+	set_xen_guest_handle(numa_topo.vdistance, vdistance);
+	set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
+	rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
+	if (rc < 0)
+		goto vnumaout;
+	rc = -EINVAL;
+	if (numa_topo.nr_nodes == 0) {
+		/* will pass to dummy_numa_init */
+		goto vnumaout;
+	}
+	if (numa_topo.nr_nodes > num_possible_cpus()) {
+		pr_debug("vNUMA: Node without cpu is not supported in this version.\n");
+		goto vnumaout;
+	}
+	/*
+	 * NUMA nodes memory ranges are in pfns, constructed and
+	 * aligned based on e820 ram domain map
+	*/
+	for (i = 0; i < numa_topo.nr_nodes; i++) {
+		if (numa_add_memblk(i, varea[i].start, varea[i].end))
+			/* pass to numa_dummy_init */
+			goto vnumaout;
+		node_set(i, numa_nodes_parsed);
+	}
+	setup_nr_node_ids();
+	/* Setting the cpu, apicid to node */
+	for_each_cpu(cpu, cpu_possible_mask) {
+		set_apicid_to_node(cpu, cpu_to_node[cpu]);
+		numa_set_node(cpu, cpu_to_node[cpu]);
+		__apicid_to_node[cpu] = cpu_to_node[cpu];
+		cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
+	}
+	for (i = 0; i < numa_topo.nr_nodes; i++) {
+		for (j = 0; j < numa_topo.nr_nodes; j++) {
+			idx = (j * numa_topo.nr_nodes) + i;
+			numa_set_distance(i, j, *(vdistance + idx));
+		}
+	}
+	rc = 0;
+	xen_vnuma_init = 1;
+vnumaout:
+	if (phys)
+		memblock_free(__pa(phys), mem_size);
+	if (physd)
+		memblock_free(__pa(physd), dist_size);
+	if (physc)
+		memblock_free(__pa(physc), cpu_to_node_size);
+	return rc;
+}
+#endif
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 2ecfe4f..4237f51 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
 
+/* vNUMA structures */
+struct vnuma_memarea {
+	uint64_t start, end;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vnuma_memarea);
+
+struct vnuma_topology_info {
+	/* OUT */
+	domid_t domid;
+	/* IN */
+	uint16_t nr_nodes; /* number of virtual numa nodes */
+	uint32_t _pad;
+	/* distance table */
+	GUEST_HANDLE(uint) vdistance;
+	/* cpu mapping to vnodes */
+	GUEST_HANDLE(uint) cpu_to_node;
+	/*
+	* array of numa memory areas constructed by Xen
+	* where start and end are pfn numbers of the area
+	* Xen takes into account domains e820 map
+	*/
+	GUEST_HANDLE(vnuma_memarea) vmemarea;
+};
+DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
+
+#define XENMEM_get_vnuma_info	25
+
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
-- 
1.7.10.4


* [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-17  8:33 [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Elena Ufimtseva
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
@ 2013-09-17  8:34 ` Elena Ufimtseva
  2013-09-17 14:17   ` David Vrabel
  2013-09-18 15:14   ` Dario Faggioli
  2013-09-18 16:16 ` [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Dario Faggioli
  2 siblings, 2 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-17  8:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Elena Ufimtseva, george.dunlap, dario.faggioli, lccycc123, msw,
	boris.ostrovsky, anddavid.vrabel

After the NUMA topology has been received from Xen,
enable NUMA during boot. CONFIG_NUMA must be enabled
in the kernel.

Changes since v1:
- added additional checks for PV guest and hypercall
support before enabling NUMA;

Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
---
 arch/x86/xen/setup.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8f3eea6..2f2d28e 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -20,6 +20,7 @@
 #include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
+#include <asm/xen/vnuma.h>
 
 #include <xen/xen.h>
 #include <xen/page.h>
@@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
 	WARN_ON(xen_set_default_idle());
 	fiddle_vdso();
 #ifdef CONFIG_NUMA
-	numa_off = 1;
+	if (!xen_initial_domain() && xen_vnuma_support())
+		numa_off = 0;
+	else
+		numa_off = 1;
 #endif
 }
-- 
1.7.10.4


* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
@ 2013-09-17 14:10   ` David Vrabel
  2013-09-18  6:16     ` Elena Ufimtseva
  2013-09-17 14:21   ` Boris Ostrovsky
  2013-09-18 16:04   ` Dario Faggioli
  2 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2013-09-17 14:10 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: lccycc123, george.dunlap, dario.faggioli, xen-devel, David Vrabel,
	msw, boris.ostrovsky

On 17/09/13 09:34, Elena Ufimtseva wrote:
> Requests NUMA topology info from Xen by issuing subop
> hypercall. Initializes NUMA nodes, sets number of CPUs,
> distance table and NUMA nodes memory ranges during boot.
> vNUMA topology defined by user in VM config file. Memory
> ranges are represented by structure vnuma_topology_info
> where start and end of memory area are defined in guests
> pfns numbers, constructed and aligned accordingly to
> e820 domain map.
> In case the received structure has errors, will fail to
> dummy numa init.
> Requires XEN with applied patches from vnuma patchset;
> 
> Changes since v1:
> - moved the test for xen_pv_domain() into xen_numa_init;
> - replaced memory block search/allocation by single memblock_alloc;
> - moved xen_numa_init to vnuma.c from enlighten.c;
> - moved memblock structure to public interface memory.h;
> - specified signedness of vnuma topology structure members;
> - removed excessive debug output;
> 
> TODO:
> - consider common interface for Dom0, HVM and PV guests to provide
> vNUMA topology;
> - dynamic numa balancing at the time of this patch (kernel 3.11
> 6e4664525b1db28f8c4e1130957f70a94c19213e with boot parameter
> numa_balancing=true that is such by default) crashes numa-enabled
> guest. Investigate further.

> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -19,6 +19,7 @@
>  #include <asm/amd_nb.h>

#include <asm/xen/vnuma.h> here...

>  #include "numa_internal.h"
> +#include "asm/xen/vnuma.h"

... not here.

> --- /dev/null
> +++ b/arch/x86/xen/vnuma.c
> @@ -0,0 +1,92 @@
> +#include <linux/err.h>
> +#include <linux/memblock.h>
> +#include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <asm/xen/interface.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/vnuma.h>
> +#ifdef CONFIG_NUMA
> +/* Xen PV NUMA topology initialization */
> +static unsigned int xen_vnuma_init = 0;
> +int xen_vnuma_support()
> +{
> +	return xen_vnuma_init;
> +}

I'm not sure how this and the usage in the next patch actually work.
xen_vnuma_init is only set after the test of numa_off prior to calling
xen_numa_init() which will set xen_vnuma_init.

> +int __init xen_numa_init(void)
> +{
> +	int rc;
> +	unsigned int i, j, cpu, idx, pcpus;
> +	u64 phys, physd, physc;
> +	unsigned int *vdistance, *cpu_to_node;
> +	unsigned long mem_size, dist_size, cpu_to_node_size;
> +	struct vnuma_memarea *varea;
> +
> +	struct vnuma_topology_info numa_topo = {
> +		.domid = DOMID_SELF
> +	};
> +	rc = -EINVAL;
> +	if (!xen_pv_domain())
> +		return rc;
> +	pcpus = num_possible_cpus();
> +	mem_size =  pcpus * sizeof(struct vnuma_memarea);
> +	dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
> +	cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
> +	phys = memblock_alloc(mem_size, PAGE_SIZE);
> +	physd = memblock_alloc(dist_size, PAGE_SIZE);
> +	physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
> +	if (!phys || !physc || !physd)
> +		goto vnumaout;
> +	varea = __va(phys);
> +	vdistance  = __va(physd);
> +	cpu_to_node  = __va(physc);
> +	set_xen_guest_handle(numa_topo.vmemarea, varea);
> +	set_xen_guest_handle(numa_topo.vdistance, vdistance);
> +	set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
> +	rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
> +	if (rc < 0)
> +		goto vnumaout;
> +	rc = -EINVAL;
> +	if (numa_topo.nr_nodes == 0) {
> +		/* will pass to dummy_numa_init */
> +		goto vnumaout;
> +	}
> +	if (numa_topo.nr_nodes > num_possible_cpus()) {
> +		pr_debug("vNUMA: Node without cpu is not supported in this version.\n");
> +		goto vnumaout;
> +	}
> +	/*
> +	 * NUMA nodes memory ranges are in pfns, constructed and
> +	 * aligned based on e820 ram domain map
> +	*/
> +	for (i = 0; i < numa_topo.nr_nodes; i++) {
> +		if (numa_add_memblk(i, varea[i].start, varea[i].end))
> +			/* pass to numa_dummy_init */
> +			goto vnumaout;

If there's a failure here, NUMA may be partially set up.  Do you need to
undo any of the bits that have already been set up?

> +		node_set(i, numa_nodes_parsed);
> +	}
> +	setup_nr_node_ids();
> +	/* Setting the cpu, apicid to node */
> +	for_each_cpu(cpu, cpu_possible_mask) {
> +		set_apicid_to_node(cpu, cpu_to_node[cpu]);
> +		numa_set_node(cpu, cpu_to_node[cpu]);
> +		__apicid_to_node[cpu] = cpu_to_node[cpu];
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
> +	}
> +	for (i = 0; i < numa_topo.nr_nodes; i++) {
> +		for (j = 0; j < numa_topo.nr_nodes; j++) {
> +			idx = (j * numa_topo.nr_nodes) + i;
> +			numa_set_distance(i, j, *(vdistance + idx));
> +		}
> +	}
> +	rc = 0;
> +	xen_vnuma_init = 1;
> +vnumaout:

Call this label "out".

> +	if (phys)
> +		memblock_free(__pa(phys), mem_size);
> +	if (physd)
> +		memblock_free(__pa(physd), dist_size);
> +	if (physc)
> +		memblock_free(__pa(physc), cpu_to_node_size);
> +	return rc;

If you return an error, x86_numa_init() will try to call setup for the
other NUMA systems.  Consider calling numa_dummy_init() directly instead
and then returning success.

> +}

Please use blank lines to space the logical bits of this function out
some more.

> +#endif
> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 2ecfe4f..4237f51 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
>  };
>  DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>  
> +/* vNUMA structures */
> +struct vnuma_memarea {
> +	uint64_t start, end;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_memarea);
> +
> +struct vnuma_topology_info {
> +	/* OUT */
> +	domid_t domid;
> +	/* IN */
> +	uint16_t nr_nodes; /* number of virtual numa nodes */
> +	uint32_t _pad;

Is this _pad intended to make this structure uniform across 32-bit and
64-bit guests?  The following GUEST_HANDLES() have different sizes on
32- and 64-bit guests.
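
For illustration only (not meant as the final ABI): a layout that stays
uniform across 32- and 64-bit guests would have to pad or widen the
handles into fixed-width fields, along the lines of

struct vnuma_topology_info {
	domid_t domid;
	uint16_t nr_nodes;
	uint32_t _pad;
	/* fixed-width references instead of pointer-sized handles */
	uint64_t vdistance;
	uint64_t cpu_to_node;
	uint64_t vmemarea;
};

so that the size and the field offsets are identical regardless of guest
bitness.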

> +	/* distance table */
> +	GUEST_HANDLE(uint) vdistance;
> +	/* cpu mapping to vnodes */
> +	GUEST_HANDLE(uint) cpu_to_node;
> +	/*
> +	* array of numa memory areas constructed by Xen
> +	* where start and end are pfn numbers of the area
> +	* Xen takes into account domains e820 map
> +	*/
> +	GUEST_HANDLE(vnuma_memarea) vmemarea;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
> +
> +#define XENMEM_get_vnuma_info	25
> +
>  #endif /* __XEN_PUBLIC_MEMORY_H__ */

David


* Re: [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-17  8:34 ` [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest Elena Ufimtseva
@ 2013-09-17 14:17   ` David Vrabel
  2013-09-17 14:37     ` Dario Faggioli
  2013-09-18 15:14   ` Dario Faggioli
  1 sibling, 1 reply; 20+ messages in thread
From: David Vrabel @ 2013-09-17 14:17 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: lccycc123, george.dunlap, dario.faggioli, xen-devel, msw,
	boris.ostrovsky, anddavid.vrabel

On 17/09/13 09:34, Elena Ufimtseva wrote:
> After the NUMA topology was received from Xen,
> enable NUMA during boot. Should have CONFIG_NUMA
> enabled in kernel.
> 
> Changes since v1:
> - added additional checks for PV guest and hypercall
> support before enablinf NUMA;

As I said in response to the other patch, I don't think this does the
right thing.

I think xen_vnuma_support() needs to try the vnuma hypercall and check
it is successful.

> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -20,6 +20,7 @@
>  #include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
> +#include <asm/xen/vnuma.h>
>  
>  #include <xen/xen.h>
>  #include <xen/page.h>
> @@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
>  	WARN_ON(xen_set_default_idle());
>  	fiddle_vdso();
>  #ifdef CONFIG_NUMA
> -	numa_off = 1;
> +	if (!xen_initial_domain() && xen_vnuma_support())

I don't think there's a need to special case the initial domain here is
there?

> +		numa_off = 0;
> +	else
> +		numa_off = 1;
>  #endif
>  }

David


* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
  2013-09-17 14:10   ` David Vrabel
@ 2013-09-17 14:21   ` Boris Ostrovsky
  2013-09-18  6:30     ` Elena Ufimtseva
  2013-09-18 16:04   ` Dario Faggioli
  2 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2013-09-17 14:21 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: george.dunlap, dario.faggioli, lccycc123, xen-devel, msw,
	anddavid.vrabel

On 09/17/2013 04:34 AM, Elena Ufimtseva wrote:
> Requests NUMA topology info from Xen by issuing subop
> hypercall. Initializes NUMA nodes, sets number of CPUs,
> distance table and NUMA nodes memory ranges during boot.
> vNUMA topology defined by user in VM config file. Memory
> ranges are represented by structure vnuma_topology_info
> where start and end of memory area are defined in guests
> pfns numbers, constructed and aligned accordingly to
> e820 domain map.
> In case the received structure has errors, will fail to
> dummy numa init.
> Requires XEN with applied patches from vnuma patchset;
>
> Changes since v1:
> - moved the test for xen_pv_domain() into xen_numa_init;
> - replaced memory block search/allocation by single memblock_alloc;
> - moved xen_numa_init to vnuma.c from enlighten.c;
> - moved memblock structure to public interface memory.h;
> - specified signedness of vnuma topology structure members;
> - removed excessive debug output;
>
> TODO:
> - consider common interface for Dom0, HVM and PV guests to provide
> vNUMA topology;
> - dynamic numa balancing at the time of this patch (kernel 3.11
> 6e4664525b1db28f8c4e1130957f70a94c19213e with boot parameter
> numa_balancing=true that is such by default) crashes numa-enabled
> guest. Investigate further.
>
> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
>   arch/x86/include/asm/xen/vnuma.h |   12 +++++
>   arch/x86/mm/numa.c               |    5 +++
>   arch/x86/xen/Makefile            |    2 +-
>   arch/x86/xen/vnuma.c             |   92 ++++++++++++++++++++++++++++++++++++++
>   include/xen/interface/memory.h   |   27 +++++++++++
>   5 files changed, 137 insertions(+), 1 deletion(-)
>   create mode 100644 arch/x86/include/asm/xen/vnuma.h
>   create mode 100644 arch/x86/xen/vnuma.c
>
> diff --git a/arch/x86/include/asm/xen/vnuma.h b/arch/x86/include/asm/xen/vnuma.h
> new file mode 100644
> index 0000000..1bf4cae
> --- /dev/null
> +++ b/arch/x86/include/asm/xen/vnuma.h
> @@ -0,0 +1,12 @@
> +#ifndef _ASM_X86_VNUMA_H
> +#define _ASM_X86_VNUMA_H
> +
> +#ifdef CONFIG_XEN
> +int xen_vnuma_support(void);
> +int xen_numa_init(void);
> +#else
> +int xen_vnuma_support(void) { return 0; };
> +int xen_numa_init(void) {};

This should return -EINVAL. Or perhaps you can add

#ifdef CONFIG_XEN
#include "asm/xen/vnuma.h"
#endif

in numa.c  and not bother with ifdef here.

> +#endif
> +
> +#endif /* _ASM_X86_VNUMA_H */
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 8bf93ba..a95fadf 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -19,6 +19,7 @@
>   #include <asm/amd_nb.h>
>   
>   #include "numa_internal.h"
> +#include "asm/xen/vnuma.h"
>   
>   int __initdata numa_off;
>   nodemask_t numa_nodes_parsed __initdata;
> @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
>   void __init x86_numa_init(void)
>   {
>   	if (!numa_off) {
> +#ifdef CONFIG_XEN
> +		if (!numa_init(xen_numa_init))
> +			return;
> +#endif
>   #ifdef CONFIG_X86_NUMAQ
>   		if (!numa_init(numaq_numa_init))
>   			return;
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 96ab2c0..de9deab 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,7 +13,7 @@ CFLAGS_mmu.o			:= $(nostackp)
>   obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
>   			time.o xen-asm.o xen-asm_$(BITS).o \
>   			grant-table.o suspend.o platform-pci-unplug.o \
> -			p2m.o
> +			p2m.o vnuma.o
>   
>   obj-$(CONFIG_EVENT_TRACING) += trace.o
>   
> diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
> new file mode 100644
> index 0000000..3c6c73f
> --- /dev/null
> +++ b/arch/x86/xen/vnuma.c
> @@ -0,0 +1,92 @@
> +#include <linux/err.h>
> +#include <linux/memblock.h>
> +#include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <asm/xen/interface.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/vnuma.h>
> +#ifdef CONFIG_NUMA
> +/* Xen PV NUMA topology initialization */
> +static unsigned int xen_vnuma_init = 0;
> +int xen_vnuma_support()
> +{
> +	return xen_vnuma_init;
> +}
> +int __init xen_numa_init(void)
> +{
> +	int rc;
> +	unsigned int i, j, cpu, idx, pcpus;
> +	u64 phys, physd, physc;
> +	unsigned int *vdistance, *cpu_to_node;

cpu_to_node may not be a particularly good name as there is a macro with
the same name in topology.h

> +	unsigned long mem_size, dist_size, cpu_to_node_size;
> +	struct vnuma_memarea *varea;
> +
> +	struct vnuma_topology_info numa_topo = {
> +		.domid = DOMID_SELF
> +	};
> +	rc = -EINVAL;
> +	if (!xen_pv_domain())
> +		return rc;

No need to set rc here, just return -EINVAL;

And please add spaces between lines to separate logical blocks a little.

> +	pcpus = num_possible_cpus();
> +	mem_size =  pcpus * sizeof(struct vnuma_memarea);
> +	dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
> +	cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
> +	phys = memblock_alloc(mem_size, PAGE_SIZE);
> +	physd = memblock_alloc(dist_size, PAGE_SIZE);
> +	physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
> +	if (!phys || !physc || !physd)
> +		goto vnumaout;
> +	varea = __va(phys);
> +	vdistance  = __va(physd);
> +	cpu_to_node  = __va(physc);
> +	set_xen_guest_handle(numa_topo.vmemarea, varea);
> +	set_xen_guest_handle(numa_topo.vdistance, vdistance);
> +	set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
> +	rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
> +	if (rc < 0)
> +		goto vnumaout;
> +	rc = -EINVAL;
> +	if (numa_topo.nr_nodes == 0) {
> +		/* will pass to dummy_numa_init */
> +		goto vnumaout;
> +	}
> +	if (numa_topo.nr_nodes > num_possible_cpus()) {
> +		pr_debug("vNUMA: Node without cpu is not supported in this version.\n");
> +		goto vnumaout;
> +	}
> +	/*
> +	 * NUMA nodes memory ranges are in pfns, constructed and
> +	 * aligned based on e820 ram domain map
> +	*/
> +	for (i = 0; i < numa_topo.nr_nodes; i++) {
> +		if (numa_add_memblk(i, varea[i].start, varea[i].end))
> +			/* pass to numa_dummy_init */
> +			goto vnumaout;
> +		node_set(i, numa_nodes_parsed);
> +	}
> +	setup_nr_node_ids();
> +	/* Setting the cpu, apicid to node */
> +	for_each_cpu(cpu, cpu_possible_mask) {
> +		set_apicid_to_node(cpu, cpu_to_node[cpu]);
> +		numa_set_node(cpu, cpu_to_node[cpu]);
> +		__apicid_to_node[cpu] = cpu_to_node[cpu];

Isn't this what set_apicid_to_node() above will do?
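
(In other words, the loop could presumably shrink to

	for_each_cpu(cpu, cpu_possible_mask) {
		set_apicid_to_node(cpu, cpu_to_node[cpu]);
		numa_set_node(cpu, cpu_to_node[cpu]);
		cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
	}

assuming, as the patch already does, that the apicid of a vcpu equals its
cpu number.)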

-boris

> +		cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
> +	}
> +	for (i = 0; i < numa_topo.nr_nodes; i++) {
> +		for (j = 0; j < numa_topo.nr_nodes; j++) {
> +			idx = (j * numa_topo.nr_nodes) + i;
> +			numa_set_distance(i, j, *(vdistance + idx));
> +		}
> +	}
> +	rc = 0;
> +	xen_vnuma_init = 1;
> +vnumaout:
> +	if (phys)
> +		memblock_free(__pa(phys), mem_size);
> +	if (physd)
> +		memblock_free(__pa(physd), dist_size);
> +	if (physc)
> +		memblock_free(__pa(physc), cpu_to_node_size);
> +	return rc;
> +}
> +#endif
> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 2ecfe4f..4237f51 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
>   };
>   DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>   
> +/* vNUMA structures */
> +struct vnuma_memarea {
> +	uint64_t start, end;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_memarea);
> +
> +struct vnuma_topology_info {
> +	/* OUT */
> +	domid_t domid;
> +	/* IN */
> +	uint16_t nr_nodes; /* number of virtual numa nodes */
> +	uint32_t _pad;
> +	/* distance table */
> +	GUEST_HANDLE(uint) vdistance;
> +	/* cpu mapping to vnodes */
> +	GUEST_HANDLE(uint) cpu_to_node;
> +	/*
> +	* array of numa memory areas constructed by Xen
> +	* where start and end are pfn numbers of the area
> +	* Xen takes into account domains e820 map
> +	*/
> +	GUEST_HANDLE(vnuma_memarea) vmemarea;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
> +
> +#define XENMEM_get_vnuma_info	25
> +
>   #endif /* __XEN_PUBLIC_MEMORY_H__ */


* Re: [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-17 14:17   ` David Vrabel
@ 2013-09-17 14:37     ` Dario Faggioli
  2013-09-18  6:32       ` Elena Ufimtseva
  0 siblings, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2013-09-17 14:37 UTC (permalink / raw)
  Cc: Elena Ufimtseva, lccycc123, george.dunlap, xen-devel, msw,
	boris.ostrovsky, anddavid.vrabel


On mar, 2013-09-17 at 15:17 +0100, David Vrabel wrote:
> On 17/09/13 09:34, Elena Ufimtseva wrote:
> > After the NUMA topology was received from Xen,
> > enable NUMA during boot. Should have CONFIG_NUMA
> > enabled in kernel.
> > 
> > Changes since v1:
> > - added additional checks for PV guest and hypercall
> > support before enablinf NUMA;
> 
> As I said in response to the other patch, I don't think this does the
> right thing.
> 
> I think xen_vnuma_support() needs to try the vnuma hypercall and check
> it is successful.
> 
That can surely be done, I think. Elena?

> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -20,6 +20,7 @@
> >  #include <asm/numa.h>
> >  #include <asm/xen/hypervisor.h>
> >  #include <asm/xen/hypercall.h>
> > +#include <asm/xen/vnuma.h>
> >  
> >  #include <xen/xen.h>
> >  #include <xen/page.h>
> > @@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
> >  	WARN_ON(xen_set_default_idle());
> >  	fiddle_vdso();
> >  #ifdef CONFIG_NUMA
> > -	numa_off = 1;
> > +	if (!xen_initial_domain() && xen_vnuma_support())
> 
> I don't think there's a need to special case the initial domain here is
> there?
> 
This is actually something that Konrad asked for, since, apparently,
there are AMD machines that just blow up, as Dom0, if this is on.

Therefore, since, right now, the status of Elena's work is "just DomUs",
we figured this could be fine for now.

Of course, the final goal (whether for Elena or for someone else to pick
it up) is to be able to enable vNUMA for Dom0 too, at which point we
definitely will have to figure out a way to kill this special casing
safely...

What do you think?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17 14:10   ` David Vrabel
@ 2013-09-18  6:16     ` Elena Ufimtseva
  2013-09-18  7:17       ` Dario Faggioli
  2013-09-18 12:23       ` David Vrabel
  0 siblings, 2 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18  6:16 UTC (permalink / raw)
  To: David Vrabel
  Cc: Li Yechen, George Dunlap, Dario Faggioli, xen-devel@lists.xen.org,
	Matt Wilson, boris.ostrovsky

On Tue, Sep 17, 2013 at 10:10 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 17/09/13 09:34, Elena Ufimtseva wrote:
>> Requests NUMA topology info from Xen by issuing subop
>> hypercall. Initializes NUMA nodes, sets number of CPUs,
>> distance table and NUMA nodes memory ranges during boot.
>> vNUMA topology defined by user in VM config file. Memory
>> ranges are represented by structure vnuma_topology_info
>> where start and end of memory area are defined in guests
>> pfns numbers, constructed and aligned accordingly to
>> e820 domain map.
>> In case the received structure has errors, will fail to
>> dummy numa init.
>> Requires XEN with applied patches from vnuma patchset;
>>
>> Changes since v1:
>> - moved the test for xen_pv_domain() into xen_numa_init;
>> - replaced memory block search/allocation by single memblock_alloc;
>> - moved xen_numa_init to vnuma.c from enlighten.c;
>> - moved memblock structure to public interface memory.h;
>> - specified signedness of vnuma topology structure members;
>> - removed excessive debug output;
>>
>> TODO:
>> - consider common interface for Dom0, HVM and PV guests to provide
>> vNUMA topology;
>> - dynamic numa balancing at the time of this patch (kernel 3.11
>> 6e4664525b1db28f8c4e1130957f70a94c19213e with boot parameter
>> numa_balancing=true that is such by default) crashes numa-enabled
>> guest. Investigate further.
>
>> --- a/arch/x86/mm/numa.c
>> +++ b/arch/x86/mm/numa.c
>> @@ -19,6 +19,7 @@
>>  #include <asm/amd_nb.h>
>
> #include <asm/xen/vnuma.h> here...
>
>>  #include "numa_internal.h"
>> +#include "asm/xen/vnuma.h"
>
> ... not here.
>
>> --- /dev/null
>> +++ b/arch/x86/xen/vnuma.c
>> @@ -0,0 +1,92 @@
>> +#include <linux/err.h>
>> +#include <linux/memblock.h>
>> +#include <xen/interface/xen.h>
>> +#include <xen/interface/memory.h>
>> +#include <asm/xen/interface.h>
>> +#include <asm/xen/hypercall.h>
>> +#include <asm/xen/vnuma.h>
>> +#ifdef CONFIG_NUMA
>> +/* Xen PV NUMA topology initialization */
>> +static unsigned int xen_vnuma_init = 0;
>> +int xen_vnuma_support()
>> +{
>> +     return xen_vnuma_init;
>> +}
>
> I'm not sure how this and the usage in the next patch actually work.
> xen_vnuma_init is only set after the test of numa_off prior to calling
> xen_numa_init() which will set xen_vnuma_init.

David, it's obscure and the naming is not self-explanatory; I will fix it.
But the idea was to make sure that NUMA can be safely turned on
(for a domU domain, and only if the xen_numa_init call was successful).
>
>> +int __init xen_numa_init(void)
>> +{
>> +     int rc;
>> +     unsigned int i, j, cpu, idx, pcpus;
>> +     u64 phys, physd, physc;
>> +     unsigned int *vdistance, *cpu_to_node;
>> +     unsigned long mem_size, dist_size, cpu_to_node_size;
>> +     struct vnuma_memarea *varea;
>> +
>> +     struct vnuma_topology_info numa_topo = {
>> +             .domid = DOMID_SELF
>> +     };
>> +     rc = -EINVAL;
>> +     if (!xen_pv_domain())
>> +             return rc;
>> +     pcpus = num_possible_cpus();
>> +     mem_size =  pcpus * sizeof(struct vnuma_memarea);
>> +     dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
>> +     cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
>> +     phys = memblock_alloc(mem_size, PAGE_SIZE);
>> +     physd = memblock_alloc(dist_size, PAGE_SIZE);
>> +     physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
>> +     if (!phys || !physc || !physd)
>> +             goto vnumaout;
>> +     varea = __va(phys);
>> +     vdistance  = __va(physd);
>> +     cpu_to_node  = __va(physc);
>> +     set_xen_guest_handle(numa_topo.vmemarea, varea);
>> +     set_xen_guest_handle(numa_topo.vdistance, vdistance);
>> +     set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
>> +     rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
>> +     if (rc < 0)
>> +             goto vnumaout;
>> +     rc = -EINVAL;
>> +     if (numa_topo.nr_nodes == 0) {
>> +             /* will pass to dummy_numa_init */
>> +             goto vnumaout;
>> +     }
>> +     if (numa_topo.nr_nodes > num_possible_cpus()) {
>> +             pr_debug("vNUMA: Node without cpu is not supported in this version.\n");
>> +             goto vnumaout;
>> +     }
>> +     /*
>> +      * NUMA nodes memory ranges are in pfns, constructed and
>> +      * aligned based on e820 ram domain map
>> +     */
>> +     for (i = 0; i < numa_topo.nr_nodes; i++) {
>> +             if (numa_add_memblk(i, varea[i].start, varea[i].end))
>> +                     /* pass to numa_dummy_init */
>> +                     goto vnumaout;
>
> If there's a failure here, numa may be partially setup.  Do you need to
> undo any of the bits that have already setup?

Konrad asked me the same and I was under the impression it is safe. But
that was based on assumptions I would rather avoid making. I will add
bits to unset NUMA in case of failure.
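Something along these lines, perhaps (only a rough sketch, assuming the
failure happens inside the numa_add_memblk() loop; the distance table
and the memblks added by earlier iterations would need resetting too):

	/* undo the nodes marked as parsed before the failure */
	for (j = 0; j < i; j++)
		node_clear(j, numa_nodes_parsed);
	rc = -EINVAL;
	goto vnumaout;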
>
>> +             node_set(i, numa_nodes_parsed);
>> +     }
>> +     setup_nr_node_ids();
>> +     /* Setting the cpu, apicid to node */
>> +     for_each_cpu(cpu, cpu_possible_mask) {
>> +             set_apicid_to_node(cpu, cpu_to_node[cpu]);
>> +             numa_set_node(cpu, cpu_to_node[cpu]);
>> +             __apicid_to_node[cpu] = cpu_to_node[cpu];
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]);
>> +     }
>> +     for (i = 0; i < numa_topo.nr_nodes; i++) {
>> +             for (j = 0; j < numa_topo.nr_nodes; j++) {
>> +                     idx = (j * numa_topo.nr_nodes) + i;
>> +                     numa_set_distance(i, j, *(vdistance + idx));
>> +             }
>> +     }
>> +     rc = 0;
>> +     xen_vnuma_init = 1;
>> +vnumaout:
>
> Call this label "out".
>
>> +     if (phys)
>> +             memblock_free(__pa(phys), mem_size);
>> +     if (physd)
>> +             memblock_free(__pa(physd), dist_size);
>> +     if (physc)
>> +             memblock_free(__pa(physc), cpu_to_node_size);
>> +     return rc;
>
> If you return an error, x86_numa_init() will try to call setup for other
> NUMA system.  Consider calling numa_dummy_init() directly instead and
> then returning success.

David, isn't that what x86_numa_init() is supposed to do: try every
*numa_init until one succeeds?
Will adding an explicit call to dummy numa init from xen_numa_init break this logic?

>
>> +}
>
> Please use blank lines to space the logical bits of this function out
> some more.
>
>> +#endif
>> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
>> index 2ecfe4f..4237f51 100644
>> --- a/include/xen/interface/memory.h
>> +++ b/include/xen/interface/memory.h
>> @@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
>>  };
>>  DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>>
>> +/* vNUMA structures */
>> +struct vnuma_memarea {
>> +     uint64_t start, end;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_memarea);
>> +
>> +struct vnuma_topology_info {
>> +     /* OUT */
>> +     domid_t domid;
>> +     /* IN */
>> +     uint16_t nr_nodes; /* number of virtual numa nodes */
>> +     uint32_t _pad;
>
> Is this _pad intended to make this structure uniform across 32-bit and
> 64-bit guests?  The following GUEST_HANDLES() have different sizes on
> 32- and 64-bit guests.
>
>> +     /* distance table */
>> +     GUEST_HANDLE(uint) vdistance;
>> +     /* cpu mapping to vnodes */
>> +     GUEST_HANDLE(uint) cpu_to_node;
>> +     /*
>> +     * array of numa memory areas constructed by Xen
>> +     * where start and end are pfn numbers of the area
>> +     * Xen takes into account domains e820 map
>> +     */
>> +     GUEST_HANDLE(vnuma_memarea) vmemarea;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
>> +
>> +#define XENMEM_get_vnuma_info        25
>> +
>>  #endif /* __XEN_PUBLIC_MEMORY_H__ */
>
> David

Thank you, I will work on the rest of the code.

-- 
Elena


* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17 14:21   ` Boris Ostrovsky
@ 2013-09-18  6:30     ` Elena Ufimtseva
  2013-09-18  7:33       ` Dario Faggioli
  0 siblings, 1 reply; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18  6:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: George Dunlap, Dario Faggioli, Li Yechen, xen-devel@lists.xen.org,
	Matt Wilson, anddavid.vrabel

On Tue, Sep 17, 2013 at 10:21 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
> On 09/17/2013 04:34 AM, Elena Ufimtseva wrote:
>>
>> Requests NUMA topology info from Xen by issuing subop
>> hypercall. Initializes NUMA nodes, sets number of CPUs,
>> distance table and NUMA nodes memory ranges during boot.
>> vNUMA topology defined by user in VM config file. Memory
>> ranges are represented by structure vnuma_topology_info
>> where start and end of memory area are defined in guests
>> pfns numbers, constructed and aligned accordingly to
>> e820 domain map.
>> In case the received structure has errors, will fail to
>> dummy numa init.
>> Requires XEN with applied patches from vnuma patchset;
>>
>> Changes since v1:
>> - moved the test for xen_pv_domain() into xen_numa_init;
>> - replaced memory block search/allocation by single memblock_alloc;
>> - moved xen_numa_init to vnuma.c from enlighten.c;
>> - moved memblock structure to public interface memory.h;
>> - specified signedness of vnuma topology structure members;
>> - removed excessive debug output;
>>
>> TODO:
>> - consider common interface for Dom0, HVM and PV guests to provide
>> vNUMA topology;
>> - dynamic numa balancing at the time of this patch (kernel 3.11
>> 6e4664525b1db28f8c4e1130957f70a94c19213e with boot parameter
>> numa_balancing=true that is such by default) crashes numa-enabled
>> guest. Investigate further.
>>
>> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
>> ---
>>   arch/x86/include/asm/xen/vnuma.h |   12 +++++
>>   arch/x86/mm/numa.c               |    5 +++
>>   arch/x86/xen/Makefile            |    2 +-
>>   arch/x86/xen/vnuma.c             |   92
>> ++++++++++++++++++++++++++++++++++++++
>>   include/xen/interface/memory.h   |   27 +++++++++++
>>   5 files changed, 137 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/x86/include/asm/xen/vnuma.h
>>   create mode 100644 arch/x86/xen/vnuma.c
>>
>> diff --git a/arch/x86/include/asm/xen/vnuma.h
>> b/arch/x86/include/asm/xen/vnuma.h
>> new file mode 100644
>> index 0000000..1bf4cae
>> --- /dev/null
>> +++ b/arch/x86/include/asm/xen/vnuma.h
>> @@ -0,0 +1,12 @@
>> +#ifndef _ASM_X86_VNUMA_H
>> +#define _ASM_X86_VNUMA_H
>> +
>> +#ifdef CONFIG_XEN
>> +int xen_vnuma_support(void);
>> +int xen_numa_init(void);
>> +#else
>> +int xen_vnuma_support(void) { return 0; };
>> +int xen_numa_init(void) {};
>
>
> This should return -EINVAL. Or perhaps you can add
>
> #ifdef CONFIG_XEN
> #include "asm/xen/vnuma.h"
> #endif
>
> in numa.c  and not bother with ifdef here.


>
>
>> +#endif
>> +
>> +#endif /* _ASM_X86_VNUMA_H */
>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>> index 8bf93ba..a95fadf 100644
>> --- a/arch/x86/mm/numa.c
>> +++ b/arch/x86/mm/numa.c
>> @@ -19,6 +19,7 @@
>>   #include <asm/amd_nb.h>
>>     #include "numa_internal.h"
>> +#include "asm/xen/vnuma.h"
>>     int __initdata numa_off;
>>   nodemask_t numa_nodes_parsed __initdata;
>> @@ -621,6 +622,10 @@ static int __init dummy_numa_init(void)
>>   void __init x86_numa_init(void)
>>   {
>>         if (!numa_off) {
>> +#ifdef CONFIG_XEN
>> +               if (!numa_init(xen_numa_init))
>> +                       return;
>> +#endif
>>   #ifdef CONFIG_X86_NUMAQ
>>                 if (!numa_init(numaq_numa_init))
>>                         return;
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index 96ab2c0..de9deab 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -13,7 +13,7 @@ CFLAGS_mmu.o                  := $(nostackp)
>>   obj-y         := enlighten.o setup.o multicalls.o mmu.o irq.o \
>>                         time.o xen-asm.o xen-asm_$(BITS).o \
>>                         grant-table.o suspend.o platform-pci-unplug.o \
>> -                       p2m.o
>> +                       p2m.o vnuma.o
>>     obj-$(CONFIG_EVENT_TRACING) += trace.o
>>   diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
>> new file mode 100644
>> index 0000000..3c6c73f
>> --- /dev/null
>> +++ b/arch/x86/xen/vnuma.c
>> @@ -0,0 +1,92 @@
>> +#include <linux/err.h>
>> +#include <linux/memblock.h>
>> +#include <xen/interface/xen.h>
>> +#include <xen/interface/memory.h>
>> +#include <asm/xen/interface.h>
>> +#include <asm/xen/hypercall.h>
>> +#include <asm/xen/vnuma.h>
>> +#ifdef CONFIG_NUMA
>> +/* Xen PV NUMA topology initialization */
>> +static unsigned int xen_vnuma_init = 0;
>> +int xen_vnuma_support()
>> +{
>> +       return xen_vnuma_init;
>> +}
>> +int __init xen_numa_init(void)
>> +{
>> +       int rc;
>> +       unsigned int i, j, cpu, idx, pcpus;
>> +       u64 phys, physd, physc;
>> +       unsigned int *vdistance, *cpu_to_node;
>
>
> cpu_to_node may not be a particularly good name as there is a macro with
> the same name in topology.h

Sure, will fix.
>
>
>> +       unsigned long mem_size, dist_size, cpu_to_node_size;
>> +       struct vnuma_memarea *varea;
>> +
>> +       struct vnuma_topology_info numa_topo = {
>> +               .domid = DOMID_SELF
>> +       };
>> +       rc = -EINVAL;
>> +       if (!xen_pv_domain())
>> +               return rc;
>
>
> No need to set rc here, just return -EINVAL;
>
> And please add spaces between lines to separate logical blocks a little.
Ok.
>
>
>> +       pcpus = num_possible_cpus();
>> +       mem_size =  pcpus * sizeof(struct vnuma_memarea);
>> +       dist_size = pcpus * pcpus * sizeof(*numa_topo.vdistance);
>> +       cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node);
>> +       phys = memblock_alloc(mem_size, PAGE_SIZE);
>> +       physd = memblock_alloc(dist_size, PAGE_SIZE);
>> +       physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE);
>> +       if (!phys || !physc || !physd)
>> +               goto vnumaout;
>> +       varea = __va(phys);
>> +       vdistance  = __va(physd);
>> +       cpu_to_node  = __va(physc);
>> +       set_xen_guest_handle(numa_topo.vmemarea, varea);
>> +       set_xen_guest_handle(numa_topo.vdistance, vdistance);
>> +       set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
>> +       rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
>> +       if (rc < 0)
>> +               goto vnumaout;
>> +       rc = -EINVAL;
>> +       if (numa_topo.nr_nodes == 0) {
>> +               /* will pass to dummy_numa_init */
>> +               goto vnumaout;
>> +       }
>> +       if (numa_topo.nr_nodes > num_possible_cpus()) {
>> +               pr_debug("vNUMA: Node without cpu is not supported in this
>> version.\n");
>> +               goto vnumaout;
>> +       }
>> +       /*
>> +        * NUMA nodes memory ranges are in pfns, constructed and
>> +        * aligned based on e820 ram domain map
>> +       */
>> +       for (i = 0; i < numa_topo.nr_nodes; i++) {
>> +               if (numa_add_memblk(i, varea[i].start, varea[i].end))
>> +                       /* pass to numa_dummy_init */
>> +                       goto vnumaout;
>> +               node_set(i, numa_nodes_parsed);
>> +       }
>> +       setup_nr_node_ids();
>> +       /* Setting the cpu, apicid to node */
>> +       for_each_cpu(cpu, cpu_possible_mask) {
>> +               set_apicid_to_node(cpu, cpu_to_node[cpu]);
>> +               numa_set_node(cpu, cpu_to_node[cpu]);
>> +               __apicid_to_node[cpu] = cpu_to_node[cpu];
>
>
> Isn't this what set_apicid_to_node() above will do?

Yes, exactly the same ) will fix.
>
> -boris
>
>
>> +               cpumask_set_cpu(cpu,
>> node_to_cpumask_map[cpu_to_node[cpu]]);
>> +       }
>> +       for (i = 0; i < numa_topo.nr_nodes; i++) {
>> +               for (j = 0; j < numa_topo.nr_nodes; j++) {
>> +                       idx = (j * numa_topo.nr_nodes) + i;
>> +                       numa_set_distance(i, j, *(vdistance + idx));
>> +               }
>> +       }
>> +       rc = 0;
>> +       xen_vnuma_init = 1;
>> +vnumaout:
>> +       if (phys)
>> +               memblock_free(__pa(phys), mem_size);
>> +       if (physd)
>> +               memblock_free(__pa(physd), dist_size);
>> +       if (physc)
>> +               memblock_free(__pa(physc), cpu_to_node_size);
>> +       return rc;
>> +}
>> +#endif
>> diff --git a/include/xen/interface/memory.h
>> b/include/xen/interface/memory.h
>> index 2ecfe4f..4237f51 100644
>> --- a/include/xen/interface/memory.h
>> +++ b/include/xen/interface/memory.h
>> @@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
>>   };
>>   DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>>   +/* vNUMA structures */
>> +struct vnuma_memarea {
>> +       uint64_t start, end;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_memarea);
>> +
>> +struct vnuma_topology_info {
>> +       /* OUT */
>> +       domid_t domid;
>> +       /* IN */
>> +       uint16_t nr_nodes; /* number of virtual numa nodes */
>> +       uint32_t _pad;
>> +       /* distance table */
>> +       GUEST_HANDLE(uint) vdistance;
>> +       /* cpu mapping to vnodes */
>> +       GUEST_HANDLE(uint) cpu_to_node;
>> +       /*
>> +       * array of numa memory areas constructed by Xen
>> +       * where start and end are pfn numbers of the area
>> +       * Xen takes into account domains e820 map
>> +       */
>> +       GUEST_HANDLE(vnuma_memarea) vmemarea;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
>> +
>> +#define XENMEM_get_vnuma_info  25
>> +
>>   #endif /* __XEN_PUBLIC_MEMORY_H__ */
>
>



-- 
Elena


* Re: [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-17 14:37     ` Dario Faggioli
@ 2013-09-18  6:32       ` Elena Ufimtseva
  0 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18  6:32 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, David Vrabel,
	Matt Wilson, Boris Ostrovsky, anddavid.vrabel

On Tue, Sep 17, 2013 at 10:37 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mar, 2013-09-17 at 15:17 +0100, David Vrabel wrote:
>> On 17/09/13 09:34, Elena Ufimtseva wrote:
>> > After the NUMA topology was received from Xen,
>> > enable NUMA during boot. Should have CONFIG_NUMA
>> > enabled in kernel.
>> >
>> > Changes since v1:
>> > - added additional checks for PV guest and hypercall
>> > support before enablinf NUMA;
>>
>> As I said in response to the other patch, I don't think this does the
>> right thing.
>>
>> I think xen_vnuma_support() needs to try the vnuma hypercall and check
>> it is successful.
>>
> That can surely be done, I think. Elena?

Yes, sure. I will change it.

>
>> > --- a/arch/x86/xen/setup.c
>> > +++ b/arch/x86/xen/setup.c
>> > @@ -20,6 +20,7 @@
>> >  #include <asm/numa.h>
>> >  #include <asm/xen/hypervisor.h>
>> >  #include <asm/xen/hypercall.h>
>> > +#include <asm/xen/vnuma.h>
>> >
>> >  #include <xen/xen.h>
>> >  #include <xen/page.h>
>> > @@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
>> >     WARN_ON(xen_set_default_idle());
>> >     fiddle_vdso();
>> >  #ifdef CONFIG_NUMA
>> > -   numa_off = 1;
>> > +   if (!xen_initial_domain() && xen_vnuma_support())
>>
>> I don't think there's a need to special case the initial domain here is
>> there?
>>
> This is actually something that Konrad asked, since, apparently, there
> are AMD machines that just blows up, as Dom0, if this is on.
>
> Therefore, since, right now, the status of Elena's work is "just DomUs",
> we figured this could be fine for now.
>
> Of course, the final goal (whether for Elena or for someone else to pick
> it up) is to be able to enable vNUMA for Dom0 too, at which point we
> definitely will have to figure out a way to kill this special casing
> safely...
>
> What do you think?
>
> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 
Elena


* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-18  6:16     ` Elena Ufimtseva
@ 2013-09-18  7:17       ` Dario Faggioli
  2013-09-18  7:41         ` Elena Ufimtseva
  2013-09-18 12:23       ` David Vrabel
  1 sibling, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2013-09-18  7:17 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, David Vrabel,
	Matt Wilson, boris.ostrovsky


On mer, 2013-09-18 at 02:16 -0400, Elena Ufimtseva wrote:
> On Tue, Sep 17, 2013 at 10:10 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>
> >> --- /dev/null
> >> +++ b/arch/x86/xen/vnuma.c
> >> @@ -0,0 +1,92 @@
> >> +#include <linux/err.h>
> >> +#include <linux/memblock.h>
> >> +#include <xen/interface/xen.h>
> >> +#include <xen/interface/memory.h>
> >> +#include <asm/xen/interface.h>
> >> +#include <asm/xen/hypercall.h>
> >> +#include <asm/xen/vnuma.h>
> >> +#ifdef CONFIG_NUMA
> >> +/* Xen PV NUMA topology initialization */
> >> +static unsigned int xen_vnuma_init = 0;
> >> +int xen_vnuma_support()
> >> +{
> >> +     return xen_vnuma_init;
> >> +}
> >
> > I'm not sure how this and the usage in the next patch actually work.
> > xen_vnuma_init is only set after the test of numa_off prior to calling
> > xen_numa_init() which will set xen_vnuma_init.
> 
> David, its obscure and naming is not self explanatory.. Will fix it.
> But the idea was to make sure
> that NUMA can be safely turned on (for domu domain and if
> xen_numa_init call was sucessfull).
>
I think what David meant was to actually issue the hypercall, perhaps
with fictitious, known-to-be-wrong parameters, and check the return
value.

If that is EINVAL (or anything different than ENOSYS), then it means
that the hypercall is supported (i.e., we're running on a version of the
Xen hypervisor that has it implemented), and we can go ahead.

If that is ENOSYS, then it means there is no such hypercall, in which
case, the sooner we give up, the better. :-)

David, is this what you meant? If yes, Elena, does that make sense to
you? Do you think you can make the code look like this?
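
Something like this, maybe (just a sketch; the dummy arguments and the
exact return-value handling are of course up for discussion):

	int xen_vnuma_support(void)
	{
		struct vnuma_topology_info numa_topo = {
			.domid = DOMID_SELF,
		};
		int rc;

		/*
		 * Probe the subop without any buffers set up: -ENOSYS means
		 * the hypervisor does not implement it at all, any other
		 * value (e.g. -EINVAL/-EFAULT) means the hypercall is there.
		 */
		rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
		return rc != -ENOSYS;
	}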

> >> +int __init xen_numa_init(void)
> >> +{
[snip]
> >> +     if (phys)
> >> +             memblock_free(__pa(phys), mem_size);
> >> +     if (physd)
> >> +             memblock_free(__pa(physd), dist_size);
> >> +     if (physc)
> >> +             memblock_free(__pa(physc), cpu_to_node_size);
> >> +     return rc;
> >
> > If you return an error, x86_numa_init() will try to call setup for other
> > NUMA system.  Consider calling numa_dummy_init() directly instead and
> > then returning success.
> 
> David, isnt it what x86_numa_init() supposed to do? try every
> *numa_init until one succeed?
> Will adding excplicit call to dummy numa from xen_init_numa brake this logic?
> 
I was about to reply exactly the same (as Elena) but, on second
thought, I think David has a point here.

After all, what's the point in calling stuff like acpi_numa_init(),
etc., when we already know they're going to fail! So, yes, I think his
idea is worth a try...

Also, bear in mind what Konrad said about a call to one of the functions
in x86_numa_init() blowing up if run as Dom0 on some AMD chips. If we
return 'success' here, that will never happen: we either setup a proper
vNUMA topology or go straight to dummy_*, no more room for weird stuff
to happen... It actually sounds pretty cool, doesn't it? :-P
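
Concretely, I am thinking of something like this in the failure path of
xen_numa_init() (only a sketch; dummy_numa_init() is currently static in
arch/x86/mm/numa.c, so it would need to be exposed, or the fallback could
live in x86_numa_init() itself):

	if (rc != 0) {
		/*
		 * The ACPI/AMD/NUMAQ backends cannot work in a PV guest
		 * anyway, so go straight to the single dummy node and
		 * report success instead of letting x86_numa_init() try
		 * them.
		 */
		rc = dummy_numa_init();
	}
	return rc;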

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-18  6:30     ` Elena Ufimtseva
@ 2013-09-18  7:33       ` Dario Faggioli
  2013-09-18  7:39         ` Elena Ufimtseva
  0 siblings, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2013-09-18  7:33 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, Matt Wilson,
	Boris Ostrovsky, anddavid.vrabel


On mer, 2013-09-18 at 02:30 -0400, Elena Ufimtseva wrote:
> On Tue, Sep 17, 2013 at 10:21 AM, Boris Ostrovsky
>
> >> +int __init xen_numa_init(void)
> >> +{
[snip]
> >> +       setup_nr_node_ids();
> >> +       /* Setting the cpu, apicid to node */
> >> +       for_each_cpu(cpu, cpu_possible_mask) {
> >> +               set_apicid_to_node(cpu, cpu_to_node[cpu]);
> >> +               numa_set_node(cpu, cpu_to_node[cpu]);
> >> +               __apicid_to_node[cpu] = cpu_to_node[cpu];
> >
> >
> > Isn't this what set_apicid_to_node() above will do?
> 
> Yes, exactly the same ) will fix.
>
I seem to recall that something strange was happening if we did not do
it this way (i.e., calling the same stuff twice, or something like
that), isn't that so, Elena?

If it is, please explain that in a comment. However, it may well be
that my recollection is wrong, in which case, sorry for the
noise.

Anyway, in general, and both for this series and for the Xen one, I
think the code could use a little bit more commenting (along with
being broken up into paragraphs, as many have already pointed out).

I know, I know, too many comments are also bad... Actually, finding the
right balance between too few and too many is as important as it is
difficult! :-P

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-18  7:33       ` Dario Faggioli
@ 2013-09-18  7:39         ` Elena Ufimtseva
  0 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18  7:39 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, Matt Wilson,
	Boris Ostrovsky, anddavid.vrabel

On Wed, Sep 18, 2013 at 3:33 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mer, 2013-09-18 at 02:30 -0400, Elena Ufimtseva wrote:
>> On Tue, Sep 17, 2013 at 10:21 AM, Boris Ostrovsky
>>
>> >> +int __init xen_numa_init(void)
>> >> +{
> [snip]
>> >> +       setup_nr_node_ids();
>> >> +       /* Setting the cpu, apicid to node */
>> >> +       for_each_cpu(cpu, cpu_possible_mask) {
>> >> +               set_apicid_to_node(cpu, cpu_to_node[cpu]);
>> >> +               numa_set_node(cpu, cpu_to_node[cpu]);
>> >> +               __apicid_to_node[cpu] = cpu_to_node[cpu];
>> >
>> >
>> > Isn't this what set_apicid_to_node() above will do?
>>
>> Yes, exactly the same ) will fix.
>>
> I seem to recall that something strange was happening if we did not do
> it this way (i.e., calling the same stuff twice, or something like
> that), isn't that so, Elena?
>
I think that's in the past: proper ordering made sense and that second
call was removed :)
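Concretely, the de-duplicated loop would end up looking roughly like this
(just a sketch of the intent; cpu_to_node[] is the per-cpu node array filled
in from the hypercall, and set_apicid_to_node() already updates
__apicid_to_node[] internally, which is why the direct assignment goes away):

	for_each_cpu(cpu, cpu_possible_mask) {
		set_apicid_to_node(cpu, cpu_to_node[cpu]);
		numa_set_node(cpu, cpu_to_node[cpu]);
	}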

> If it is, please explain that in a comment. However, it may well be
> that my recollection is wrong, in which case, sorry for the
> noise.
>
> Anyway, in general, and both for this series and for the Xen one, I
> think the code could use a little bit more commenting (along with
> being broken up into paragraphs, as many have already pointed out).
>
> I know, I know, too many comments are also bad... Actually, finding the
> right balance between too few and too many is as important as it is
> difficult! :-P

I see that ) I will learn.
>
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 
Elena

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-18  7:17       ` Dario Faggioli
@ 2013-09-18  7:41         ` Elena Ufimtseva
  0 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18  7:41 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, David Vrabel,
	Matt Wilson, Boris Ostrovsky

On Wed, Sep 18, 2013 at 3:17 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mer, 2013-09-18 at 02:16 -0400, Elena Ufimtseva wrote:
>> On Tue, Sep 17, 2013 at 10:10 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>>
>> >> --- /dev/null
>> >> +++ b/arch/x86/xen/vnuma.c
>> >> @@ -0,0 +1,92 @@
>> >> +#include <linux/err.h>
>> >> +#include <linux/memblock.h>
>> >> +#include <xen/interface/xen.h>
>> >> +#include <xen/interface/memory.h>
>> >> +#include <asm/xen/interface.h>
>> >> +#include <asm/xen/hypercall.h>
>> >> +#include <asm/xen/vnuma.h>
>> >> +#ifdef CONFIG_NUMA
>> >> +/* Xen PV NUMA topology initialization */
>> >> +static unsigned int xen_vnuma_init = 0;
>> >> +int xen_vnuma_support()
>> >> +{
>> >> +     return xen_vnuma_init;
>> >> +}
>> >
>> > I'm not sure how this and the usage in the next patch actually work.
>> > xen_vnuma_init is only set after the test of numa_off prior to calling
>> > xen_numa_init() which will set xen_vnuma_init.
>>
>> David, it's obscure and the naming is not self-explanatory; I will fix it.
>> But the idea was to make sure
>> that NUMA can be safely turned on (for a domU and if the
>> xen_numa_init call was successful).
>>
> I think what David meant was to actually issue the hypercall once,
> perhaps with fictitious, known-to-be-wrong parameters, and check the
> return value.
>
> If that is EINVAL (or anything other than ENOSYS), then it means
> that the hypercall is supported (i.e., we're running on a version of the
> Xen hypervisor that has it implemented), and we can go ahead.
>
> If that is ENOSYS, then it means there is no such hypercall, in which
> case, the sooner we give up, the better. :-)
>
> David, is this what you meant? If yes, Elena, does that make sense to
> you? Do you think you can make the code look like this?

Sure, in my todo list )

>
>> >> +int __init xen_numa_init(void)
>> >> +{
> [snip]
>> >> +     if (phys)
>> >> +             memblock_free(__pa(phys), mem_size);
>> >> +     if (physd)
>> >> +             memblock_free(__pa(physd), dist_size);
>> >> +     if (physc)
>> >> +             memblock_free(__pa(physc), cpu_to_node_size);
>> >> +     return rc;
>> >
>> > If you return an error, x86_numa_init() will try to call setup for the other
>> > NUMA systems.  Consider calling numa_dummy_init() directly instead and
>> > then returning success.
>>
>> David, isn't that what x86_numa_init() is supposed to do? Try every
>> *numa_init until one succeeds?
>> Will adding an explicit call to dummy numa from xen_numa_init break this logic?
>>
> I was about to reply exactly the same (as Elena) but, on second
> thought, I think David has a point here.
>
> After all, what's the point in calling stuff like acpi_numa_init(),
> etc., when we already know it is going to fail? So, yes, I think his idea
> is worth a try...
>
> Also, bear in mind what Konrad said about a call to one of the functions
> in x86_numa_init() blowing up if run as Dom0 on some AMD chips. If we
> return 'success' here, that can never happen: we either set up a proper
> vNUMA topology or go straight to dummy_*, with no more room for weird stuff
> to happen... It actually sounds pretty cool, doesn't it? :-P

I agree, and it does make sense. I did not know how far I could go
in modifying
the original Linux logic.

>
> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 
Elena

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-18  6:16     ` Elena Ufimtseva
  2013-09-18  7:17       ` Dario Faggioli
@ 2013-09-18 12:23       ` David Vrabel
  1 sibling, 0 replies; 20+ messages in thread
From: David Vrabel @ 2013-09-18 12:23 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Li Yechen, George Dunlap, Dario Faggioli, xen-devel@lists.xen.org,
	Matt Wilson, boris.ostrovsky

On 18/09/13 07:16, Elena Ufimtseva wrote:
> On Tue, Sep 17, 2013 at 10:10 AM, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 17/09/13 09:34, Elena Ufimtseva wrote:
>>> Requests NUMA topology info from Xen by issuing subop
>>> hypercall. Initializes NUMA nodes, sets number of CPUs,
>>> distance table and NUMA nodes memory ranges during boot.
>>> vNUMA topology defined by user in VM config file. Memory
>>> ranges are represented by structure vnuma_topology_info
>>> where start and end of memory area are defined in guests
>>> pfns numbers, constructed and aligned accordingly to
>>> e820 domain map.
>>> In case the received structure has errors, will fail to
>>> dummy numa init.
>>> Requires XEN with applied patches from vnuma patchset;
>>>
>>> Changes since v1:
>>> - moved the test for xen_pv_domain() into xen_numa_init;
>>> - replaced memory block search/allocation by single memblock_alloc;
>>> - moved xen_numa_init to vnuma.c from enlighten.c;
>>> - moved memblock structure to public interface memory.h;
>>> - specified signedness of vnuma topology structure members;
>>> - removed excessive debug output;
>>>
>>> TODO:
>>> - consider common interface for Dom0, HVM and PV guests to provide
>>> vNUMA topology;
>>> - dynamic numa balancing at the time of this patch (kernel 3.11
>>> 6e4664525b1db28f8c4e1130957f70a94c19213e with boot parameter
>>> numa_balancing=true that is such by default) crashes numa-enabled
>>> guest. Investigate further.
>>
>>> --- a/arch/x86/mm/numa.c
>>> +++ b/arch/x86/mm/numa.c
>>> @@ -19,6 +19,7 @@
>>>  #include <asm/amd_nb.h>
>>
>> #include <asm/xen/vnuma.h> here...
>>
>>>  #include "numa_internal.h"
>>> +#include "asm/xen/vnuma.h"
>>
>> ... not here.
>>
>>> --- /dev/null
>>> +++ b/arch/x86/xen/vnuma.c
>>> @@ -0,0 +1,92 @@
>>> +#include <linux/err.h>
>>> +#include <linux/memblock.h>
>>> +#include <xen/interface/xen.h>
>>> +#include <xen/interface/memory.h>
>>> +#include <asm/xen/interface.h>
>>> +#include <asm/xen/hypercall.h>
>>> +#include <asm/xen/vnuma.h>
>>> +#ifdef CONFIG_NUMA
>>> +/* Xen PV NUMA topology initialization */
>>> +static unsigned int xen_vnuma_init = 0;
>>> +int xen_vnuma_support()
>>> +{
>>> +     return xen_vnuma_init;
>>> +}
>>
>> I'm not sure how this and the usage in the next patch actually work.
>> xen_vnuma_init is only set after the test of numa_off prior to calling
>> xen_numa_init() which will set xen_vnuma_init.
> 
> David, it's obscure and the naming is not self-explanatory; I will fix it.
> But the idea was to make sure
> that NUMA can be safely turned on (for a domU and if the
> xen_numa_init call was successful).

I understand what it's for, I just don't see how it works.

The code path looks like (I think):

xen_vnuma_init = 0;

if (!xen_vnuma_init)
    numa_off = 1

if (!numa_off)
    xen_numa_init()

However, if you go with the idea of calling dummy init in
xen_numa_init()'s error path, you don't need this.


>>> +     for (i = 0; i < numa_topo.nr_nodes; i++) {
>>> +             if (numa_add_memblk(i, varea[i].start, varea[i].end))
>>> +                     /* pass to numa_dummy_init */
>>> +                     goto vnumaout;
>>
>> If there's a failure here, numa may be partially setup.  Do you need to
>> undo any of the bits that have already setup?
> 
> Konrad asked me the same and I was under the impression it is safe. But
> that was based on assumptions
> that I would rather avoid making.  I will add bits to unset NUMA in
> case of failure.

I looked at the other uses of this and none of them undo on failure so I
think it is fine as is.

>>> +     if (phys)
>>> +             memblock_free(__pa(phys), mem_size);
>>> +     if (physd)
>>> +             memblock_free(__pa(physd), dist_size);
>>> +     if (physc)
>>> +             memblock_free(__pa(physc), cpu_to_node_size);
>>> +     return rc;
>>
>> If you return an error, x86_numa_init() will try to call setup for the other
>> NUMA systems.  Consider calling numa_dummy_init() directly instead and
>> then returning success.
> 
> David, isn't that what x86_numa_init() is supposed to do? Try every
> *numa_init until one succeeds?
> Will adding an explicit call to dummy numa from xen_numa_init break this logic?

Yes, but if we know we're a PV guest we do not want to try any other
one; we want to fall back to the dummy init immediately.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-17  8:34 ` [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest Elena Ufimtseva
  2013-09-17 14:17   ` David Vrabel
@ 2013-09-18 15:14   ` Dario Faggioli
  2013-09-27 17:03     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2013-09-18 15:14 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: lccycc123, george.dunlap, xen-devel, david.vrabel, msw,
	boris.ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 1506 bytes --]

On mar, 2013-09-17 at 04:34 -0400, Elena Ufimtseva wrote:
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 8f3eea6..2f2d28e 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
>
>  #include <xen/xen.h>
>  #include <xen/page.h>
> @@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
>  	WARN_ON(xen_set_default_idle());
>  	fiddle_vdso();
>  #ifdef CONFIG_NUMA
> -	numa_off = 1;
> +	if (!xen_initial_domain() && xen_vnuma_support())
> +		numa_off = 0;
> +	else
> +		numa_off = 1;
>  #endif
>  }
I can't be positive about this, but I think that, if we go for David's
suggestions of:
 - testing whether the hypercall is actually supported,
 - calling dummy_numa_init() directly from within xen_numa_init (in
   case something goes wrong), instead of going through all the
   alternatives from x86_numa_init(),

then we can even get rid of this numa_off=0|1 altogether (which is
also something David was already suggesting, AFAICR).

It would be nice to know what the issue was at the time this had to be
introduced, and whether we can do some testing on any of the
boxes where it was exploding.

I'll investigate more; in the meantime, does anyone have any clue?

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
  2013-09-17 14:10   ` David Vrabel
  2013-09-17 14:21   ` Boris Ostrovsky
@ 2013-09-18 16:04   ` Dario Faggioli
  2 siblings, 0 replies; 20+ messages in thread
From: Dario Faggioli @ 2013-09-18 16:04 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: lccycc123, george.dunlap, xen-devel, david.vrabel, msw,
	boris.ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 4393 bytes --]

On mar, 2013-09-17 at 04:34 -0400, Elena Ufimtseva wrote:
> Requests NUMA topology info from Xen by issuing subop
> hypercall. 
>
Not such a big deal, but I think you can just call it a hypercall, and
perhaps put the actual name in (something like "XENMEM_get_vnuma_info" via
"HYPERVISOR_memory_op()"). What I'm after is that 'subop hcall' may be
too much of a Xen-ism.

> Initializes NUMA nodes, sets number of CPUs,
> distance table and NUMA nodes memory ranges during boot.
> vNUMA topology defined by user in VM config file. 
>
Here (I mean in the changelog of this patch) I'd just say that the vNUMA
topology is hosted in Xen. You can add a quick note about where it
comes from but, (1), it pretty much always comes from the config file
right now, although that may change in the future (when we introduce Dom0
vNUMA); (2) I don't think Linux cares much about where Xen got the
information. :-P

> Memory
> ranges are represented by structure vnuma_topology_info
> where start and end of memory area are defined in guests
> pfns numbers, constructed and aligned accordingly to
> e820 domain map.
>
Fine, and what does that mean for Linux? The fact that everything is already
aligned and respects the e820, I mean? I actually have a question
about this, which I ask below, but whatever the answer is, I think a
summary of that should go up here.

Oh, and BTW, as in the code, some blank lines separating the various
paragraphs a bit more would be nice in the changelogs too.

> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
> ---
>
> diff --git a/arch/x86/include/asm/xen/vnuma.h b/arch/x86/include/asm/xen/vnuma.h
>
> diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c
> new file mode 100644
> index 0000000..3c6c73f
> --- /dev/null
> +++ b/arch/x86/xen/vnuma.c
> @@ -0,0 +1,92 @@
> +#include <linux/err.h>
> +#include <linux/memblock.h>
> +#include <xen/interface/xen.h>
> +#include <xen/interface/memory.h>
> +#include <asm/xen/interface.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/vnuma.h>
> +#ifdef CONFIG_NUMA
> +/* Xen PV NUMA topology initialization */
> +static unsigned int xen_vnuma_init = 0;
> +int xen_vnuma_support()
> +{
> +	return xen_vnuma_init;
> +}
> +int __init xen_numa_init(void)
> +{
[snip]
> +	/*
> +	 * NUMA nodes memory ranges are in pfns, constructed and
> +	 * aligned based on e820 ram domain map
> +	*/
> +	for (i = 0; i < numa_topo.nr_nodes; i++) {
> +		if (numa_add_memblk(i, varea[i].start, varea[i].end))
> +			/* pass to numa_dummy_init */
> +			goto vnumaout;
> +		node_set(i, numa_nodes_parsed);
> +	}
>
So, forgive me if my e820-fu is not black-belt level, but here you go
through nr_nodes elements of the varea[] array and set up nr_nodes
memblks, i.e., one for each node.

I take this to mean there can only be one memblk per node, which, for
instance, means no holes; is that accurate?

If yes, the "constructed and aligned based on e820" means basically just
alignment, right?

More importantly, are we sure this is always OK, i.e., that we don't and
won't ever need to take holes into account?
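(For what it's worth, numa_add_memblk() itself can be called more than once
for the same node, so holes could in principle be described by passing
several ranges per node. A purely hypothetical variant, assuming the
interface grew a flat list of ranges with an explicit node id per entry:)

	/* Hypothetical: 'nr_ranges' and a per-range 'nid' are assumed
	 * extensions, not part of the posted interface. */
	for (i = 0; i < numa_topo.nr_ranges; i++) {
		if (numa_add_memblk(varea[i].nid, varea[i].start, varea[i].end))
			goto vnumaout;
		node_set(varea[i].nid, numa_nodes_parsed);
	}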

> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 2ecfe4f..4237f51 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -263,4 +263,31 @@ struct xen_remove_from_physmap {
>
> +struct vnuma_topology_info {
> +	/* OUT */
> +	domid_t domid;
> +	/* IN */
> +	uint16_t nr_nodes; /* number of virtual numa nodes */
> +	uint32_t _pad;
> +	/* distance table */
> +	GUEST_HANDLE(uint) vdistance;
> +	/* cpu mapping to vnodes */
> +	GUEST_HANDLE(uint) cpu_to_node;
> +	/*
> +	* array of numa memory areas constructed by Xen
> +	* where start and end are pfn numbers of the area
> +	* Xen takes into account domains e820 map
                                   ^domain's ?
> +	*/
> +	GUEST_HANDLE(vnuma_memarea) vmemarea;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info);
> +
> +#define XENMEM_get_vnuma_info	25
> +
>  #endif /* __XEN_PUBLIC_MEMORY_H__ */
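Just to make the flow concrete, the guest side of this could look roughly as
follows (a sketch only: buffer allocation and error handling are elided, and
the field usage follows the RFC structure above rather than any final version
of the interface):

	struct vnuma_topology_info numa_topo = {
		.domid = DOMID_SELF,
	};
	int rc;

	/* vdistance, cpu_to_node and varea are buffers allocated earlier
	 * (e.g. with memblock_alloc), sized from the expected node count. */
	set_xen_guest_handle(numa_topo.vdistance, vdistance);
	set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node);
	set_xen_guest_handle(numa_topo.vmemarea, varea);

	rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo);
	if (rc < 0)
		goto vnumaout;		/* no usable topology: fall back */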

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction
  2013-09-17  8:33 [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Elena Ufimtseva
  2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
  2013-09-17  8:34 ` [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest Elena Ufimtseva
@ 2013-09-18 16:16 ` Dario Faggioli
  2013-09-18 16:20   ` Elena Ufimtseva
  2 siblings, 1 reply; 20+ messages in thread
From: Dario Faggioli @ 2013-09-18 16:16 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: lccycc123, george.dunlap, xen-devel, david.vrabel, msw,
	boris.ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 2025 bytes --]

On mar, 2013-09-17 at 04:33 -0400, Elena Ufimtseva wrote:
> This patchset introduces vNUMA for PV domU guest.
> 
I'm picky and I know it, but I think domU and guest are synonyms...
Having just one of them should be enough.

Also, the subject line "linux/vnuma: vNUMA...". It is indeed good
practice to indicate which component and subsystem the patches apply
to. However, in this case, I don't think having "linux/" there adds much,
since you'll be sending these mainly to LKML (although, yes, they'll go
to xen-devel too, but I honestly think we can manage).

Regarding the "/vnuma" part, well, vnuma isn't really a subsystem.
Actually, it did not even exist before this series, so again, I wouldn't
put it there.

From a Linux developer/maintainer point of view, these patches
are about Xen, so something like "xen:..." or "x86/xen:" is probably
better.

> Enables PV guest to discover NUMA topology provided by Xen
> and initializes NUMA topology on boot. XENMEM subop hypercall
> is used to retreive information from Xen. 
>
The fact that this happens during the regular x86 NUMA initialization
phase (i.e., in x86_numa_init()) is worth mentioning here.
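For context, that initialization phase lives in arch/x86/mm/numa.c, and the
hook being discussed would sit roughly like this (an abridged sketch, not the
exact hunk from the series; per the v2 changelog the xen_pv_domain() check
lives inside xen_numa_init() itself, and other firmware-specific branches are
omitted):

void __init x86_numa_init(void)
{
	if (!numa_off) {
#ifdef CONFIG_XEN
		/* Assumed placement of the new hook from this series. */
		if (!numa_init(xen_numa_init))
			return;
#endif
#ifdef CONFIG_ACPI_NUMA
		if (!numa_init(x86_acpi_numa_init))
			return;
#endif
#ifdef CONFIG_AMD_NUMA
		if (!numa_init(amd_numa_init))
			return;
#endif
	}

	/* Nothing worked (or NUMA is off): fall back to a single flat node. */
	numa_init(dummy_numa_init);
}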

> Xen provides number 
> of NUMA nodes, memory regions (start and end pfn) constructed 
> based on e820 domU map, distance table and cpu to node map. i
> xen_numa_init is called to setup NUMA related structures. 
> To enable this mechanism, kernel should be compiled as PV guest
> with CONFIG_NUMA=y and Xen should support vNUMA functionality 
> (patchset http://lists.xenproject.org/archives/html/xen-devel/2013-09/msg01337.html ).
>  
This is fine. Perhaps I'd add something about future plans, which are to
extend this to work for Dom0 too.

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction
  2013-09-18 16:16 ` [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Dario Faggioli
@ 2013-09-18 16:20   ` Elena Ufimtseva
  0 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2013-09-18 16:20 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Li Yechen, George Dunlap, xen-devel@lists.xen.org, David Vrabel,
	Matt Wilson, Boris Ostrovsky

On Wed, Sep 18, 2013 at 12:16 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mar, 2013-09-17 at 04:33 -0400, Elena Ufimtseva wrote:
>> This patchset introduces vNUMA for PV domU guest.
>>
> I'm picky and I know it, but I think domU and guest are synonyms...
> Having just one of them should be enough.

>
> Also, the subject line "linux/vnuma: vNUMA...". It is indeed good
> practice to indicate which component and subsystem the patches apply
> to. However, in this case, I don't think having "linux/" there adds much,
> since you'll be sending these mainly to LKML (although, yes, they'll go
> to xen-devel too, but I honestly think we can manage).
>
> Regarding the "/vnuma" part, well, vnuma isn't really a subsystem.
> Actually, it did not even exist before this series, so again, I wouldn't
> put it there.
>
> From a Linux developer/maintainer point of view, these patches
> are about Xen, so something like "xen:..." or "x86/xen:" is probably
> better.
>
>> Enables PV guest to discover NUMA topology provided by Xen
>> and initializes NUMA topology on boot. XENMEM subop hypercall
>> is used to retreive information from Xen.
>>
> The fact that this happens during the regular x86 NUMA initialization
> phase (i.e., in x86_numa_init()) is worth mentioning here.
>
>> Xen provides number
>> of NUMA nodes, memory regions (start and end pfn) constructed
>> based on e820 domU map, distance table and cpu to node map. i
>> xen_numa_init is called to setup NUMA related structures.
>> To enable this mechanism, kernel should be compiled as PV guest
>> with CONFIG_NUMA=y and Xen should support vNUMA functionality
>> (patchset http://lists.xenproject.org/archives/html/xen-devel/2013-09/msg01337.html ).
>>
> This is fine. Perhaps I'd add something about future plans, which are to
> extend this to work for Dom0 too.
>

Sure, will change these.

> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>



-- 
Elena

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest
  2013-09-18 15:14   ` Dario Faggioli
@ 2013-09-27 17:03     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 20+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-27 17:03 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Elena Ufimtseva, lccycc123, george.dunlap, xen-devel,
	david.vrabel, msw, boris.ostrovsky

On Wed, Sep 18, 2013 at 05:14:49PM +0200, Dario Faggioli wrote:
> On mar, 2013-09-17 at 04:34 -0400, Elena Ufimtseva wrote:
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 8f3eea6..2f2d28e 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> >
> >  #include <xen/xen.h>
> >  #include <xen/page.h>
> > @@ -583,6 +584,9 @@ void __init xen_arch_setup(void)
> >  	WARN_ON(xen_set_default_idle());
> >  	fiddle_vdso();
> >  #ifdef CONFIG_NUMA
> > -	numa_off = 1;
> > +	if (!xen_initial_domain() && xen_vnuma_support())
> > +		numa_off = 0;
> > +	else
> > +		numa_off = 1;
> >  #endif
> >  }
> I can't be positive about this, but I think that, if we go for David's
> suggestions of:
>  - testing whether the hypercall is actually supported,
>  - calling dummy_numa_init() directly from within xen_numa_init (in
>    case something goes wrong), instead of going through all the
>    alternatives from x86_numa_init(),
> 
> then we can even get rid of this numa_off=0|1 altogether (which is
> also something David was already suggesting, AFAICR).


You could do:

	if (xen_initial_domain() && !xen_vnuma_support())
		numa_off = 1;

And just do that. The PV guest (the plain one) can then go on
and just do the dummy one.


> 
> It would be nice to know what the issue was at the time this had to be
> introduced, and whether we can do some testing on any of the
> boxes where it was exploding.

konrad@phenom:~/linux$ git annotate arch/x86/xen/setup.c|grep numa_off
8d54db795       (Konrad Rzeszutek Wilk  2012-08-17 10:22:37 -0400       601)    numa_off = 1;
konrad@phenom:~/linux$ git show 8d54db795
commit 8d54db795dfb1049d45dc34f0dddbc5347ec5642
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Fri Aug 17 10:22:37 2012 -0400

    xen/boot: Disable NUMA for PV guests.
....
    this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
..
    Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar



> 
> I'll investigate more; in the meantime, does anyone have any clue?
> 
> Dario
> 
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> 



> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-09-27 17:03 UTC | newest]

Thread overview: 20+ messages
2013-09-17  8:33 [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Elena Ufimtseva
2013-09-17  8:34 ` [PATCH RFC v2 1/2] linux/vnuma: vNUMA for PV domu guest Elena Ufimtseva
2013-09-17 14:10   ` David Vrabel
2013-09-18  6:16     ` Elena Ufimtseva
2013-09-18  7:17       ` Dario Faggioli
2013-09-18  7:41         ` Elena Ufimtseva
2013-09-18 12:23       ` David Vrabel
2013-09-17 14:21   ` Boris Ostrovsky
2013-09-18  6:30     ` Elena Ufimtseva
2013-09-18  7:33       ` Dario Faggioli
2013-09-18  7:39         ` Elena Ufimtseva
2013-09-18 16:04   ` Dario Faggioli
2013-09-17  8:34 ` [PATCH RFC v2 2/2] linux/vnuma: Enables NUMA for domu PV guest Elena Ufimtseva
2013-09-17 14:17   ` David Vrabel
2013-09-17 14:37     ` Dario Faggioli
2013-09-18  6:32       ` Elena Ufimtseva
2013-09-18 15:14   ` Dario Faggioli
2013-09-27 17:03     ` Konrad Rzeszutek Wilk
2013-09-18 16:16 ` [PATCH RFC v2 0/2] linux/vnuma: vNUMA PV guest support introduction Dario Faggioli
2013-09-18 16:20   ` Elena Ufimtseva
