* [PATCH v2 0/2] xen: vnuma introduction for pv guest
@ 2013-11-18 20:25 Elena Ufimtseva
  2013-11-18 20:25 ` [PATCH v2 1/2] xen: vnuma support for PV guests running as domU Elena Ufimtseva
  ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Elena Ufimtseva @ 2013-11-18 20:25 UTC (permalink / raw)
  To: xen-devel
  Cc: konrad.wilk, boris.ostrovsky, david.vrabel, tglx, mingo, hpa, x86,
	akpm, tangchen, wency, ian.campbell, stefano.stabellini,
	mukesh.rathor, linux-kernel, Elena Ufimtseva

Xen vnuma introduction.

The patchset introduces vnuma to paravirtualized Xen guests running as
domU. A Xen subop hypercall is used to retrieve the vnuma topology
information. Based on the topology retrieved from Xen, the NUMA number
of nodes, memory ranges, distance table and cpumask are set. If
initialization is incorrect, a 'dummy' node is set and the nodemask is
unset. The vNUMA topology is constructed by the Xen toolstack. The Xen
patchset is available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.

Example dmesg of a vnuma-enabled PV domain:

[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0xffffffff]
[ 0.000000] node 1: [mem 0x100000000-0x1ffffffff]
[ 0.000000] node 2: [mem 0x200000000-0x2ffffffff]
[ 0.000000] node 3: [mem 0x300000000-0x3ffffffff]
[ 0.000000] On node 0 totalpages: 1048479
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 21 pages reserved
[ 0.000000] DMA zone: 3999 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14280 pages used for memmap
[ 0.000000] DMA32 zone: 1044480 pages, LIFO batch:31
[ 0.000000] On node 1 totalpages: 1048576
[ 0.000000] Normal zone: 14336 pages used for memmap
[ 0.000000] Normal zone: 1048576 pages, LIFO batch:31
[ 0.000000] On node 2 totalpages: 1048576
[ 0.000000] Normal zone: 14336 pages used for memmap
[ 0.000000] Normal zone: 1048576 pages, LIFO batch:31
[ 0.000000] On node 3 totalpages: 1048576
[ 0.000000] Normal zone: 14336 pages used for memmap
[ 0.000000] Normal zone: 1048576 pages, LIFO batch:31
[ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[ 0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.000000] No local APIC present
[ 0.000000] APIC: disable apic facility
[ 0.000000] APIC: switched to apic NOOP
[ 0.000000] nr_irqs_gsi: 16
[ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[ 0.000000] e820: cannot find a gap in the 32bit address range
[ 0.000000] e820: PCI devices with unassigned 32bit BARs may break!
[ 0.000000] e820: [mem 0x400100000-0x4004fffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on Xen
[ 0.000000] Xen version: 4.4-unstable (preserve-AD)
[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:4
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff8800ffc00000 s85376 r8192 d21120 u2097152
[ 0.000000] pcpu-alloc: s85376 r8192 d21120 u2097152 alloc=1*2097152

numactl output:

root@heatpipe:~# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0
node 0 size: 4031 MB
node 0 free: 3997 MB
node 1 cpus: 1
node 1 size: 4039 MB
node 1 free: 4022 MB
node 2 cpus: 2
node 2 size: 4039 MB
node 2 free: 4023 MB
node 3 cpus: 3
node 3 size: 3975 MB
node 3 free: 3963 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10

Current patchset is available at
https://git.gitorious.org/xenvnuma/linuxvnuma.git:v3

Xen patchset is available at:
https://git.gitorious.org/xenvnuma/xenvnuma.git:v3

TODO:
* dom0, pvh and hvm vnuma support;
* multiple memory ranges per node support;
* benchmarking;

Elena Ufimtseva (2):
  xen: vnuma support for PV guests running as domU
  xen: enable vnuma for PV guest

 arch/x86/include/asm/xen/vnuma.h |   12 ++++
 arch/x86/mm/numa.c               |    3 +
 arch/x86/xen/Makefile            |    2 +-
 arch/x86/xen/setup.c             |    6 +-
 arch/x86/xen/vnuma.c             |  127 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/memory.h   |   44 +++++++++++++
 6 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/vnuma.h
 create mode 100644 arch/x86/xen/vnuma.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH v2 1/2] xen: vnuma support for PV guests running as domU 2013-11-18 20:25 [PATCH v2 0/2] xen: vnuma introduction for pv guest Elena Ufimtseva @ 2013-11-18 20:25 ` Elena Ufimtseva 2013-11-18 21:14 ` H. Peter Anvin 2013-11-19 7:15 ` [Xen-devel] " Dario Faggioli 2013-11-18 20:25 ` [PATCH v2 2/2] xen: enable vnuma for PV guest Elena Ufimtseva 2013-11-19 15:38 ` [PATCH v2 0/2] xen: vnuma introduction for pv guest Konrad Rzeszutek Wilk 2 siblings, 2 replies; 15+ messages in thread From: Elena Ufimtseva @ 2013-11-18 20:25 UTC (permalink / raw) To: xen-devel Cc: konrad.wilk, boris.ostrovsky, david.vrabel, tglx, mingo, hpa, x86, akpm, tangchen, wency, ian.campbell, stefano.stabellini, mukesh.rathor, linux-kernel, Elena Ufimtseva Issues Xen hypercall subop XENMEM_get_vnumainfo and sets the NUMA topology, otherwise sets dummy NUMA node and prevents numa_init from calling other numa initializators as they dont work with pv guests. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- arch/x86/include/asm/xen/vnuma.h | 12 ++++ arch/x86/mm/numa.c | 3 + arch/x86/xen/Makefile | 2 +- arch/x86/xen/vnuma.c | 127 ++++++++++++++++++++++++++++++++++++++ include/xen/interface/memory.h | 44 +++++++++++++ 5 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/xen/vnuma.h create mode 100644 arch/x86/xen/vnuma.c diff --git a/arch/x86/include/asm/xen/vnuma.h b/arch/x86/include/asm/xen/vnuma.h new file mode 100644 index 0000000..aee4e92 --- /dev/null +++ b/arch/x86/include/asm/xen/vnuma.h @@ -0,0 +1,12 @@ +#ifndef _ASM_X86_VNUMA_H +#define _ASM_X86_VNUMA_H + +#ifdef CONFIG_XEN +bool xen_vnuma_supported(void); +int xen_numa_init(void); +#else +static inline bool xen_vnuma_supported(void) { return false; }; +static inline int xen_numa_init(void) { return -1; }; +#endif + +#endif /* _ASM_X86_VNUMA_H */ diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 24aec58..99efa1b 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -17,6 +17,7 @@ #include <asm/dma.h> #include <asm/acpi.h> #include <asm/amd_nb.h> +#include "asm/xen/vnuma.h" #include "numa_internal.h" @@ -632,6 +633,8 @@ static int __init dummy_numa_init(void) void __init x86_numa_init(void) { if (!numa_off) { + if (!numa_init(xen_numa_init)) + return; #ifdef CONFIG_X86_NUMAQ if (!numa_init(numaq_numa_init)) return; diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index 96ab2c0..de9deab 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp) obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \ time.o xen-asm.o xen-asm_$(BITS).o \ grant-table.o suspend.o platform-pci-unplug.o \ - p2m.o + p2m.o vnuma.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c new file mode 100644 index 0000000..bce4523 --- /dev/null +++ b/arch/x86/xen/vnuma.c @@ -0,0 +1,127 @@ +#include <linux/err.h> +#include <linux/memblock.h> +#include <xen/interface/xen.h> +#include <xen/interface/memory.h> +#include <asm/xen/interface.h> +#include <asm/xen/hypercall.h> +#include <asm/xen/vnuma.h> + +#ifdef CONFIG_NUMA + +/* Checks if hypercall is supported */ +bool xen_vnuma_supported() +{ + return HYPERVISOR_memory_op(XENMEM_get_vnuma_info, NULL) == -ENOSYS ? 
false : true; +} + +/* + * Called from numa_init if numa_off = 0; + * we set numa_off = 0 if xen_vnuma_supported() + * returns true and its a domU; + */ +int __init xen_numa_init(void) +{ + int rc; + unsigned int i, j, nr_nodes, cpu, idx, pcpus; + u64 physm, physd, physc; + unsigned int *vdistance, *cpu_to_node; + unsigned long mem_size, dist_size, cpu_to_node_size; + struct vmemrange *vblock; + + struct vnuma_topology_info numa_topo = { + .domid = DOMID_SELF, + .__pad = 0 + }; + rc = -EINVAL; + physm = physd = physc = 0; + + /* For now only PV guests are supported */ + if (!xen_pv_domain()) + return rc; + + pcpus = num_possible_cpus(); + + mem_size = pcpus * sizeof(struct vmemrange); + dist_size = pcpus * pcpus * sizeof(*numa_topo.distance); + cpu_to_node_size = pcpus * sizeof(*numa_topo.cpu_to_node); + + physm = memblock_alloc(mem_size, PAGE_SIZE); + vblock = __va(physm); + + physd = memblock_alloc(dist_size, PAGE_SIZE); + vdistance = __va(physd); + + physc = memblock_alloc(cpu_to_node_size, PAGE_SIZE); + cpu_to_node = __va(physc); + + if (!physm || !physc || !physd) + goto out; + + set_xen_guest_handle(numa_topo.nr_nodes, &nr_nodes); + set_xen_guest_handle(numa_topo.memrange, vblock); + set_xen_guest_handle(numa_topo.distance, vdistance); + set_xen_guest_handle(numa_topo.cpu_to_node, cpu_to_node); + + rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &numa_topo); + + if (rc < 0) + goto out; + nr_nodes = *numa_topo.nr_nodes; + if (nr_nodes == 0) { + goto out; + } + if (nr_nodes > num_possible_cpus()) { + pr_debug("vNUMA: Node without cpu is not supported in this version.\n"); + goto out; + } + + /* + * NUMA nodes memory ranges are in pfns, constructed and + * aligned based on e820 ram domain map. + */ + for (i = 0; i < nr_nodes; i++) { + if (numa_add_memblk(i, vblock[i].start, vblock[i].end)) + goto out; + node_set(i, numa_nodes_parsed); + } + + setup_nr_node_ids(); + /* Setting the cpu, apicid to node */ + for_each_cpu(cpu, cpu_possible_mask) { + set_apicid_to_node(cpu, cpu_to_node[cpu]); + numa_set_node(cpu, cpu_to_node[cpu]); + cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node[cpu]]); + } + + for (i = 0; i < nr_nodes; i++) { + for (j = 0; j < *numa_topo.nr_nodes; j++) { + idx = (j * nr_nodes) + i; + numa_set_distance(i, j, *(vdistance + idx)); + } + } + + rc = 0; +out: + if (physm) + memblock_free(__pa(physm), mem_size); + if (physd) + memblock_free(__pa(physd), dist_size); + if (physc) + memblock_free(__pa(physc), cpu_to_node_size); + /* + * Set a dummy node and return success. This prevents calling any + * hardware-specific initializers which do not work in a PV guest. + * Taken from dummy_numa_init code. 
+ */ + if (rc != 0) { + for (i = 0; i < MAX_LOCAL_APIC; i++) + set_apicid_to_node(i, NUMA_NO_NODE); + nodes_clear(numa_nodes_parsed); + nodes_clear(node_possible_map); + nodes_clear(node_online_map); + node_set(0, numa_nodes_parsed); + numa_add_memblk(0, 0, PFN_PHYS(max_pfn)); + } + return 0; +} +#endif diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h index 2ecfe4f..b61482c 100644 --- a/include/xen/interface/memory.h +++ b/include/xen/interface/memory.h @@ -263,4 +263,48 @@ struct xen_remove_from_physmap { }; DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap); +/* vNUMA structures */ +struct vmemrange { + uint64_t start, end; + /* reserved */ + uint64_t _padm; +}; +DEFINE_GUEST_HANDLE_STRUCT(vmemrange); + +struct vnuma_topology_info { + /* OUT */ + domid_t domid; + uint32_t __pad; + /* IN */ + /* number of virtual numa nodes */ + union { + GUEST_HANDLE(uint) nr_nodes; + uint64_t _padn; + }; + /* distance table */ + union { + GUEST_HANDLE(uint) distance; + uint64_t _padd; + }; + /* cpu mapping to vnodes */ + union { + GUEST_HANDLE(uint) cpu_to_node; + uint64_t _padc; + }; + /* + * memory areas constructed by Xen, start and end + * of the ranges are specific to domain e820 map. + * Xen toolstack constructs these ranges for domain + * when building it. + */ + union { + GUEST_HANDLE(vmemrange) memrange; + uint64_t _padm; + }; +}; +typedef struct vnuma_topology_info vnuma_topology_info_t; +DEFINE_GUEST_HANDLE_STRUCT(vnuma_topology_info); + +#define XENMEM_get_vnuma_info 25 + #endif /* __XEN_PUBLIC_MEMORY_H__ */ -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] xen: vnuma support for PV guests running as domU
  2013-11-18 20:25 ` [PATCH v2 1/2] xen: vnuma support for PV guests running as domU Elena Ufimtseva
@ 2013-11-18 21:14   ` H. Peter Anvin
  2013-11-18 21:28     ` Elena Ufimtseva
  2013-11-18 22:13     ` Joe Perches
  1 sibling, 2 replies; 15+ messages in thread
From: H. Peter Anvin @ 2013-11-18 21:14 UTC (permalink / raw)
  To: Elena Ufimtseva, xen-devel
  Cc: konrad.wilk, boris.ostrovsky, david.vrabel, tglx, mingo, x86, akpm,
	tangchen, wency, ian.campbell, stefano.stabellini, mukesh.rathor,
	linux-kernel, Joe Perches

On 11/18/2013 12:25 PM, Elena Ufimtseva wrote:
> +/* Checks if hypercall is supported */
> +bool xen_vnuma_supported()

This isn't C++...

http://lwn.net/Articles/487493/

There are several more things in this patchset that get flagged by
checkpatch, but apparently this rather common (and rather serious)
problem is still not being detected, even though a patch was submitted
almost two years ago:

https://lkml.org/lkml/2012/3/16/510

	-hpa

^ permalink raw reply	[flat|nested] 15+ messages in thread
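For context on what is being flagged here: in C, unlike C++, an empty
parameter list in a definition means "unspecified arguments" rather than
"no arguments", so a function that takes nothing should say so with an
explicit (void). Applied to the function quoted above, the requested
change is simply this sketch:

	/* Checks if the vnuma hypercall is supported */
	bool xen_vnuma_supported(void)	/* explicit (void): the C way to say "no arguments" */
	{
		return HYPERVISOR_memory_op(XENMEM_get_vnuma_info, NULL) == -ENOSYS ?
			false : true;
	}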
* Re: [PATCH v2 1/2] xen: vnuma support for PV guests running as domU 2013-11-18 21:14 ` H. Peter Anvin @ 2013-11-18 21:28 ` Elena Ufimtseva 2013-11-18 22:13 ` Joe Perches 1 sibling, 0 replies; 15+ messages in thread From: Elena Ufimtseva @ 2013-11-18 21:28 UTC (permalink / raw) To: H. Peter Anvin Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel, tglx, mingo, x86, akpm, tangchen, wency, Ian Campbell, Stefano Stabellini, mukesh.rathor, linux-kernel, Joe Perches On Mon, Nov 18, 2013 at 4:14 PM, H. Peter Anvin <hpa@zytor.com> wrote: > On 11/18/2013 12:25 PM, Elena Ufimtseva wrote: >> +/* Checks if hypercall is supported */ >> +bool xen_vnuma_supported() > > This isn't C++... > > http://lwn.net/Articles/487493/ > > There are several more things in this patchset that get flagged by > checkpatch, but apparently this rather common (and rather serious) > problem is still not being detected, even through a patch was submitted > almost two years ago: > > https://lkml.org/lkml/2012/3/16/510 Thank you Peter, good to know. Will resend these. > > -hpa > > -- Elena ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] xen: vnuma support for PV guests running as domU 2013-11-18 21:14 ` H. Peter Anvin 2013-11-18 21:28 ` Elena Ufimtseva @ 2013-11-18 22:13 ` Joe Perches 1 sibling, 0 replies; 15+ messages in thread From: Joe Perches @ 2013-11-18 22:13 UTC (permalink / raw) To: H. Peter Anvin Cc: Elena Ufimtseva, xen-devel, konrad.wilk, boris.ostrovsky, david.vrabel, tglx, mingo, x86, akpm, tangchen, wency, ian.campbell, stefano.stabellini, mukesh.rathor, linux-kernel On Mon, 2013-11-18 at 13:14 -0800, H. Peter Anvin wrote: > On 11/18/2013 12:25 PM, Elena Ufimtseva wrote: > > +/* Checks if hypercall is supported */ > > +bool xen_vnuma_supported() > > This isn't C++... > http://lwn.net/Articles/487493/ > > There are several more things in this patchset that get flagged by > checkpatch, but apparently this rather common (and rather serious) > problem is still not being detected, even through a patch was submitted > almost two years ago: > > https://lkml.org/lkml/2012/3/16/510 I gave notes to the patch and no follow up was done. https://lkml.org/lkml/2012/3/16/514 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Xen-devel] [PATCH v2 1/2] xen: vnuma support for PV guests running as domU 2013-11-18 20:25 ` [PATCH v2 1/2] xen: vnuma support for PV guests running as domU Elena Ufimtseva 2013-11-18 21:14 ` H. Peter Anvin @ 2013-11-19 7:15 ` Dario Faggioli 1 sibling, 0 replies; 15+ messages in thread From: Dario Faggioli @ 2013-11-19 7:15 UTC (permalink / raw) To: Elena Ufimtseva Cc: xen-devel, akpm, wency, x86, linux-kernel, tangchen, mingo, david.vrabel, hpa, boris.ostrovsky, tglx, stefano.stabellini, ian.campbell [-- Attachment #1: Type: text/plain, Size: 1944 bytes --] On lun, 2013-11-18 at 15:25 -0500, Elena Ufimtseva wrote: > Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> > diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile > index 96ab2c0..de9deab 100644 > --- a/arch/x86/xen/Makefile > +++ b/arch/x86/xen/Makefile > @@ -13,7 +13,7 @@ CFLAGS_mmu.o := $(nostackp) > obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \ > time.o xen-asm.o xen-asm_$(BITS).o \ > grant-table.o suspend.o platform-pci-unplug.o \ > - p2m.o > + p2m.o vnuma.o > > obj-$(CONFIG_EVENT_TRACING) += trace.o I think David said something about this during last round (going fetchin'-cuttin'-pastin' it): " obj-$(CONFIG_NUMA) += vnuma.o Then you can remove the #ifdef CONFIG_NUMA from xen/vnuma.c " > diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c > +/* > + * Called from numa_init if numa_off = 0; ^ if numa_off = 1 ? > + * we set numa_off = 0 if xen_vnuma_supported() > + * returns true and its a domU; > + */ > +int __init xen_numa_init(void) > +{ > + if (nr_nodes > num_possible_cpus()) { > + pr_debug("vNUMA: Node without cpu is not supported in this version.\n"); > + goto out; > + } > + This is a super-minor thing, but I wouldn't say "in this version". It makes people think that there will be a later version where that will be supported, which we don't know. :-) > + /* > + * Set a dummy node and return success. This prevents calling any > + * hardware-specific initializers which do not work in a PV guest. > + * Taken from dummy_numa_init code. > + */ > This is a lot better... Thanks! :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/2] xen: enable vnuma for PV guest 2013-11-18 20:25 [PATCH v2 0/2] xen: vnuma introduction for pv guest Elena Ufimtseva 2013-11-18 20:25 ` [PATCH v2 1/2] xen: vnuma support for PV guests running as domU Elena Ufimtseva @ 2013-11-18 20:25 ` Elena Ufimtseva 2013-11-19 15:38 ` [PATCH v2 0/2] xen: vnuma introduction for pv guest Konrad Rzeszutek Wilk 2 siblings, 0 replies; 15+ messages in thread From: Elena Ufimtseva @ 2013-11-18 20:25 UTC (permalink / raw) To: xen-devel Cc: konrad.wilk, boris.ostrovsky, david.vrabel, tglx, mingo, hpa, x86, akpm, tangchen, wency, ian.campbell, stefano.stabellini, mukesh.rathor, linux-kernel, Elena Ufimtseva Enables numa if vnuma topology hypercall is supported and it is domU. Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> --- arch/x86/xen/setup.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 68c054f..0aab799 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -20,6 +20,7 @@ #include <asm/numa.h> #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> +#include <asm/xen/vnuma.h> #include <xen/xen.h> #include <xen/page.h> @@ -598,6 +599,9 @@ void __init xen_arch_setup(void) WARN_ON(xen_set_default_idle()); fiddle_vdso(); #ifdef CONFIG_NUMA - numa_off = 1; + if (!xen_initial_domain() && xen_vnuma_supported()) + numa_off = 0; + else + numa_off = 1; #endif } -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 0/2] xen: vnuma introduction for pv guest
  2013-11-18 20:25 [PATCH v2 0/2] xen: vnuma introduction for pv guest Elena Ufimtseva
  2013-11-18 20:25 ` [PATCH v2 1/2] xen: vnuma support for PV guests running as domU Elena Ufimtseva
  2013-11-18 20:25 ` [PATCH v2 2/2] xen: enable vnuma for PV guest Elena Ufimtseva
@ 2013-11-19 15:38 ` Konrad Rzeszutek Wilk
  2013-11-19 18:29   ` [Xen-devel] " Dario Faggioli
  2 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-19 15:38 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: xen-devel, boris.ostrovsky, david.vrabel, tglx, mingo, hpa, x86,
	akpm, tangchen, wency, ian.campbell, stefano.stabellini,
	mukesh.rathor, linux-kernel

On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
> Xen vnuma introduction.
>
> The patchset introduces vnuma to paravirtualized Xen guests
> running as domU.
> A Xen subop hypercall is used to retrieve the vnuma topology information.
> Based on the topology retrieved from Xen, the NUMA number of nodes,
> memory ranges, distance table and cpumask are set.
> If initialization is incorrect, a 'dummy' node is set and the
> nodemask is unset.
> The vNUMA topology is constructed by the Xen toolstack. The Xen patchset is
> available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.

Yeey!

One question - I know you had questions about the
PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
be harvested for AutoNUMA balancing.

And that the hypercall to set such a PTE entry disallows the
PROT_GLOBAL (it strips it off)? That means that when the
Linux page system kicks in (as it has ~PAGE_PRESENT) the
Linux page handler won't see the PROT_GLOBAL (as it has
been filtered out). Which means that the AutoNUMA code won't
kick in.

(see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)

Was that problem ever answered?

^ permalink raw reply	[flat|nested] 15+ messages in thread
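To make the concern concrete: the guest marks a present page for a NUMA
hinting fault by clearing _PAGE_PRESENT while leaving a software marker
bit set (on x86 that marker shares the bit position of the global bit,
which is why PROT_GLOBAL comes up above). A rough sketch of the failure
mode being described, using placeholder names rather than the real
kernel or Xen identifiers:

	/*
	 * Illustrative sketch only: NUMA_MARKER and the helpers below are
	 * placeholders, not actual kernel or Xen identifiers.
	 */

	/* 1. The scanner marks a present PTE for a NUMA hinting fault. */
	pteval_t want = (pte_val(*ptep) & ~_PAGE_PRESENT) | NUMA_MARKER;
	xen_update_pte(ptep, want);		/* PV guests update PTEs through Xen */

	/* 2. If Xen strips the marker while validating the new entry ...  */
	pteval_t got = want & ~NUMA_MARKER;	/* ... this is what gets installed  */

	/* 3. The fault handler treats this as a NUMA hint only if the marker
	 *    survived; with it filtered out, AutoNUMA never kicks in.       */
	if (!(got & _PAGE_PRESENT) && (got & NUMA_MARKER))
		handle_numa_hinting_fault();	/* never reached */
	else
		handle_ordinary_fault();	/* looks like a genuine not-present fault */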
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest
  2013-11-19 15:38 ` [PATCH v2 0/2] xen: vnuma introduction for pv guest Konrad Rzeszutek Wilk
@ 2013-11-19 18:29   ` Dario Faggioli
  2013-12-04  0:35     ` Elena Ufimtseva
  0 siblings, 1 reply; 15+ messages in thread
From: Dario Faggioli @ 2013-11-19 18:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Elena Ufimtseva, akpm, wency, stefano.stabellini, x86, linux-kernel,
	tangchen, mingo, david.vrabel, hpa, xen-devel, boris.ostrovsky,
	tglx, ian.campbell

[-- Attachment #1: Type: text/plain, Size: 2845 bytes --]

On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
> > The patchset introduces vnuma to paravirtualized Xen guests
> > running as domU.
> > A Xen subop hypercall is used to retrieve the vnuma topology information.
> > Based on the topology retrieved from Xen, the NUMA number of nodes,
> > memory ranges, distance table and cpumask are set.
> > If initialization is incorrect, a 'dummy' node is set and the
> > nodemask is unset.
> > The vNUMA topology is constructed by the Xen toolstack. The Xen patchset is
> > available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.
>
> Yeey!
>
:-)

> One question - I know you had questions about the
> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
> be harvested for AutoNUMA balancing.
>
> And that the hypercall to set such a PTE entry disallows the
> PROT_GLOBAL (it strips it off)? That means that when the
> Linux page system kicks in (as it has ~PAGE_PRESENT) the
> Linux page handler won't see the PROT_GLOBAL (as it has
> been filtered out). Which means that the AutoNUMA code won't
> kick in.
>
> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)
>
> Was that problem ever answered?
>
I think the issue is a twofold one.

If I remember correctly (Elena, please, correct me if I'm wrong) Elena
was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest.
That's what pushed her to investigate the issue, and led to what you're
summing up above.

However, it appears the crash was due to something completely unrelated
to Xen and vNUMA, was affecting baremetal too, and got fixed, which
means the crash is now gone.

It remains to be seen (I think) whether that also means that AutoNUMA
works. In fact, chatting about this in Edinburgh, Elena managed to
convince me pretty badly that we should --as part of the vNUMA support--
do something about this, in order to make it work. At that time I
thought we should be doing something to avoid the system going ka-boom,
but as I said, even now that it does not crash anymore, she was so
persuasive that I now find it quite hard to believe that we really don't
need to do anything. :-P

I guess, as soon as we get the chance, we should see if this actually
works, i.e., in addition to seeing the proper topology and not crashing,
verify that AutoNUMA in the guest is actually doing its job.

What do you think? Again, Elena, please chime in and explain how things
are, if I got something wrong. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest
  2013-11-19 18:29 ` [Xen-devel] " Dario Faggioli
@ 2013-12-04  0:35     ` Elena Ufimtseva
  2013-12-04  6:20       ` Elena Ufimtseva
  0 siblings, 1 reply; 15+ messages in thread
From: Elena Ufimtseva @ 2013-12-04 0:35 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86,
	linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, tglx, Ian Campbell

On Tue, Nov 19, 2013 at 1:29 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote:
>> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote:
>> > The patchset introduces vnuma to paravirtualized Xen guests
>> > running as domU.
>> > A Xen subop hypercall is used to retrieve the vnuma topology information.
>> > Based on the topology retrieved from Xen, the NUMA number of nodes,
>> > memory ranges, distance table and cpumask are set.
>> > If initialization is incorrect, a 'dummy' node is set and the
>> > nodemask is unset.
>> > The vNUMA topology is constructed by the Xen toolstack. The Xen patchset is
>> > available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3.
>>
>> Yeey!
>>
> :-)
>
>> One question - I know you had questions about the
>> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to
>> be harvested for AutoNUMA balancing.
>>
>> And that the hypercall to set such a PTE entry disallows the
>> PROT_GLOBAL (it strips it off)? That means that when the
>> Linux page system kicks in (as it has ~PAGE_PRESENT) the
>> Linux page handler won't see the PROT_GLOBAL (as it has
>> been filtered out). Which means that the AutoNUMA code won't
>> kick in.
>>
>> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317)
>>
>> Was that problem ever answered?
>>
> I think the issue is a twofold one.
>
> If I remember correctly (Elena, please, correct me if I'm wrong) Elena
> was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest.
> That's what pushed her to investigate the issue, and led to what you're
> summing up above.
>
> However, it appears the crash was due to something completely unrelated
> to Xen and vNUMA, was affecting baremetal too, and got fixed, which
> means the crash is now gone.
>
> It remains to be seen (I think) whether that also means that AutoNUMA
> works. In fact, chatting about this in Edinburgh, Elena managed to
> convince me pretty badly that we should --as part of the vNUMA support--
> do something about this, in order to make it work. At that time I
> thought we should be doing something to avoid the system going ka-boom,
> but as I said, even now that it does not crash anymore, she was so
> persuasive that I now find it quite hard to believe that we really don't
> need to do anything. :-P

Yes, you were right, Dario :) See at the end: PV guests do not crash,
but they do see user-space memory corruption.
Ok, so I will try to understand what went wrong again this weekend.
Meanwhile I am posting the patches for Xen.

> I guess, as soon as we get the chance, we should see if this actually
> works, i.e., in addition to seeing the proper topology and not crashing,
> verify that AutoNUMA in the guest is actually doing its job.
>
> What do you think? Again, Elena, please chime in and explain how things
> are, if I got something wrong. :-)
>
Oh guys, I feel really bad about not replying to these emails... Somehow
these replies all got deleted... weird.

Ok, about that automatic balancing. As of the last patch, automatic NUMA
balancing seemed to work, but after rebasing on top of 3.12-rc2 I see
similar issues. I will try to figure out which commits broke it and will
contact Ingo Molnar and Mel Gorman.

Konrad,
as for the PROT_GLOBAL flag, I will double-check once more to exclude
errors on my side. Last time I was able to have numa_balancing working
without any modifications on the hypervisor side. But again, I want to
double-check this; some experiments might only have appeared to be
good :)

> Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 
Elena

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest 2013-12-04 0:35 ` Elena Ufimtseva @ 2013-12-04 6:20 ` Elena Ufimtseva 2013-12-05 1:13 ` Dario Faggioli 0 siblings, 1 reply; 15+ messages in thread From: Elena Ufimtseva @ 2013-12-04 6:20 UTC (permalink / raw) To: Dario Faggioli Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86, linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin, xen-devel, Boris Ostrovsky, tglx, Ian Campbell On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote: > On Tue, Nov 19, 2013 at 1:29 PM, Dario Faggioli > <dario.faggioli@citrix.com> wrote: >> On mar, 2013-11-19 at 10:38 -0500, Konrad Rzeszutek Wilk wrote: >>> On Mon, Nov 18, 2013 at 03:25:48PM -0500, Elena Ufimtseva wrote: >>> > The patchset introduces vnuma to paravirtualized Xen guests >>> > runnning as domU. >>> > Xen subop hypercall is used to retreive vnuma topology information. >>> > Bases on the retreived topology from Xen, NUMA number of nodes, >>> > memory ranges, distance table and cpumask is being set. >>> > If initialization is incorrect, sets 'dummy' node and unsets >>> > nodemask. >>> > vNUMA topology is constructed by Xen toolstack. Xen patchset is >>> > available at https://git.gitorious.org/xenvnuma/xenvnuma.git:v3. >>> >>> Yeey! >>> >> :-) >> >>> One question - I know you had questions about the >>> PROT_GLOBAL | ~PAGE_PRESENT being set on PTEs that are going to >>> be harvested for AutoNUMA balancing. >>> >>> And that the hypercall to set such PTE entry disallows the >>> PROT_GLOBAL (it stripts it off)? That means that when the >>> Linux page system kicks in (as it has ~PAGE_PRESENT) the >>> Linux pagehandler won't see the PROT_GLOBAL (as it has >>> been filtered out). Which means that the AutoNUMA code won't >>> kick in. >>> >>> (see http://article.gmane.org/gmane.comp.emulators.xen.devel/174317) >>> >>> Was that problem ever answered? >>> >> I think the issue is a twofold one. >> >> If I remember correctly (Elena, please, correct me if I'm wrong) Elena >> was seeing _crashes_ with both vNUMA and AutoNUMA enabled for the guest. >> That's what pushed her to investigate the issue, and led to what you're >> summing up above. >> >> However, it appears the crash was due to something completely unrelated >> to Xen and vNUMA, was affecting baremetal too, and got fixed, which >> means the crash is now gone. >> >> It remains to be seen (I think) whether that also means that AutoNUMA >> works. In fact, chatting about this in Edinburgh, Elena managed to >> convince me pretty badly that we should --as part of the vNUMA support-- >> do something about this, in order to make it work. At that time I >> thought we should be doing something to avoid the system to go ka-boom, >> but as I said, even now that it does not crash anymore, she was so >> persuasive that I now find it quite hard to believe that we really don't >> need to do anything. :-P > > Yes, you were right Dario :) See at the end. pv guests do not crash, > but they have user space memory corruption. > Ok, so I will try to understand what again had happened during this > weekend. > Meanwhile posting patches for Xen. > >> >> I guess, as soon as we get the chance, we should see if this actually >> works, i.e., in addition to seeing the proper topology and not crashing, >> verify that AutoNUMA in the guest is actually doing is job. >> >> What do you think? Again, Elena, please chime in and explain how things >> are, if I got something wrong. 
:-)
>>
> Oh guys, I feel really bad about not replying to these emails... Somehow
> these replies all got deleted... weird.
>
> Ok, about that automatic balancing. As of the last patch, automatic NUMA
> balancing seemed to work, but after rebasing on top of 3.12-rc2 I see
> similar issues. I will try to figure out which commits broke it and will
> contact Ingo Molnar and Mel Gorman.
>
> Konrad,
> as for the PROT_GLOBAL flag, I will double-check once more to exclude
> errors on my side. Last time I was able to have numa_balancing working
> without any modifications on the hypervisor side. But again, I want to
> double-check this; some experiments might only have appeared to be
> good :)
>
>> Regards,
>> Dario
>>
>> --
>> <<This happens because I choose it to happen!>> (Raistlin Majere)
>> -----------------------------------------------------------------
>> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>>

As of now I have patch v4 for reviewing. Not sure if it will be more
beneficial to post it for review or to look closer at the current problem.

The issue I am seeing right now is different from what was happening
before. The corruption happens on the change_prot_numa path:

[ 6638.021439] pfn 45e602, highest_memmap_pfn - 14ddd7
[ 6638.021444] BUG: Bad page map in process dd pte:800000045e602166 pmd:abf1a067
[ 6638.021449] addr:00007f4fda2d8000 vm_flags:00100073 anon_vma:ffff8800abf77b90 mapping: (null) index:7f4fda2d8
[ 6638.021457] CPU: 1 PID: 1033 Comm: dd Tainted: G B W 3.13.0-rc2+ #10
[ 6638.021462] 0000000000000000 00007f4fda2d8000 ffffffff813ca5b1 ffff88010d68deb8
[ 6638.021471] ffffffff810f2c88 00000000abf1a067 800000045e602166 0000000000000000
[ 6638.021482] 000000000045e602 ffff88010d68deb8 00007f4fda2d8000 800000045e602166
[ 6638.021492] Call Trace:
[ 6638.021497] [<ffffffff813ca5b1>] ? dump_stack+0x41/0x51
[ 6638.021503] [<ffffffff810f2c88>] ? print_bad_pte+0x19d/0x1c9
[ 6638.021509] [<ffffffff810f3aef>] ? vm_normal_page+0x94/0xb3
[ 6638.021519] [<ffffffff810fb788>] ? change_protection+0x35c/0x5a8
[ 6638.021527] [<ffffffff81107965>] ? change_prot_numa+0x13/0x24
[ 6638.021533] [<ffffffff81071697>] ? task_numa_work+0x1fb/0x299
[ 6638.021539] [<ffffffff8105ef54>] ? task_work_run+0x7b/0x8f
[ 6638.021545] [<ffffffff8100e658>] ? do_notify_resume+0x53/0x68
[ 6638.021552] [<ffffffff813d4432>] ? int_signal+0x12/0x17
[ 6638.021560] pfn 45d732, highest_memmap_pfn - 14ddd7
[ 6638.021565] BUG: Bad page map in process dd pte:800000045d732166 pmd:10d684067
[ 6638.021572] addr:00007fff7c143000 vm_flags:00100173 anon_vma:ffff8800abf77960 mapping: (null) index:7fffffffc
[ 6638.021582] CPU: 1 PID: 1033 Comm: dd Tainted: G B W 3.13.0-rc2+ #10
[ 6638.021587] 0000000000000000 00007fff7c143000 ffffffff813ca5b1 ffff8800abf339b0
[ 6638.021595] ffffffff810f2c88 000000010d684067 800000045d732166 0000000000000000
[ 6638.021603] 000000000045d732 ffff8800abf339b0 00007fff7c143000 800000045d732166

The code has changed since the last problem; I will work on this to see
where it comes from.

Elena
>
>
> --
> Elena

-- 
Elena

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest 2013-12-04 6:20 ` Elena Ufimtseva @ 2013-12-05 1:13 ` Dario Faggioli 2013-12-20 7:39 ` Elena Ufimtseva 0 siblings, 1 reply; 15+ messages in thread From: Dario Faggioli @ 2013-12-05 1:13 UTC (permalink / raw) To: Elena Ufimtseva Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86, linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin, xen-devel, Boris Ostrovsky, tglx, Ian Campbell [-- Attachment #1: Type: text/plain, Size: 3465 bytes --] On mer, 2013-12-04 at 01:20 -0500, Elena Ufimtseva wrote: > On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote: > > Oh guys, I feel really bad about not replying to these emails... Somehow these > > replies all got deleted.. wierd. > > No worries... You should see *my* backlog. :-P > > Ok, about that automatic balancing. At the moment of the last patch > > automatic numa balancing seem to > > work, but after rebasing on the top of 3.12-rc2 I see similar issues. > > I will try to figure out what commits broke and will contact Ingo > > Molnar and Mel Gorman. > > > As of now I have patch v4 for reviewing. Not sure if it will be > beneficial to post it for review > or look closer at the current problem. > You mean the Linux side? Perhaps stick somewhere a reference to the git tree/branch where it lives, but, before re-sending, let's wait for it to be as issue free as we can tell? > The issue I am seeing right now is defferent from what was happening before. > The corruption happens when on change_prot_numa way : > Ok, so, I think I need to step back a bit from the actual stack trace and look at the big picture. Please, Elena or anyone, correct me if I'm saying something wrong about how Linux's autonuma works and interacts with Xen. The way it worked when I last looked at it was sort of like this: - there was a kthread scanning all the pages, removing the PAGE_PRESENT bit from actually present pages, and adding a new special one (PAGE_NUMA or something like that); - when a page fault is triggered and the PAGE_NUMA flag is found, it figures out the page is actually there, so no swap or anything. However, it tracks from what node the access to that page came from, matches it with the node where the page actually is and collect some statistics about that; - at some point (and here I don't remember the exact logic, since it changed quite a few times) pages ranking badly in the stats above are moved from one node to another. Is this description still accurate? If yes, here's what I would (double) check, when running this in a PV guest on top of Xen: 1. the NUMA hinting page fault, are we getting and handling them correctly in the PV guest? Are the stats in the guest kernel being updated in a sensible way, i.e., do they make sense and properly relate to the virtual topology of the guest? At some point we thought it would have been necessary to intercept these faults and make sure the above is true with some help from the hypervisor... Is this the case? Why? Why not? 2. what happens when autonuma tries to move pages from one node to another? For us, that would mean in moving from one virtual node to another... Is there a need to do anything at all? I mean, is this, from our perspective, just copying the content of an MFN from node X into another MFN on node Y, or do we need to update some of our vnuma tracking data structures in Xen? If we have this figured out already, then I think we just chase bugs and repost the series. If not, well, I think we should. 
:-D Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
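For readers who prefer the description above in code shape, here is a very
rough sketch of that scan/fault/migrate cycle. It is simplified,
illustrative C rather than the actual task_numa_work()/do_numa_page()
implementation, and the stats/policy/migration helpers named below are
placeholders, not real kernel functions:

	/* Simplified sketch of automatic NUMA balancing; not the real kernel code. */

	/* 1. Periodic scanner: turn a present PTE into a NUMA-hinting PTE. */
	static void numa_scan_one_sketch(pte_t *ptep)
	{
		if (pte_present(*ptep))
			set_pte(ptep, pte_mknuma(*ptep));	/* clear present, set the hint bit */
	}

	/* 2. Hinting fault: the page is really there; just record who touched it. */
	static void numa_hinting_fault_sketch(pte_t *ptep)
	{
		struct page *page = pte_page(*ptep);
		int page_nid = page_to_nid(page);	/* node the page currently lives on */
		int cpu_nid  = numa_node_id();		/* node of the CPU that faulted     */

		record_numa_fault_stats(cpu_nid, page_nid);	/* placeholder for the accounting */
		set_pte(ptep, pte_mknonnuma(*ptep));		/* make the PTE present again     */

		/* 3. If the accounting says the page is badly placed, move it
		 *    (the real code does this via migrate_misplaced_page()). */
		if (page_badly_placed(page_nid, cpu_nid))	/* placeholder policy check   */
			migrate_page_to_node(page, cpu_nid);	/* placeholder migration call */
	}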
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest
  2013-12-05  1:13 ` Dario Faggioli
@ 2013-12-20  7:39       ` Elena Ufimtseva
  2013-12-20  7:48         ` Elena Ufimtseva
  0 siblings, 1 reply; 15+ messages in thread
From: Elena Ufimtseva @ 2013-12-20 7:39 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86,
	linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, tglx, Ian Campbell

On Wed, Dec 4, 2013 at 8:13 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On mer, 2013-12-04 at 01:20 -0500, Elena Ufimtseva wrote:
>> On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>> > Oh guys, I feel really bad about not replying to these emails... Somehow
>> > these replies all got deleted... weird.
>> >
> No worries... You should see *my* backlog. :-P
>
>> > Ok, about that automatic balancing. As of the last patch, automatic NUMA
>> > balancing seemed to work, but after rebasing on top of 3.12-rc2 I see
>> > similar issues. I will try to figure out which commits broke it and will
>> > contact Ingo Molnar and Mel Gorman.
>> >
>> As of now I have patch v4 for reviewing. Not sure if it will be more
>> beneficial to post it for review or to look closer at the current problem.
>>
> You mean the Linux side? Perhaps stick somewhere a reference to the git
> tree/branch where it lives, but, before re-sending, let's wait for it to
> be as issue free as we can tell?
>
>> The issue I am seeing right now is different from what was happening
>> before. The corruption happens on the change_prot_numa path:
>>
> Ok, so, I think I need to step back a bit from the actual stack trace
> and look at the big picture. Please, Elena or anyone, correct me if I'm
> saying something wrong about how Linux's autonuma works and interacts
> with Xen.
>
> The way it worked when I last looked at it was sort of like this:
>  - there was a kthread scanning all the pages, removing the PAGE_PRESENT
>    bit from actually present pages, and adding a new special one
>    (PAGE_NUMA or something like that);
>  - when a page fault is triggered and the PAGE_NUMA flag is found, it
>    figures out the page is actually there, so no swap or anything.
>    However, it tracks from what node the access to that page came from,
>    matches it with the node where the page actually is and collect some
>    statistics about that;
>  - at some point (and here I don't remember the exact logic, since it
>    changed quite a few times) pages ranking badly in the stats above are
>    moved from one node to another.

Hello Dario, Konrad.

- Yes, there is a kernel worker that runs on each node, scans a portion
of the pages, and marks them _PROT_NONE while resetting _PAGE_PRESENT.
A page fault is then triggered and control returns to the Linux PV
kernel, which goes through handle_mm_fault and into the NUMA fault
handler if the faulting pmd/pte turns out to be a NUMA entry with the
present flag cleared.
About the stats, I will have to collect some sensible information.

> Is this description still accurate? If yes, here's what I would (double)
> check, when running this in a PV guest on top of Xen:
>
>  1. the NUMA hinting page fault, are we getting and handling them
>     correctly in the PV guest? Are the stats in the guest kernel being
>     updated in a sensible way, i.e., do they make sense and properly
>     relate to the virtual topology of the guest?
>     At some point we thought it would have been necessary to intercept
>     these faults and make sure the above is true with some help from the
>     hypervisor... Is this the case? Why? Why not?

The real help needed from the hypervisor is to allow the _PAGE_NUMA flag
on pte/pmd entries. I have done so in the hypervisor by using the same
_PAGE_NUMA bit and including it in the allowed bit mask. As this bit is
the same as _PAGE_GLOBAL in the hypervisor, that may induce some other
errors. So far I have not seen any, and I will double-check this.

>  2. what happens when autonuma tries to move pages from one node to
>     another? For us, that would mean in moving from one virtual node
>     to another... Is there a need to do anything at all? I mean, is
>     this, from our perspective, just copying the content of an MFN from
>     node X into another MFN on node Y, or do we need to update some of
>     our vnuma tracking data structures in Xen?
>
> If we have this figured out already, then I think we just chase bugs and
> repost the series. If not, well, I think we should. :-D
>
here is the best part :)

After a fresh look at NUMA autobalancing, applying recent patches,
talking a bit to riel (who now works on mm NUMA autobalancing), and
running some tests including dd, LTP, kernel compiles and my own tests,
autobalancing now works correctly with vnuma. Now I can see successfully
migrated pages in /proc/vmstat:

numa_pte_updates 39
numa_huge_pte_updates 0
numa_hint_faults 36
numa_hint_faults_local 23
numa_pages_migrated 4
pgmigrate_success 4
pgmigrate_fail 0

I will be running some tests with transparent huge pages, as migration
of those is still expected to fail. It is probably possible to track
down all the patches related to NUMA autobalancing and figure out why
balancing was not working before. Given the amount of work kernel folks
have spent recently fixing NUMA issues, and the significance of the
changes themselves, I might need a few more attempts to understand it.

I am going to test THP and, if that works, will follow up with patches.

Dario, what tools did you use to test NUMA on Xen? Maybe there is
something I can use as well? Here http://lwn.net/Articles/558593/ Mel
Gorman uses SpecJBB and a JVM; I thought I could run something similar.

> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

-- 
Elena

^ permalink raw reply	[flat|nested] 15+ messages in thread
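As an aside, the counters quoted above are easy to keep an eye on from
inside the guest; a trivial userspace helper along these lines (plain C,
nothing Xen-specific about it) is enough to check whether hinting faults
and migrations keep happening during a test run:

	/* Print the automatic NUMA balancing counters from /proc/vmstat. */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		FILE *f = fopen("/proc/vmstat", "r");
		char line[256];

		if (!f) {
			perror("/proc/vmstat");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			if (!strncmp(line, "numa_", 5) || !strncmp(line, "pgmigrate_", 10))
				fputs(line, stdout);
		}
		fclose(f);
		return 0;
	}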
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest 2013-12-20 7:39 ` Elena Ufimtseva @ 2013-12-20 7:48 ` Elena Ufimtseva 2013-12-20 15:38 ` Dario Faggioli 0 siblings, 1 reply; 15+ messages in thread From: Elena Ufimtseva @ 2013-12-20 7:48 UTC (permalink / raw) To: Dario Faggioli Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86, linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin, xen-devel, Boris Ostrovsky, tglx, Ian Campbell On Fri, Dec 20, 2013 at 2:39 AM, Elena Ufimtseva <ufimtseva@gmail.com> wrote: > On Wed, Dec 4, 2013 at 8:13 PM, Dario Faggioli > <dario.faggioli@citrix.com> wrote: >> On mer, 2013-12-04 at 01:20 -0500, Elena Ufimtseva wrote: >>> On Tue, Dec 3, 2013 at 7:35 PM, Elena Ufimtseva <ufimtseva@gmail.com> wrote: >>> > Oh guys, I feel really bad about not replying to these emails... Somehow these >>> > replies all got deleted.. wierd. >>> > >> No worries... You should see *my* backlog. :-P >> >>> > Ok, about that automatic balancing. At the moment of the last patch >>> > automatic numa balancing seem to >>> > work, but after rebasing on the top of 3.12-rc2 I see similar issues. >>> > I will try to figure out what commits broke and will contact Ingo >>> > Molnar and Mel Gorman. >>> > >>> As of now I have patch v4 for reviewing. Not sure if it will be >>> beneficial to post it for review >>> or look closer at the current problem. >>> >> You mean the Linux side? Perhaps stick somewhere a reference to the git >> tree/branch where it lives, but, before re-sending, let's wait for it to >> be as issue free as we can tell? >> >>> The issue I am seeing right now is defferent from what was happening before. >>> The corruption happens when on change_prot_numa way : >>> >> Ok, so, I think I need to step back a bit from the actual stack trace >> and look at the big picture. Please, Elena or anyone, correct me if I'm >> saying something wrong about how Linux's autonuma works and interacts >> with Xen. >> >> The way it worked when I last looked at it was sort of like this: >> - there was a kthread scanning all the pages, removing the PAGE_PRESENT >> bit from actually present pages, and adding a new special one >> (PAGE_NUMA or something like that); >> - when a page fault is triggered and the PAGE_NUMA flag is found, it >> figures out the page is actually there, so no swap or anything. >> However, it tracks from what node the access to that page came from, >> matches it with the node where the page actually is and collect some >> statistics about that; >> - at some point (and here I don't remember the exact logic, since it >> changed quite a few times) pages ranking badly in the stats above are >> moved from one node to another. > > Hello Dario, Konrad. > > - Yes, there is a kernel worker that runs on each node and scans some > pages stats and > marks them as _PROT_NONE and resets _PAGE_PRESENT. > The page fault at this moment is triggered and control is being > returned back to the linux pv kernel > to process with handle_mm_fault and page numa fault handler if > discovered if that was a numa pmd/pte with > present flag cleared. > About the stats, I will have to collect some sensible information. > >> >> Is this description still accurate? If yes, here's what I would (double) >> check, when running this in a PV guest on top of Xen: >> >> 1. the NUMA hinting page fault, are we getting and handling them >> correctly in the PV guest? 
Are the stats in the guest kernel being >> updated in a sensible way, i.e., do they make sense and properly >> relate to the virtual topology of the guest? >> At some point we thought it would have been necessary to intercept >> these faults and make sure the above is true with some help from the >> hypervisor... Is this the case? Why? Why not? > > The real healp needed from hypervisor is to allow _PAGE_NUMA flags on > pte/pmd entries. > I have done so in hypervisor by utilizing same _PAGE_NUMA bit and > including into the allowed bit mask. > As this bit is the same as PAGE_GLOBAL in hypervisor, that may induce > some other errors. So far I have not seen any > and I will double check on this. > >> >> 2. what happens when autonuma tries to move pages from one node to >> another? For us, that would mean in moving from one virtual node >> to another... Is there a need to do anything at all? I mean, is >> this, from our perspective, just copying the content of an MFN from >> node X into another MFN on node Y, or do we need to update some of >> our vnuma tracking data structures in Xen? >> >> If we have this figured out already, then I think we just chase bugs and >> repost the series. If not, well, I think we should. :-D >> > here is the best part :) > > After a fresh look at the numa autobalancing, applying recent patches, > talking some to riel who works now on mm numa autobalancing and > running some tests including dd, ltp, kernel compiling and my own > tests, autobalancing now is working > correctly with vnuma. Now I can see sucessfully migrated pages in /proc/vmstat: > > numa_pte_updates 39 > numa_huge_pte_updates 0 > numa_hint_faults 36 > numa_hint_faults_local 23 > numa_pages_migrated 4 > pgmigrate_success 4 > pgmigrate_fail 0 > > I will be running some tests with transparent huge pages as the > migration of such will be failing. > Probably it is possible to find all the patches related to numa > autobalancing and figure out possible reasons > of why previously balancing was not working. Giving the amount of work > kernel folks spent recently to fix > issues with numa and the significance of the changes itself, I might > need few more attempts to understand it. > > I am going to test THP and if that works will follow up with patches. > > Dario, what tools did you use to test NUMA on xen? Maybe there is > something I can use as well? > Here http://lwn.net/Articles/558593/ Mel Gorman uses specjbb and jvm, > I though I can run something similar. And of course, more details will follow... :) > >> Thanks and Regards, >> Dario >> >> -- >> <<This happens because I choose it to happen!>> (Raistlin Majere) >> ----------------------------------------------------------------- >> Dario Faggioli, Ph.D, http://about.me/dario.faggioli >> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) >> > > > > -- > Elena -- Elena ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Xen-devel] [PATCH v2 0/2] xen: vnuma introduction for pv guest
  2013-12-20  7:48 ` Elena Ufimtseva
@ 2013-12-20 15:38           ` Dario Faggioli
  0 siblings, 0 replies; 15+ messages in thread
From: Dario Faggioli @ 2013-12-20 15:38 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: Konrad Rzeszutek Wilk, akpm, wency, Stefano Stabellini, x86,
	linux-kernel, tangchen, mingo, David Vrabel, H. Peter Anvin,
	xen-devel, Boris Ostrovsky, tglx, Ian Campbell

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

On ven, 2013-12-20 at 02:48 -0500, Elena Ufimtseva wrote:
> On Fri, Dec 20, 2013 at 2:39 AM, Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>
> > Dario, what tools did you use to test NUMA on Xen? Maybe there is
> > something I can use as well? Here http://lwn.net/Articles/558593/ Mel
> > Gorman uses SpecJBB and a JVM; I thought I could run something similar.
>
> And of course, more details will follow... :)
>
Yeah, well, during early investigation, I also used basically just
SpecJBB. See here:

http://blog.xen.org/index.php/2012/04/26/numa-and-xen-part-1-introduction/
http://blog.xen.org/index.php/2012/05/16/numa-and-xen-part-ii-scheduling-and-placement/

For benchmarking our NUMA aware scheduling solution, in Xen, I used
SpecJBB, sysbench and LMbench:

http://blog.xen.org/index.php/2012/05/16/numa-and-xen-part-ii-scheduling-and-placement/

The trickiest part is having the benchmark(s) of choice running
concurrently in a bunch of VMs on the same host. I used some hand-crafted
scripts at the time, very specific to my own environment. I'm working on
abstracting that and putting together something easier to share and use
outside my attic. :-P

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread