* numa=on broken
@ 2007-03-30 17:34 Ryan Harper
2007-03-30 17:51 ` Keir Fraser
2007-03-30 18:06 ` Ryan Harper
0 siblings, 2 replies; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 17:34 UTC (permalink / raw)
To: xen-devel
Testing the latest xen-unstable bits, booting with numa=on fails:
(XEN) Command line: /boot/xen.gz com1=115200,8n1 console=com1 conswitch=x numa=on
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009d000 (usable)
(XEN) 000000000009dc00 - 00000000000a0000 (reserved)
(XEN) 00000000000d0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000dff60000 (usable)
(XEN) 00000000dff60000 - 00000000dff72000 (ACPI data)
(XEN) 00000000dff72000 - 00000000dff80000 (ACPI NVS)
(XEN) 00000000dff80000 - 00000000e0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec00400 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000fff80000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000200000000 (usable)
(XEN) System RAM: 7678MB (7863284kB)
(XEN) ACPI: RSDP (v002 PTLTD ) @ 0x00000000000f7170
(XEN) ACPI: XSDT (v001 PTLTD XSDT 0x06040000 LTP 0x00000000) @ 0x00000000dff6d116
(XEN) ACPI: FADT (v003 AMD HAMMER 0x06040000 PTEC 0x000f4240) @ 0x00000000dff71d0f
(XEN) ACPI: SRAT (v001 AMD HAMMER 0x06040000 AMD 0x00000001) @ 0x00000000dff71e03
(XEN) ACPI: MADT (v001 PTLTD APIC 0x06040000 LTP 0x00000000) @ 0x00000000dff71ecb
(XEN) ACPI: ASF! (v016 MBI CETP 0x06040000 PTL 0x00000001) @ 0x00000000dff71f59
(XEN) ACPI: DSDT (v001 AMD-K8 AMDACPI 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
(XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
(XEN) SRAT: PXM 1 -> APIC 1 -> Node 1
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 0-e0000000
(XEN) SRAT: Node 1 PXM 1 100000000-200000000
(XEN) NUMA: Using 32 for the hash shift.
(XEN) Cannot handle page request order 0!
(XEN) Cannot handle page request order 2!
(XEN) Reserving non-aligned node boundary @ mfn 1048576
(XEN) Cannot handle page request order 0!
(XEN) Cannot handle page request order 2!
(XEN) Cannot handle page request order 0!
(XEN) Cannot handle page request order 2!
(XEN) Cannot handle page request order 0!
(XEN) Cannot handle page request order 2!
(XEN) Unknown interrupt
Looking a little deeper, it looks like in end_boot_allocator() we are
attempting to dynamically allocate memory for additional arrays in avail[]
and for the heap. This uses xmalloc(), which relies on
alloc_xenheap_pages() to work. alloc_xenheap_pages() can't function
until end_boot_allocator() completes. Prior to end_boot_allocator(), one
needs to use alloc_boot_pages().
Any thoughts on how to proceed here?
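To make the ordering problem concrete, here is a minimal sketch of the
dependency described above. It is a toy model, not Xen code: every name
prefixed with toy_ is hypothetical, and the two small byte pools only stand
in for boot-time memory and the xenheap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: toy_* names are hypothetical, not Xen functions. */
static unsigned char toy_boot_pool[256];  /* stand-in for boot-time memory */
static size_t toy_boot_used;
static bool toy_heap_ready;               /* set once "end of boot" completes */

/* Boot-time bump allocator: usable before the heap exists. */
static void *toy_alloc_boot(size_t bytes)
{
    if (bytes > sizeof(toy_boot_pool) - toy_boot_used)
        return NULL;
    void *p = &toy_boot_pool[toy_boot_used];
    toy_boot_used += bytes;
    return p;
}

/* Heap allocator: refuses every request until the heap is marked ready. */
static void *toy_alloc_heap(size_t bytes)
{
    static unsigned char heap_pool[256];
    static size_t heap_used;

    if (!toy_heap_ready || bytes > sizeof(heap_pool) - heap_used)
        return NULL;
    void *p = &heap_pool[heap_used];
    heap_used += bytes;
    return p;
}

/* Finish boot: the heap needs 32 bytes of metadata before it can work.
 * The caller chooses which allocator supplies that metadata. */
static bool toy_end_boot(void *(*alloc_meta)(size_t))
{
    if (alloc_meta(32) == NULL)
        return false;          /* metadata allocation failed, heap stays down */
    toy_heap_ready = true;
    return true;
}
```

Calling toy_end_boot(toy_alloc_heap) fails because the heap is asked for its
own metadata before it is ready, while toy_end_boot(toy_alloc_boot) succeeds.
This only illustrates the circular dependency; it is not a proposed patch.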
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 17:34 numa=on broken Ryan Harper
@ 2007-03-30 17:51 ` Keir Fraser
2007-03-30 18:08 ` Ryan Harper
2007-03-30 18:06 ` Ryan Harper
1 sibling, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-03-30 17:51 UTC (permalink / raw)
To: Ryan Harper, xen-devel
On 30/3/07 18:34, "Ryan Harper" <ryanh@us.ibm.com> wrote:
> Looking a little deeper, it looks like in end_boot_allocator() we are
> attempting to dynamically allocate memory for additional arrays in avail[]
> and for the heap. This uses xmalloc(), which relies on
> alloc_xenheap_pages() to work. alloc_xenheap_pages() can't function
> until end_boot_allocator() completes. Prior to end_boot_allocator(), one
> needs to use alloc_boot_pages().
>
> Any thoughts on how to proceed here?
Since the buddy allocators are initialised with memory from address zero
upwards, I would expect everything to work fine if numa node zero always
owns the first block of physical memory. This is because we statically
allocate space for heap metadata for numa node zero (since even non-numa
machines have a 'numa node zero'), so by the time you need to allocate
memory for other numa nodes your xenheap will be populated with memory.
So -- can we ensure that the node that owns low memory is node zero?
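A rough sketch of that expectation, under stated assumptions: the toy_* names
are hypothetical, calloc() stands in for xmalloc(), and a simple page counter
stands in for the xenheap. Node zero uses a statically allocated metadata
block; any other node needs a dynamic allocation, which can only succeed once
some memory has already been added.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_NODES 4
#define TOY_ZONES     3

/* Hypothetical per-node heap metadata (not Xen's real layout). */
struct toy_node_meta {
    unsigned long avail[TOY_ZONES];
};

static struct toy_node_meta toy_node0_meta;            /* static: always there */
static struct toy_node_meta *toy_meta[TOY_MAX_NODES];  /* per-node pointers */
static unsigned long toy_heap_pages;                   /* pages already in the heap */

/* Add nr pages belonging to node. Returns 0 on success, -1 on failure. */
static int toy_add_pages(unsigned int node, unsigned long nr)
{
    if (node >= TOY_MAX_NODES)
        return -1;

    if (toy_meta[node] == NULL) {
        if (node == 0) {
            toy_meta[0] = &toy_node0_meta;   /* no allocation needed */
        } else {
            if (toy_heap_pages == 0)
                return -1;                   /* empty heap: cannot allocate metadata */
            toy_meta[node] = calloc(1, sizeof(*toy_meta[node]));
            if (toy_meta[node] == NULL)
                return -1;
        }
    }

    toy_meta[node]->avail[0] += nr;
    toy_heap_pages += nr;
    return 0;
}
```

With this model, adding node 1 memory first fails, while adding node 0 memory
first and node 1 memory afterwards succeeds, which is the ordering the
paragraph above relies on.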
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 17:51 ` Keir Fraser
@ 2007-03-30 18:08 ` Ryan Harper
2007-03-30 18:17 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:08 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ryan Harper, xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:06]:
> On 30/3/07 18:34, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> > Looking a little deeper, it looks like in end_boot_allocator() we are
> > attempting to dynamically allocate memory for additional arrays in avail[]
> > and for the heap. This uses xmalloc(), which relies on
> > alloc_xenheap_pages() to work. alloc_xenheap_pages() can't function
> > until end_boot_allocator() completes. Prior to end_boot_allocator(), one
> > needs to use alloc_boot_pages().
> >
> > Any thoughts on how to proceed here?
>
> Since the buddy allocators are initialised with memory from address zero
> upwards, I would expect everything to work fine if numa node zero always
> owns the first block of physical memory. This is because we statically
> allocate space for heap metadata for numa node zero (since even non-numa
> machines have a 'numa node zero'), so by the time you need to allocate
> memory for other numa nodes your xenheap will be populated with memory.
>
> So -- can we ensure that the node that owns low memory is node zero?
AFAIK, that is the case, look at the SRAT table:
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 0-e0000000
(XEN) SRAT: Node 1 PXM 1 100000000-200000000
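
For anyone who wants to check the ownership by hand, here is a small sketch of
a shift-based lookup over those ranges. It assumes that "Using 32 for the hash
shift" means a physical address shifted right by 32 bits indexes a small
node table; the toy_* names, the table size, and the merged node 0 range are
illustrative choices, not taken from the Xen source.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SHIFT    32      /* from the "hash shift" log line */
#define TOY_MAP_SIZE 4       /* assumption: enough slots for this machine */
#define TOY_NO_NODE  0xff

struct toy_range {
    uint64_t start, end;     /* end is exclusive */
    uint8_t node;
};

/* The SRAT ranges from the log, with the two node 0 entries merged. */
static const struct toy_range toy_srat[] = {
    { 0x0ULL,         0xe0000000ULL,  0 },
    { 0x100000000ULL, 0x200000000ULL, 1 },
};

static uint8_t toy_map[TOY_MAP_SIZE];

/* Fill one table slot per 2^TOY_SHIFT bytes covered by each range. */
static void toy_build_map(void)
{
    for (size_t i = 0; i < TOY_MAP_SIZE; i++)
        toy_map[i] = TOY_NO_NODE;

    for (size_t r = 0; r < sizeof(toy_srat) / sizeof(toy_srat[0]); r++)
        for (uint64_t a = toy_srat[r].start; a < toy_srat[r].end;
             a += 1ULL << TOY_SHIFT)
            toy_map[a >> TOY_SHIFT] = toy_srat[r].node;
}

/* Look up the node that owns a physical address. */
static uint8_t toy_phys_to_nid(uint64_t addr)
{
    uint64_t idx = addr >> TOY_SHIFT;
    return idx < TOY_MAP_SIZE ? toy_map[idx] : TOY_NO_NODE;
}
```

After toy_build_map(), every address below 4GB maps to node 0 and addresses
from 0x100000000 up to 0x200000000 map to node 1. Note that a table this
coarse cannot see the hole between 0xe0000000 and 0x100000000. Assuming 4kB
pages, 0x100000000 >> 12 is 1048576, which matches the mfn in the "non-aligned
node boundary" message.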
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:08 ` Ryan Harper
@ 2007-03-30 18:17 ` Keir Fraser
2007-03-30 18:20 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-03-30 18:17 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 30/3/07 19:08, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>> So -- can we ensure that the node that owns low memory is node zero?
>
> AFAIK, that is the case, look at the SRAT table:
>
> (XEN) SRAT: Node 0 PXM 0 0-a0000
> (XEN) SRAT: Node 0 PXM 0 0-e0000000
> (XEN) SRAT: Node 1 PXM 1 100000000-200000000
What if you move end_boot_allocator() to just before 'early_boot = 0' in
arch/x86/setup.c?
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:17 ` Keir Fraser
@ 2007-03-30 18:20 ` Ryan Harper
2007-03-30 18:46 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:20 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:17]:
>
>
>
> On 30/3/07 19:08, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> >> So -- can we ensure that the node that owns low memory is node zero?
> >
> > AFAIK, that is the case, look at the SRAT table:
> >
> > (XEN) SRAT: Node 0 PXM 0 0-a0000
> > (XEN) SRAT: Node 0 PXM 0 0-e0000000
> > (XEN) SRAT: Node 1 PXM 1 100000000-200000000
>
> What if you move end_boot_allocator() to just before 'early_boot = 0' in
> arch/x86/setup.c?
Giving that a shot.
Debugging confirms that we are looking at node0 memory, which
should have already been initialized in the allocator:
(XEN) NUMA: Using 32 for the hash shift.
(XEN) attempting to xmalloc_array() avail for NODE1
(XEN) alloc_xenheap_pages: Requesting MEMZONE_XEN, cpu0, order->0
(XEN) alloc_heap_pages: attempting to allocate in zone(0,0), cpu0, node0, order->0
(XEN) Cannot handle page request order 0!
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:20 ` Ryan Harper
@ 2007-03-30 18:46 ` Ryan Harper
2007-03-30 18:48 ` Ryan Harper
2007-03-30 18:51 ` Keir Fraser
0 siblings, 2 replies; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:46 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-03-30 13:36]:
> * Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:17]:
> >
> >
> >
> > On 30/3/07 19:08, "Ryan Harper" <ryanh@us.ibm.com> wrote:
> >
> > >> So -- can we ensure that the node that owns low memory is node zero?
> > >
> > > AFAIK, that is the case, look at the SRAT table:
> > >
> > > (XEN) SRAT: Node 0 PXM 0 0-a0000
> > > (XEN) SRAT: Node 0 PXM 0 0-e0000000
> > > (XEN) SRAT: Node 1 PXM 1 100000000-200000000
> >
> > What if you move end_boot_allocator() to just before 'early_boot = 0' in
> > arch/x86/setup.c?
>
> Giving that a shot.
Didn't work, just pushes the bubble back to allocating the array for
NODE0:
(XEN) SRAT: Node 0 PXM 0 0-a0000
(XEN) SRAT: Node 0 PXM 0 0-e0000000
(XEN) SRAT: Node 1 PXM 1 100000000-200000000
(XEN) NUMA: Using 32 for the hash shift.
(XEN) done with numa_initmem_init()
(XEN) attempting to xmalloc_array() avail for NODE0
(XEN) Cannot handle page request order 0!
(XEN) attempting to xmalloc() for heap for NODE0
(XEN) Cannot handle page request order 2!
That is, we are still dying in init_heap_pages(), but this time for
allocating avail[0] rather than avail[1].
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:46 ` Ryan Harper
@ 2007-03-30 18:48 ` Ryan Harper
2007-03-30 18:51 ` Keir Fraser
1 sibling, 0 replies; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:48 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-03-30 13:46]:
> * Ryan Harper <ryanh@us.ibm.com> [2007-03-30 13:36]:
> > * Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:17]:
> > >
> > >
> > >
> > > On 30/3/07 19:08, "Ryan Harper" <ryanh@us.ibm.com> wrote:
> > >
> > > >> So -- can we ensure that the node that owns low memory is node zero?
> > > >
> > > > AFAIK, that is the case, look at the SRAT table:
> > > >
> > > > (XEN) SRAT: Node 0 PXM 0 0-a0000
> > > > (XEN) SRAT: Node 0 PXM 0 0-e0000000
> > > > (XEN) SRAT: Node 1 PXM 1 100000000-200000000
> > >
> > > What if you move end_boot_allocator() to just before 'early_boot = 0' in
> > > arch/x86/setup.c?
> >
> > Giving that a shot.
>
> Didn't work, just pushes the bubble back to allocating the array for
> NODE0:
Ah, the top of end_boot_allocator() assigns avail[0] to avail0. Looks like
there is some heap/avail initialization that needs to be done
separately.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:46 ` Ryan Harper
2007-03-30 18:48 ` Ryan Harper
@ 2007-03-30 18:51 ` Keir Fraser
2007-03-30 18:55 ` Ryan Harper
2007-03-30 19:03 ` Ryan Harper
1 sibling, 2 replies; 19+ messages in thread
From: Keir Fraser @ 2007-03-30 18:51 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 30/3/07 19:46, "Ryan Harper" <ryanh@us.ibm.com> wrote:
> That is, we are still dying in init_heap_pages(), but this time for
> allocating avail[0] rather than avail[1].
In addition to that change in x86/setup.c, move the first three lines of
code from end_boot_allocator() into init_xenheap_pages() (these are the ones
that actually set up the statically allocated node0 metadata). If that works
I'll sort out a clean fix.
This is just a simple exercise in code shuffling. :-)
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:51 ` Keir Fraser
@ 2007-03-30 18:55 ` Ryan Harper
2007-03-30 19:05 ` Keir Fraser
2007-03-30 19:03 ` Ryan Harper
1 sibling, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:55 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:52]:
>
>
>
> On 30/3/07 19:46, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> > That is, we are still dying in init_heap_pages(), but this time for
> > allocating avail[0] rather than avail[1].
>
> In addition to that change in x86/setup.c, move the first three lines of
> code from end_boot_allocator() into init_xenheap_pages() (these are the ones
> that actually set up the statically allocated node0 metadata). If that works
> I'll sort out a clean fix.
>
> This is just a simple exercise in code shuffling. :-)
Giving that a try right now; I'll let you know in a few.
I would have rather had the folks who changed the code actually test
with numa=on.
What are your thoughts about turning numa on by default, or at least
getting a numa=on run on a two-socket opteron in the xen-rt regressions?
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:55 ` Ryan Harper
@ 2007-03-30 19:05 ` Keir Fraser
2007-03-30 19:39 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-03-30 19:05 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 30/3/07 19:55, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>> This is just a simple exercise in code shuffling. :-)
>
> Giving that a try right now; I'll let you know in a few.
>
> I would have rather had the folks who changed the code actually test
> with numa=on.
>
> What are your thoughts about turning numa on by default, or at least
> getting a numa=on run on a two-socket opteron in the xen-rt regressions?
Turning on by default is pointless because guests are not restricted to
running on specific nodes by default. Since manual intervention is required
to achieve that (right now at least) requiring numa=on is not much of a
hardship.
Integration with xenrt is perhaps possible but we are constrained by machine
resources. Numa=on has been broken for over a month and no one else had yet
complained so it doesn't seem to be a priority for many people.
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 19:05 ` Keir Fraser
@ 2007-03-30 19:39 ` Ryan Harper
2007-03-31 9:06 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 19:39 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 14:10]:
> On 30/3/07 19:55, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> >> This is just a simple exercise in code shuffling. :-)
> >
> > Giving that a try right now; I'll let you know in a few.
> >
> > I would have rather had the folks who changed the code actually test
> > with numa=on.
> >
> > What are your thoughts about turning numa on by default, or at least
> > getting a numa=on run on a two-socket opteron in the xen-rt regressions?
>
> Turning on by default is pointless because guests are not restricted to
> running on specific nodes by default. Since manual intervention is required
> to achieve that (right now at least) requiring numa=on is not much of a
> hardship.
I'm getting ready to re-submit patches to export the topology information
so the userspace tools can use that info to make intelligent selections.
This was available back in October, but was never picked up, or even
commented upon.
>
> Integration with xenrt is perhaps possible but we are constrained by machine
> resources. Numa=on has been broken for over a month and no one else had yet
> complained so it doesn't seem to be a priority for many people.
I spent a bit of last week pointing out to folks that NUMA was available,
as it was never announced in the 3.0.4 release notes [1] AFAICT. It may be
that the community wasn't aware that it was there to even test. It is
indeed important to IBM and I've spoken with folks at AMD who have
expressed support for ensuring solid NUMA support in Xen.
In any case, thanks for the quick work on fixing this issue.
1. http://xensource.com/download/index_3.0.4.html
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 19:39 ` Ryan Harper
@ 2007-03-31 9:06 ` Keir Fraser
2007-04-01 5:20 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-03-31 9:06 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 30/3/07 20:39, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>> Turning on by default is pointless because guests are not restricted to
>> running on specific nodes by default. Since manual intervention is required
>> to achieve that (right now at least) requiring numa=on is not much of a
>> hardship.
>
> I'm getting ready to re-submit patches to export the topology information
> so the userspace tools can use that info to make intelligent selections.
> This was available back in October, but was never picked up, or even
> commented upon.
But can tools make sane automatic decisions on domain creation? And if tools
decide not to use NUMA-ness of the system, should the Xen allocator still
hoover up all the memory of the node that vcpu0 happens to start on?
My primary concern is simply whether enabling NUMA by default can hurt
performance, or cause problems by hitting certain memory nodes or memory
zones harder than others, for the great majority of users who will not use
NUMA (even if they have a small NUMA system like AMD K8).
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-31 9:06 ` Keir Fraser
@ 2007-04-01 5:20 ` Ryan Harper
2007-04-01 8:29 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-04-01 5:20 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-31 04:09]:
> On 30/3/07 20:39, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> >> Turning on by default is pointless because guests are not restricted to
> >> running on specific nodes by default. Since manual intervention is required
> >> to achieve that (right now at least) requiring numa=on is not much of a
> >> hardship.
> >
> > I'm getting ready to re-submit patches to export the topology information
> > so the userspace tools can use that info to make intelligent selections.
> > This was available back in October, but was never picked up, or even
> > commented upon.
>
> But can tools make sane automatic decisions on domain creation? And if tools
I don't think the tools would do any worse than what an admin would do:
keep the domains within a node.
> decide not to use NUMA-ness of the system, should the Xen allocator still
> hoover up all the memory of the node that vcpu0 happens to start on?
I'm not sure I understand what you mean by decide to not use the
NUMA-ness.
>
> My primary concern is simply whether enabling NUMA by default can hurt
> performance, or cause problems by hitting certain memory nodes or memory
> zones harder than others, for the great majority of users who will not use
> NUMA (even if they have a small NUMA system like AMD K8).
Folks without NUMA hardware see the same path through the allocator
today whether they pass numa=on or not. Last summer, I did overhead
testing [1] specifically on small NUMA systems to address this concern. I
assumed that those numbers were satisfactory or the patches would not
have been picked up in the first place.
On systems with NUMA, when the domains are kept within a NUMA node, we
see a significant performance boost [2]. I don't have any data to say
how well a NUMA system would perform with a mixed load of on and off
node access, but when presented with the option of running a domain
entirely within a node on a NUMA system, we should.
What amount of testing is enough for you to consider enabling numa=on by
default post 3.0.5? I think we ought to cook numa=on by default through
another development cycle so we have time to address any performance
issues that may arise.
1. http://lists.xensource.com/archives/html/xen-devel/2006-07/msg01057.html
2. http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00958.html
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-04-01 5:20 ` Ryan Harper
@ 2007-04-01 8:29 ` Keir Fraser
2007-04-01 13:46 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-04-01 8:29 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 1/4/07 06:20, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>>> I'm getting ready to re-submit patches to export the topology information
>>> so the userspace tools can use that info to make intelligent selections.
>>> This was available back in October, but was never picked up, or even
>>> commented upon.
>>
>> But can tools make sane automatic decisions on domain creation? And if tools
>
> I don't think the tools would do any worse than what an admin would do:
> keep the domains within a node.
Well, for example it's not really going to work with the default memory
allocation policy where dom0 takes all memory and then auto-balloons itself
down as domains are created. In this situation the domU will end up with
whatever dom0 happens to have freed up: there's no guarantee of locality.
I don't think that auto-ballooning is a particularly sensible setting for
serious use of Xen. I'd always advise to work out how much memory your dom0
actually needs and make that a static allocation at boot time. But it is our
out-of-the-box default: another thing that needs explicit changing (via
dom0_mem= in this case).
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-04-01 8:29 ` Keir Fraser
@ 2007-04-01 13:46 ` Ryan Harper
2007-04-01 15:51 ` Keir Fraser
0 siblings, 1 reply; 19+ messages in thread
From: Ryan Harper @ 2007-04-01 13:46 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-04-01 03:28]:
> On 1/4/07 06:20, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> >>> I'm getting ready to re-submit patches to export the topology information
> >>> so the userspace tools can use that info to make intelligent selections.
> >>> This was available back in October, but was never picked up, or even
> >>> commented upon.
> >>
> >> But can tools make sane automatic decisions on domain creation? And if tools
> >
> > I don't think the tools would do any worse than what an admin would do:
> > keep the domains within a node.
>
> Well, for example it's not really going to work with the default memory
> allocation policy where dom0 takes all memory and then auto-balloons itself
> down as domains are created. In this situation the domU will end up with
> whatever dom0 happens to have freed up: there's no guarantee of locality.
That's true, but as long as there is memory available in a node, the
tool can still pick the right cpus, that is, the ones that will be
close to that memory.
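As a rough illustration of the kind of selection I mean, here is a sketch
that picks the first node with enough free memory and reports its cpus. The
structure, the field names, and the first-fit policy are all hypothetical;
nothing here reflects an existing tools interface.

```c
#include <stddef.h>

/* Hypothetical per-node summary a tool might be given. */
struct toy_node {
    unsigned long free_kb;    /* free memory on this node */
    unsigned int first_cpu;   /* first cpu belonging to the node */
    unsigned int nr_cpus;     /* number of cpus on the node */
};

/* Return the index of the first node that can hold need_kb entirely,
 * or -1 if no single node has enough free memory. */
static int toy_pick_node(const struct toy_node *nodes, size_t n,
                         unsigned long need_kb)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].free_kb >= need_kb)
            return (int)i;
    return -1;
}
```

Given a node with 512MB free and another with 4GB free, a 1GB domain would
land on the second node and could be pinned to that node's cpus; an 8GB
domain would get -1, meaning it cannot be kept within one node.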
> I don't think that auto-ballooning is a particularly sensible setting for
> serious use of Xen. I'd always advise to work out how much memory your dom0
> actually needs and make that a static allocation at boot time. But it is our
> out-of-the-box default: another thing that needs explicit changing (via
> dom0_mem= in this case).
Right. It looks like then that it would make sense to leave numa off by
default leaving the admin to specify both numa=on and a sensible
dom0_mem in the absence of a mechanism for dom0 to hand back memory from
a specific node, or some page migration mechanism.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-04-01 13:46 ` Ryan Harper
@ 2007-04-01 15:51 ` Keir Fraser
2007-04-01 18:53 ` Ryan Harper
0 siblings, 1 reply; 19+ messages in thread
From: Keir Fraser @ 2007-04-01 15:51 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
On 1/4/07 14:46, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>> I don't think that auto-ballooning is a particularly sensible setting for
>> serious use of Xen. I'd always advise to work out how much memory your dom0
>> actually needs and make that a static allocation at boot time. But it is our
>> out-of-the-box default: another thing that needs explicit changing (via
>> dom0_mem= in this case).
>
> Right. It looks like then that it would make sense to leave numa off by
> default leaving the admin to specify both numa=on and a sensible
> dom0_mem in the absence of a mechanism for dom0 to hand back memory from
> a specific node, or some page migration mechanism.
That's my thinking. I'll see about getting some numa=on testing mixed into
our regression tests, however. There's no reason not to run some proportion
of them with numa=on, although actually most of our test systems are not
NUMA (a few are though).
-- Keir
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-04-01 15:51 ` Keir Fraser
@ 2007-04-01 18:53 ` Ryan Harper
0 siblings, 0 replies; 19+ messages in thread
From: Ryan Harper @ 2007-04-01 18:53 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-04-01 10:50]:
> On 1/4/07 14:46, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> >> I don't think that auto-ballooning is a particularly sensible setting for
> >> serious use of Xen. I'd always advise to work out how much memory your dom0
> >> actually needs and make that a static allocation at boot time. But it is our
> >> out-of-the-box default: another thing that needs explicit changing (via
> >> dom0_mem= in this case).
> >
> > Right. It looks like then that it would make sense to leave numa off by
> > default leaving the admin to specify both numa=on and a sensible
> > dom0_mem in the absence of a mechanism for dom0 to hand back memory from
> > a specific node, or some page migration mechanism.
>
> That's my thinking. I'll see about getting some numa=on testing mixed into
> our regression tests, however. There's no reason not to run some proportion
> of them with numa=on, although actually most of our test systems are not
> NUMA (a few are though).
Thanks. I should have a patchset for exposing the topology and heap
information cooked up this week for post 3.0.5 consideration.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 18:51 ` Keir Fraser
2007-03-30 18:55 ` Ryan Harper
@ 2007-03-30 19:03 ` Ryan Harper
1 sibling, 0 replies; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 19:03 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ryan Harper, xen-devel
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2007-03-30 13:53]:
>
>
>
> On 30/3/07 19:46, "Ryan Harper" <ryanh@us.ibm.com> wrote:
>
> > That is, we are still dying in init_heap_pages(), but this time for
> > allocating avail[0] rather than avail[1].
>
> In addition to that change in x86/setup.c, move the first three lines of
> code from end_boot_allocator() into init_xenheap_pages() (these are the ones
> that actually set up the statically allocated node0 metadata). If that works
> I'll sort out a clean fix.
>
> This is just a simple exercise in code shuffling. :-)
That works. Shuffle away. =) When you have a patch, send it my way and
I'll give it a spin to confirm.
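
For the record, here is a toy model of the three orderings tried in this
thread. It is only a sketch: the toy_* names are hypothetical, and the
assumption that end_boot_allocator() originally ran before
init_xenheap_pages() is my reading of the failures, not something verified
against the source.

```c
#include <stdbool.h>

/* Toy boot state; not Xen's real data structures. */
struct toy_boot {
    bool node0_meta;         /* static node0 metadata hooked up? */
    unsigned long xenheap;   /* pages xmalloc() could draw from */
};

/* Stand-in for xmalloc(): needs node0 metadata and a non-empty xenheap. */
static bool toy_xmalloc_ok(const struct toy_boot *b)
{
    return b->node0_meta && b->xenheap > 0;
}

/* Stand-in for init_xenheap_pages(): gives node0 memory to the xenheap. */
static bool toy_init_xenheap(struct toy_boot *b, bool setup_node0)
{
    if (setup_node0)
        b->node0_meta = true;
    if (!b->node0_meta)
        return false;        /* would need to xmalloc avail[0]: fails */
    b->xenheap += 16;
    return true;
}

/* Stand-in for end_boot_allocator(): node1 metadata needs xmalloc(). */
static bool toy_end_boot(struct toy_boot *b, bool setup_node0)
{
    if (setup_node0)
        b->node0_meta = true;
    return toy_xmalloc_ok(b);
}

enum toy_order { TOY_ORIGINAL, TOY_MOVED_ONLY, TOY_MOVED_AND_SPLIT };

/* Run one ordering from a fresh state; true means boot would succeed. */
static bool toy_boot(enum toy_order order)
{
    struct toy_boot b = { false, 0 };

    switch (order) {
    case TOY_ORIGINAL:          /* end_boot first: xenheap still empty */
        return toy_end_boot(&b, true) && toy_init_xenheap(&b, false);
    case TOY_MOVED_ONLY:        /* xenheap first, but node0 setup missing */
        return toy_init_xenheap(&b, false) && toy_end_boot(&b, true);
    case TOY_MOVED_AND_SPLIT:   /* node0 setup moved into xenheap init */
        return toy_init_xenheap(&b, true) && toy_end_boot(&b, false);
    }
    return false;
}
```

In this model only TOY_MOVED_AND_SPLIT succeeds, which lines up with the
NODE1 failure, then the NODE0 failure, then the working boot reported above.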
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: numa=on broken
2007-03-30 17:34 numa=on broken Ryan Harper
2007-03-30 17:51 ` Keir Fraser
@ 2007-03-30 18:06 ` Ryan Harper
1 sibling, 0 replies; 19+ messages in thread
From: Ryan Harper @ 2007-03-30 18:06 UTC (permalink / raw)
To: Ryan Harper; +Cc: xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-03-30 12:37]:
> Testing the latest xen-unstable bits, booting with numa=on fails:
>
> (XEN) Command line: /boot/xen.gz com1=115200,8n1 console=com1 conswitch=x numa=on
> (XEN) Physical RAM map:
> (XEN) 0000000000000000 - 000000000009d000 (usable)
> (XEN) 000000000009dc00 - 00000000000a0000 (reserved)
> (XEN) 00000000000d0000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 00000000dff60000 (usable)
> (XEN) 00000000dff60000 - 00000000dff72000 (ACPI data)
> (XEN) 00000000dff72000 - 00000000dff80000 (ACPI NVS)
> (XEN) 00000000dff80000 - 00000000e0000000 (reserved)
> (XEN) 00000000fec00000 - 00000000fec00400 (reserved)
> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> (XEN) 00000000fff80000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 0000000200000000 (usable)
> (XEN) System RAM: 7678MB (7863284kB)
> (XEN) ACPI: RSDP (v002 PTLTD ) @ 0x00000000000f7170
> (XEN) ACPI: XSDT (v001 PTLTD XSDT 0x06040000 LTP 0x00000000) @ 0x00000000dff6d116
> (XEN) ACPI: FADT (v003 AMD HAMMER 0x06040000 PTEC 0x000f4240) @ 0x00000000dff71d0f
> (XEN) ACPI: SRAT (v001 AMD HAMMER 0x06040000 AMD 0x00000001) @ 0x00000000dff71e03
> (XEN) ACPI: MADT (v001 PTLTD APIC 0x06040000 LTP 0x00000000) @ 0x00000000dff71ecb
> (XEN) ACPI: ASF! (v016 MBI CETP 0x06040000 PTL 0x00000001) @ 0x00000000dff71f59
> (XEN) ACPI: DSDT (v001 AMD-K8 AMDACPI 0x06040000 MSFT 0x0100000e) @ 0x0000000000000000
> (XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
> (XEN) SRAT: PXM 1 -> APIC 1 -> Node 1
> (XEN) SRAT: Node 0 PXM 0 0-a0000
> (XEN) SRAT: Node 0 PXM 0 0-e0000000
> (XEN) SRAT: Node 1 PXM 1 100000000-200000000
> (XEN) NUMA: Using 32 for the hash shift.
> (XEN) Cannot handle page request order 0!
> (XEN) Cannot handle page request order 2!
> (XEN) Reserving non-aligned node boundary @ mfn 1048576
> (XEN) Cannot handle page request order 0!
> (XEN) Cannot handle page request order 2!
> (XEN) Cannot handle page request order 0!
> (XEN) Cannot handle page request order 2!
> (XEN) Cannot handle page request order 0!
> (XEN) Cannot handle page request order 2!
> (XEN) Unknown interrupt
>
> Looking a little deeper, it looks like in end_boot_allocator() we are
> attempting to dynamically allocate memory for additional arrays in avail[]
Actually, that is in init_heap_pages(), which is called by
end_boot_allocator().
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
^ permalink raw reply [flat|nested] 19+ messages in thread