* Re: ia64 ORDERROUNDDOWN issue
[not found] <617E1C2C70743745A92448908E030B2AD89B2A@scsmsx411.amr.corp.intel.com>
@ 2006-11-29 21:21 ` Andrew Morton
0 siblings, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2006-11-29 21:21 UTC (permalink / raw)
To: Luck, Tony; +Cc: xb, linux-ia64, linux-mm
(add linux-mm)
On Wed, 29 Nov 2006 10:34:24 -0800
"Luck, Tony" <tony.luck@intel.com> wrote:
> > After some investigation I found that count_node_pages() was computing
> > mem_data[1].min_pfn = 0, and mem_data[1].max_pfn = 20000 for node 1,
> > thus conflicting with the 0-2GB DMA memory range on node 0.
> > This is due to the line:
> > start = ORDERROUNDDOWN(start);
>
> There is an assumption here that the memory space on a node doesn't
> cross a MAX_ORDER boundary ... and I'm not really sure where to go
> with that. Your patch papers over the problem for your specific case,
> but as you point out it will just re-appear for someone who picks
> a bigger MAX_ORDER.
>
> Having nodes that are smaller than MAX_ORDER will cause confusion in
> the allocator (if all the memory belonging to two nodes is in a
> single MAX_ORDER page, the buddy allocator will give all the memory
> to one node, and none to the other, won't it?).
>
> > This should at least be checked in the count_node_pages() function.
>
> Yes, a check should be made ... but count_node_pages() doesn't have
> all the information it needs to do this (it just gets the start/size
> for the memory on the node) ... and it needs to check whether the
> rounddown of the start address (or the roundup of the end address)
> would cause conflicts with memory belonging to other nodes.
>
> Do we need a "max_order" variable that could be adjusted to some lower
> value than MAX_ORDER if we find the memory topology doesn't fit inside
> the lines?
>
(Your email talks about nodes, but I am assuming that we're actually dealing
with per-zone concepts here)
We could of course do that, although it looks like your runtime max_order
should be per-zone and not global. And making it a runtime thing would
cause more code to be emitted for alloc_pages() and alloc_pages_node(), so
we'd at least have to move their checks into .c.
But I wonder if a better approach would be to teach ia64 to just throw away
the last 1 .. MAX_ORDER-1 pages from the oddball zone?
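
One possible reading of the "throw away" idea is to trim a memory range inward so that it starts and ends on a maximum-order block boundary. The sketch below is only an illustration under that assumption: `struct range`, `trim_to_blocks()` and `bytes_discarded()` are hypothetical names, not kernel interfaces, and `shift` stands for log2 of the block size in bytes (PAGE_SHIFT plus the relevant order).

```c
#include <stdint.h>

/* Hypothetical half-open physical range [start, end); not a kernel type. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Trim a range inward to multiples of 2^shift bytes: the start is rounded
 * up and the end is rounded down. Assumes start + 2^shift does not overflow.
 */
static struct range trim_to_blocks(struct range r, unsigned shift)
{
    uint64_t mask = (UINT64_C(1) << shift) - 1;
    struct range out;

    out.start = (r.start + mask) & ~mask;   /* round start up */
    out.end = r.end & ~mask;                /* round end down */
    if (out.end < out.start)
        out.end = out.start;                /* nothing usable is left */
    return out;
}

/* Number of bytes given up by the trimming above. */
static uint64_t bytes_discarded(struct range r, unsigned shift)
{
    struct range t = trim_to_blocks(r, shift);

    return (r.end - r.start) - (t.end - t.start);
}
```

Under this particular rule, a 4GB range starting at 4GB loses nothing with 4GB blocks (shift 32) but is discarded entirely with 8GB blocks (shift 33), so the cost depends heavily on the block size.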
* ia64 ORDERROUNDDOWN issue
@ 2006-11-29 14:57 xb
2006-11-30 10:47 ` Zoltan Menyhart
0 siblings, 1 reply; 3+ messages in thread
From: xb @ 2006-11-29 14:57 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 5044 bytes --]
Hello all,
On some ia64 NUMA platforms with certain memory configurations,
the 2.6.18.3 kernel crashes during system initialisation because of a
conflict when allocating DMA memory.
The machine has the following memory configuration:
physical address   length   node
0                  2GB      0
4GB                4GB      1
8GB                2GB      0
We use 64 KB pages and the default CONFIG_FORCE_MAX_ZONEORDER=17 value,
which makes 4GB huge pages available (2^(17-1) * 2^16 bytes).
After some investigation I found that count_node_pages() was computing
mem_data[1].min_pfn = 0, and mem_data[1].max_pfn = 20000 for node 1,
thus conflicting with the 0-2GB DMA memory range on node 0.
This is due to the line:
start = ORDERROUNDDOWN(start);
that computes the value 0 for the 0x100000000 (4GB) address.
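
The arithmetic can be reproduced in isolation. The sketch below assumes the values given in this message (64 KB pages, CONFIG_FORCE_MAX_ZONEORDER=17) and uses the unpatched macro expression from the diff further down. The `EX_` prefixed names and the two helpers are hypothetical stand-ins, not kernel code.

```c
#include <stdint.h>

/* Assumed from this report: 64 KB pages and a max zone order of 17. */
#define EX_PAGE_SHIFT 16
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)
#define EX_MAX_ORDER  17

/* Same expression as the unpatched ORDERROUNDDOWN macro. */
#define EX_ORDERROUNDDOWN(n) ((n) & ~((EX_PAGE_SIZE << EX_MAX_ORDER) - 1))

/* Hypothetical helper: page frame number of a physical address. */
static uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> EX_PAGE_SHIFT;
}

/* Hypothetical helper: lowest pfn accounted to a chunk after rounding down. */
static uint64_t rounded_min_pfn(uint64_t start)
{
    return addr_to_pfn(EX_ORDERROUNDDOWN(start));
}
```

With these values, EX_PAGE_SIZE << EX_MAX_ORDER is 2^33 bytes (8GB), so any start address below 8GB, including 0x100000000, is rounded down to 0. The end of node 1's chunk, 0x200000000, corresponds to pfn 0x20000, which would match the reported max_pfn if that figure is read as hexadecimal.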
I suppose the goal was to align the memory range on a
4GB boundary (2^(17-1) * 2^16 bytes), in which case our 4GB start address
should not be rounded at all.
I fixed the ORDERROUNDDOWN macro and the system now boots OK.
I am not sure that this fixes the problem in all cases: with a
CONFIG_FORCE_MAX_ZONEORDER=18 value, the ORDERROUNDDOWN macro would have
generated the same problem (mem_data[1].min_pfn=0). This should at least
be checked in the count_node_pages() function.
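
A check of this kind could look roughly like the sketch below. It is only an illustration: `struct mem_chunk`, `round_down_pow2()` and `rounddown_conflicts()` are hypothetical names, and the real count_node_pages() only receives a start, a length and a node, so it would need access to a table like `example_chunks` (filled here from the SRAT lines in the traces). `shift` is log2 of the rounding granularity in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory chunk; not a kernel structure. */
struct mem_chunk {
    uint64_t start;
    uint64_t len;
    int node;
};

/* Round addr down to a multiple of 2^shift bytes. */
static uint64_t round_down_pow2(uint64_t addr, unsigned shift)
{
    return addr & ~((UINT64_C(1) << shift) - 1);
}

/*
 * Return 1 if rounding the start of chunk i down to a 2^shift boundary
 * makes it cover memory that belongs to a different node, else 0.
 */
static int rounddown_conflicts(const struct mem_chunk *c, size_t n,
                               size_t i, unsigned shift)
{
    uint64_t lo = round_down_pow2(c[i].start, shift);
    uint64_t hi = c[i].start;           /* extra span is [lo, hi) */

    if (lo == hi)
        return 0;                       /* already aligned, nothing added */
    for (size_t j = 0; j < n; j++) {
        uint64_t s = c[j].start;
        uint64_t e = c[j].start + c[j].len;

        if (c[j].node != c[i].node && s < hi && lo < e)
            return 1;
    }
    return 0;
}

/* The three chunks reported by the SRAT lines in the traces. */
static const struct mem_chunk example_chunks[] = {
    { 0x000000000ULL, 0x080000000ULL, 0 },
    { 0x100000000ULL, 0x100000000ULL, 1 },
    { 0x200000000ULL, 0x080000000ULL, 0 },
};
```

For node 1's chunk (index 1), a granularity of 2^33 bytes (the unpatched macro with MAX_ORDER=17, or the patched one with MAX_ORDER=18) reports a conflict with node 0, while 2^32 bytes (the patched macro with MAX_ORDER=17) does not.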
--- linux-2.6.18.3/include/asm-ia64/meminit.h 2006-11-19 04:28:22.000000000 +0100
+++ linux-2.6.18.3new/include/asm-ia64/meminit.h 2006-11-29 15:23:37.000000000 +0100
@@ -40,7 +40,7 @@
 */
 #define GRANULEROUNDDOWN(n) ((n) & ~(IA64_GRANULE_SIZE-1))
 #define GRANULEROUNDUP(n) (((n)+IA64_GRANULE_SIZE-1) & ~(IA64_GRANULE_SIZE-1))
-#define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE<<MAX_ORDER)-1))
+#define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE<<(MAX_ORDER-1))-1))
 #ifdef CONFIG_NUMA
 extern void call_pernode_memory (unsigned long start, unsigned long len, void *func);
- traces
---------------------------------------------------------------------------------------------------
all_unreclaimable? no
lowmem_reserve[]: 0 0 256 256
Linux version 2.6.18.3
...
SRAT Memory (0x0000000000000000 length 0x0000000080000000 type 0x0) in proximity domain 0 enabled
SRAT Memory (0x0000000200000000 length 0x0000000080000000 type 0x0) in proximity domain 0 enabled
SRAT Memory (0x0000000100000000 length 0x0000000100000000 type 0x0) in proximity domain 1 enabled
Number of logical nodes in system = 2
Number of memory chunks in system = 3
...
ide0: BM-DMA at 0x2080-0x2087
<4>swapper: page allocation failure. order:0, mode:0x21
Call Trace:
[<a000000100010c30>] show_stack+0x50/0xa0
sp=e000000100cdfbf0 bsp=e000000100cd13c8
[<a000000100010cb0>] dump_stack+0x30/0x60
sp=e000000100cdfdc0 bsp=e000000100cd13b0
[<a0000001000e5f00>] __alloc_pages+0x500/0x540
sp=e000000100cdfdc0 bsp=e000000100cd1348
[<a000000100119830>] alloc_page_interleave+0xd0/0x160
sp=e000000100cdfdd0 bsp=e000000100cd1318
[<a0000001001199f0>] alloc_pages_current+0x130/0x1a0
sp=e000000100cdfdd0 bsp=e000000100cd12e8
[<a0000001000e5f70>] __get_free_pages+0x30/0x100
sp=e000000100cdfdd0 bsp=e000000100cd12c0
[<a0000001003369b0>] swiotlb_alloc_coherent+0x70/0x280
sp=e000000100cdfdd0 bsp=e000000100cd1280
[<a00000010044e510>] ide_setup_dma+0x430/0x8c0
sp=e000000100cdfdd0 bsp=e000000100cd1240
[<a00000010044b2c0>] ide_pci_setup_ports+0xd60/0xea0
sp=e000000100cdfdd0 bsp=e000000100cd11a8
[<a00000010044bc00>] do_ide_setup_pci_device+0x800/0x840
sp=e000000100cdfde0 bsp=e000000100cd1138
[<a00000010044bc80>] ide_setup_pci_device+0x40/0x140
sp=e000000100cdfdf0 bsp=e000000100cd1100
[<a0000001004322d0>] piix_init_one+0x50/0x80
sp=e000000100cdfe00 bsp=e000000100cd10d8
[<a0000001006c8230>] ide_scan_pcidev+0xf0/0x180
sp=e000000100cdfe00 bsp=e000000100cd10a8
[<a0000001006c8300>] ide_scan_pcibus+0x40/0x1e0
sp=e000000100cdfe00 bsp=e000000100cd1080
[<a0000001006c8110>] ide_init+0xb0/0xe0
sp=e000000100cdfe00 bsp=e000000100cd1060
[<a000000100009640>] init+0x380/0x7a0
sp=e000000100cdfe00 bsp=e000000100cd1020
[<a000000100012dd0>] kernel_thread_helper+0xd0/0x100
sp=e000000100cdfe30 bsp=e000000100cd0ff0
[<a000000100009140>] start_kernel_thread+0x20/0x40
sp=e000000100cdfe30 bsp=e000000100cd0ff0
Mem-info:
Node 0 DMA free:0kB min:2688kB low:3328kB high:4032kB active:0kB
inactive:0kB present:1943936kB pages_scanned:0 all_unreclaimable? yes
Node 1 DMA free:1876352kB min:0kB low:0kB high:0kB active:0kB
inactive:0kB present:0kB
[-- Attachment #2: xavier.bru.vcf --]
[-- Type: text/x-vcard, Size: 304 bytes --]
begin:vcard
fn:Xavier Bru
n:Bru;Xavier
adr:;;1 rue de Provence, BP 208;38432 Echirolles Cedex;;;France
email;internet:Xavier.Bru@bull.net
title:BULL/DT/Open Software/linux/ia64
tel;work:+33 (0)4 76 29 77 45
tel;fax:+33 (0)4 76 29 77 70
x-mozilla-html:TRUE
url:http://www.bull.com
version:2.1
end:vcard
* Re: ia64 ORDERROUNDDOWN issue
2006-11-29 14:57 xb
@ 2006-11-30 10:47 ` Zoltan Menyhart
0 siblings, 0 replies; 3+ messages in thread
From: Zoltan Menyhart @ 2006-11-30 10:47 UTC (permalink / raw)
To: linux-ia64
>> Do we need a "max_order" variable that could be adjusted to some lower
>> value than MAX_ORDER if we find the memory topology doesn't fit inside
>> the lines?
>
> (Your email talks about nodes, but I am assuming that we're actually dealing
> with per-zone concepts here)
...and ...
> But I wonder if a better approach would be to teach ia64 to just throw away
> the last 1 .. MAX_ORDER-1 pages from the oddball zone?
Assume we've got a machine with a memory configuration:
Node 0: 0 ... 4 Gbyte - 1
Node 1: 4 Gbyte ... 16 Gbyte - 1
Assume the MAX_ORDER block size is 8 Gbytes (a single binary for all of our
machines, for maintenance reasons).
An allocation of 8 Gbytes should have its chance.
Therefore the global MAX_ORDER should not be diminished dynamically.
Surely we do not want to throw away 4 Gbytes of memory.
The kernel should support both nodes having a starting address of 0.
Therefore the "node_bootmem_map"-s of both of the nodes include the address
range 0 ... 4 Gbyte in the example above.
The "node_bootmem_map" of node 1 just happens to contain 0-s in the range
of 0 ... 4 Gbyte.
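
The idea of a per-node map that starts at 0 but marks foreign ranges as absent can be sketched as follows. This is a toy model under stated assumptions: `struct node_map` and its helpers are hypothetical, the page counts are scaled down, and nothing here claims to reproduce the layout or bit semantics of the kernel's real node_bootmem_map.

```c
#include <stdint.h>
#include <string.h>

#define MAP_PAGES 256   /* illustrative map size in pages, not a kernel value */

/* Hypothetical per-node map: one byte per page, 1 = page is on this node. */
struct node_map {
    uint64_t base_pfn;              /* first pfn covered by the map */
    uint8_t present[MAP_PAGES];
};

/* Start with an all-zero map, i.e. no page belongs to the node yet. */
static void node_map_init(struct node_map *m, uint64_t base_pfn)
{
    m->base_pfn = base_pfn;
    memset(m->present, 0, sizeof(m->present));
}

/* Mark npages pages starting at start_pfn as belonging to the node. */
static void node_map_add(struct node_map *m, uint64_t start_pfn,
                         uint64_t npages)
{
    for (uint64_t p = start_pfn; p < start_pfn + npages; p++) {
        if (p >= m->base_pfn && p - m->base_pfn < MAP_PAGES)
            m->present[p - m->base_pfn] = 1;
    }
}

/* Return 1 if pfn is covered by the map and belongs to the node. */
static int node_map_has(const struct node_map *m, uint64_t pfn)
{
    if (pfn < m->base_pfn || pfn - m->base_pfn >= MAP_PAGES)
        return 0;
    return m->present[pfn - m->base_pfn];
}

/* Count the pages that belong to the node. */
static uint64_t node_map_count(const struct node_map *m)
{
    uint64_t n = 0;

    for (size_t i = 0; i < MAP_PAGES; i++)
        n += m->present[i];
    return n;
}
```

If pages 0..63 stand for node 0 and pages 64..255 for node 1, both maps can use base_pfn 0: node 1's map simply holds zeros for the first 64 entries, at the cost of some unused map space.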
An unused level of the buddy allocator
(the 8 Gbyte level on node 0 in the example above)
does not cost much; I see no point in adding complexity to the
allocator code.
Thanks,
Zoltan Menyhart
P.S. I guess it is not an ia64-only issue.