From: Andrew Morton <akpm@osdl.org>
To: Mel Gorman <mel@csn.ul.ie>
Cc: davej@codemonkey.org.uk, tony.luck@intel.com, linux-mm@kvack.org,
	mel@csn.ul.ie, ak@suse.de, bob.picco@hp.com,
	linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Sat, 20 May 2006 13:59:22 -0700
Message-ID: <20060520135922.129a481d.akpm@osdl.org>
In-Reply-To: <20060508141151.26912.15976.sendpatchset@skynet>

Mel Gorman <mel@csn.ul.ie> wrote:
>
> 
> Size zones and holes in an architecture independent manner for x86_64.
> 
> 

I found a .config which triggers the cant-map-acpitables problem.


With that .config, and without this patch:

Linux version 2.6.17-rc4-mm2 (akpm@box) (gcc version 4.1.0 20060304 (Red Hat 4.6
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000ca605000 (usable)
 BIOS-e820: 00000000ca605000 - 00000000ca680000 (ACPI NVS)
 BIOS-e820: 00000000ca680000 - 00000000cb5ef000 (usable)
 BIOS-e820: 00000000cb5ef000 - 00000000cb5fc000 (reserved)
 BIOS-e820: 00000000cb5fc000 - 00000000cb6a2000 (usable)
 BIOS-e820: 00000000cb6a2000 - 00000000cb6eb000 (ACPI NVS)
 BIOS-e820: 00000000cb6eb000 - 00000000cb6ef000 (usable)
 BIOS-e820: 00000000cb6ef000 - 00000000cb6ff000 (ACPI data)
 BIOS-e820: 00000000cb6ff000 - 00000000cb700000 (usable)
 BIOS-e820: 00000000cb700000 - 00000000cc000000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
DMI 2.4 present.
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)


With that .config, and with this patch:

Bootdata ok (command line is ro root=LABEL=/ earlyprintk=serial,ttyS0,9600,keep netconsole=4444@192.168.2.4/eth0,5147@192.168.2.33/00:0D:56:C6:C6:CC)
Linux version 2.6.17-rc4-mm2 (akpm@box) (gcc version 4.1.0 20060304 (Red Hat 4.1.0-3)) #33 SMP Sat May 20 12:08:03 PDT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000ca605000 (usable)
 BIOS-e820: 00000000ca605000 - 00000000ca680000 (ACPI NVS)
 BIOS-e820: 00000000ca680000 - 00000000cb5ef000 (usable)
 BIOS-e820: 00000000cb5ef000 - 00000000cb5fc000 (reserved)
 BIOS-e820: 00000000cb5fc000 - 00000000cb6a2000 (usable)
 BIOS-e820: 00000000cb6a2000 - 00000000cb6eb000 (ACPI NVS)
 BIOS-e820: 00000000cb6eb000 - 00000000cb6ef000 (usable)
 BIOS-e820: 00000000cb6ef000 - 00000000cb6ff000 (ACPI data)
 BIOS-e820: 00000000cb6ff000 - 00000000cb700000 (usable)
 BIOS-e820: 00000000cb700000 - 00000000cc000000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
Too many memory regions, truncating
Too many memory regions, truncating
Too many memory regions, truncating
DMI 2.4 present.
ACPI: Unable to map RSDT header
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID:  Product ID:  APIC at: 0xFEE00000


ACPI disables itself.

Good .config: http://www.zip.com.au/~akpm/linux/patches/stuff/config-good
Bad .config: http://www.zip.com.au/~akpm/linux/patches/stuff/config-bad


The handling of MAX_ACTIVE_REGIONS is unpleasing, sorry.  In my setup it is
5.  But we _really_ only support 4 regions.  So for a start it is misnamed.
The maximum number of regions we support is actually MAX_ACTIVE_REGIONS-1.
And this is a config option too!  So the user must specify
CONFIG_MAX_ACTIVE_REGIONS as the number of active regions plus one, for the
terminating region which has end_pfn=0.  It's weird.

I would not consider this code to be adequately commented.  Please raise a
patch which comments the major functions - what they do, why they do it,
any caveats or implementation details.  A few lines each - don't overdo
it.  Details such as whether the various end_pfn's are inclusive or
exclusive are important, as is a description of the return value.

Anyway, I just don't get how this code can work.  We have an e820 map with
up to 128 entries (this machine has ten) and we're trying to scrunch that
all into the four-entry early_node_map[].

With config-good we're set up for NUMA, CONFIG_NODES_SHIFT=6.  So
MAX_ACTIVE_REGIONS is enormous.  But it's quite wrong that we're using
number-of-zones*number-of-nodes to size a data structure which has to
accommodate all the entries in the e820 map.  These things aren't related.


On my little x86 PC:

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009bc00 (usable)
 BIOS-e820: 000000000009bc00 - 000000000009c000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000ffc0000 (usable)
 BIOS-e820: 000000000ffc0000 - 000000000fff8000 (ACPI data)
 BIOS-e820: 000000000fff8000 - 0000000010000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffb80000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
255MB LOWMEM available.
found SMP MP-table at 000ff780
Range (nid 0) 0 -> 65472, max 4
On node 0 totalpages: 65472
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 61376 pages, LIFO batch:15

So here, the architecture code only called add_active_range() the once, for
the entire memory map.  But on the x86_64 add_active_range() was called
once per e820 entry.  I'm dimly starting to realise that this is perhaps
the problem - the weird-looking definition of MAX_ACTIVE_REGIONS _expects_
the architecture to call add_active_range() with a start_pfn/end_pfn which
describes the entire range of pfns for each zone in each node.  Even if
that span includes not-present pfns.  Would that be correct?  I didn't see
a comment in there describing this design (I do go on).

If so, perhaps the bug is that the x86_64 code isn't doing that.  And that
x86 isn't doing it for some people either.

Anyway.  From the implementation I can see what the code is doing.  But I
see no description of what it is _supposed_ to be doing.  (The process of
finding differences between these two things is known as "debugging").  I
could kludge things by setting MAX_ACTIVE_REGIONS to 1000000, but enough. 
I look forward to the next version ;)
