From: "Keith Mannthey" <kmannth@gmail.com>
To: "Mel Gorman" <mel@csn.ul.ie>
Cc: akpm@osdl.org, tony.luck@intel.com, linuxppc-dev@ozlabs.org,
ak@suse.de, bob.picco@hp.com,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Thu, 31 Aug 2006 20:08:58 -0700 [thread overview]
Message-ID: <a762e240608312008v3e35b63ay46c95fbb6c3f15ec@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0608311906300.13392@skynet.skynet.ie>
On 8/31/06, Mel Gorman <mel@csn.ul.ie> wrote:
> On Thu, 31 Aug 2006, Keith Mannthey wrote:
> > On 8/31/06, Mel Gorman <mel@skynet.ie> wrote:
> >> On (30/08/06 13:57), Keith Mannthey didst pronounce:
> >> > On 8/21/06, Mel Gorman <mel@csn.ul.ie> wrote:
> >> > >
> Can you confirm that happens by applying the patch I sent to you and
> checking the output? When the reserve fails, it should print out what
> range it actually checked. I want to be sure it's not checking the
> addresses 0->0x1070000000
See below
> >> > >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
> >> > >
> >> > > printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
> >> > > nd->start, nd->end);
> >> > >+ e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
> >> > >+ nd->end >> PAGE_SHIFT);
> >> >
> >> > A node chunk in this section of code may be a hot-pluggable zone. With
> >> > MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
> >> >
> >>
> >> The ranges should not get registered as active memory by
> >> e820_register_active_regions() unless they are marked E820_RAM. My
> >> understanding is that the regions for hotadd would be marked "reserved"
> >> in the e820 map. Is that wrong?
> >
> > This is wrong. In a mult-node system that last node add area will not
> > be marked reserved by the e820. The e820 only defines memory <
> > end_pfn. the last node add area is > end_pfn.
> >
>
> ok, that should still be fine. As long as the ranges are not marked
> "usable", add_active_range() will not be called and the holes should be
> counted correctly with the patch I sent you.
>
> > With RESERVE based add-memory you want the add-areas repored by the
> > srat to be setup during boot like all the other pages.
> >
>
> So, do you actally expect a lot of unused mem_map to be allocated with
> struct pages that are inactive until memory is hot-added in an
> x86_64-specific manner? The arch-independent stuff currently will not do
> that. It sets up memmap for where memory really exists. If that is not
> what you expect, it will hit issues at hotadd time which is not the
> current issue but one that can be fixed.
Yes. RESERVED based is a big waste of mem_map space. The add areas
are marked as RESERVED during boot and then later onlined during add.
It might be ok. I will play with tomorrow. I might just need to
call add_active_range in the right spot :)
> >> > > if (ma->flags.hot_pluggable && !reserve_hotadd(node, start, end)
> >> <
> >> > > 0) {
> >> > > /* Ignore hotadd region. Undo damage */
> >> >
> >> > I have but the e820_register_active_regions as a else to this
> >> > statment the absent pages check fails.
> >> >
> >>
> >> The patch below omits this change because I think
> >> e820_register_active_regions() will still have got called by the time
> >> you encounter a hotplug area.
> >
> > called but then removed in setup arch.
>
> By "removed", I assume you mean the active regions removed by the call
> to remove_all_active_regions() in setup_arch(). Before reserve_hotadd() is
> called, e820_register_active_regions() will have reregistered the active
> regions with the NUMA node id.
I see e820_register_active_regions is acting as a filter against the e820
> >> > Also nodes_cover_memory and alot of these check were based against
> >> > comparing the srat data against the e820. Now all this code is
> >> > comparing SRAT against SRAT....
> >> >
> >>
> >> I don't see why. The SRAT table passes a range to
> >> e820_register_active_regions() so should be comparing SRAT to e820
> >
> > let me go off and look at e820_register_active_regions() some more.
Things get clear :)
Should be ok.
> > Sure thing. It is just the hot-add area I am guessing it is an off by
> > one error of some sort.
> >
See below. I do my e820_register_active_area as an else to to if
(hotplug.....!reserve) and the prink is easy to sort out.
I see your pfn are in base 10. Looks like it considers the last
addres to be a present page. (off by one thing).
Thanks,
Keith
Output below
disabling early console
Linux version 2.6.18-rc4-mm3-smp (root@elm3a153) (gcc version 4.1.0
(SUSE Linux)) #6 SMP Thu Aug 31 22:06:00 EDT 2006
Command line: root=/dev/sda3
ip=9.47.66.153:9.47.66.169:9.47.66.1:255.255.255.0 resume=/dev/sda2
showopts earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
debug numa=hotadd=100
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
end_pfn_map = 18219008
DMI 2.3 present.
ACPI: RSDP (v000 IBM ) @ 0x00000000000fdcf0
ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98800
ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98780
ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98600
ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff983c0
ACPI: HPET (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
0x000000007ff98380
ACPI: SSDT (v001 IBM VIGSSDT0 0x00001000 INTL 0x20030122) @
0x000000007ff90780
ACPI: SSDT (v001 IBM VIGSSDT1 0x00001000 INTL 0x20030122) @
0x000000007ff88bc0
ACPI: DSDT (v001 IBM EXA01ZEU 0x00001000 INTL 0x20030122) @
0x0000000000000000
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 0 -> APIC 38 -> Node 0
SRAT: PXM 0 -> APIC 39 -> Node 0
SRAT: PXM 0 -> APIC 36 -> Node 0
SRAT: PXM 0 -> APIC 37 -> Node 0
SRAT: PXM 1 -> APIC 64 -> Node 1
SRAT: PXM 1 -> APIC 65 -> Node 1
SRAT: PXM 1 -> APIC 66 -> Node 1
SRAT: PXM 1 -> APIC 67 -> Node 1
SRAT: PXM 1 -> APIC 102 -> Node 1
SRAT: PXM 1 -> APIC 103 -> Node 1
SRAT: PXM 1 -> APIC 100 -> Node 1
SRAT: PXM 1 -> APIC 101 -> Node 1
SRAT: Node 0 PXM 0 0-80000000
Entering add_active_range(0, 0, 152) 0 entries of 3200 used
Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-470000000
Entering add_active_range(0, 0, 152) 2 entries of 3200 used
Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
SRAT: Node 0 PXM 0 0-1070000000
reserve_hotadd called with node 0 sart 470000000 end 1070000000
SRAT: Hotplug area has existing memory
Entering add_active_range(0, 0, 152) 3 entries of 3200 used
Entering add_active_range(0, 256, 524165) 3 entries of 3200 used
Entering add_active_range(0, 1048576, 4653056) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-1160000000
Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
SRAT: Node 1 PXM 1 1070000000-3200000000
reserve_hotadd called with node 1 sart 1160000000 end 3200000000
SRAT: Hotplug area has existing memory
Entering add_active_range(1, 17235968, 18219008) 4 entries of 3200 used
NUMA: Using 28 for the hash shift.
Bootmem setup node 0 0000000000000000-0000001070000000
Bootmem setup node 1 0000001070000000-0000001160000000
Zone PFN ranges:
DMA 0 -> 4096
DMA32 4096 -> 1048576
Normal 1048576 -> 18219008
early_node_map[4] active PFN ranges
0: 0 -> 152
0: 256 -> 524165
0: 1048576 -> 4653056
1: 17235968 -> 18219008
On node 0 totalpages: 4128541
0 pages used for SPARSE memmap
1149 pages DMA reserved
DMA zone: 2843 pages, LIFO batch:0
0 pages used for SPARSE memmap
DMA32 zone: 520069 pages, LIFO batch:31
0 pages used for SPARSE memmap
Normal zone: 3604480 pages, LIFO batch:31
On node 1 totalpages: 983040
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
0 pages used for SPARSE memmap
Normal zone: 983040 pages, LIFO batch:31
next prev parent reply other threads:[~2006-09-01 3:09 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-21 13:45 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Mel Gorman
2006-08-21 13:45 ` [PATCH 1/6] Introduce mechanism for registering active regions of memory Mel Gorman
2006-08-21 13:45 ` [PATCH 2/6] Have Power use add_active_range() and free_area_init_nodes() Mel Gorman
2006-08-21 13:46 ` [PATCH 3/6] Have x86 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:46 ` [PATCH 4/6] Have x86_64 " Mel Gorman
2006-08-30 20:57 ` Keith Mannthey
2006-08-31 15:49 ` Mel Gorman
2006-08-31 17:52 ` Keith Mannthey
2006-08-31 18:40 ` Mel Gorman
2006-09-01 3:08 ` Keith Mannthey [this message]
2006-09-01 8:33 ` Mel Gorman
2006-09-01 8:46 ` Mika Penttilä
2006-09-04 15:36 ` Mel Gorman
2006-09-04 15:38 ` Account for holes that are outside the range of physical memory Mel Gorman
2006-09-04 15:39 ` Allow an arch to expand node boundaries Mel Gorman
2006-08-21 13:46 ` [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:47 ` [PATCH 6/6] Account for memmap and optionally the kernel image as holes Mel Gorman
2006-08-21 18:52 ` [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Keith Mannthey
2006-08-22 8:38 ` Mel Gorman
-- strict thread matches above, loose matches on Subject: below --
2006-07-08 11:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8 Mel Gorman
2006-07-08 11:12 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-08 14:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V6 Mel Gorman
2006-05-08 14:11 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-20 20:59 ` Andrew Morton
2006-05-20 21:27 ` Andi Kleen
2006-05-20 21:40 ` Andrew Morton
2006-05-20 22:17 ` Andi Kleen
2006-05-20 22:54 ` Andrew Morton
2006-05-21 16:20 ` Mel Gorman
2006-05-21 15:50 ` Mel Gorman
2006-05-21 19:08 ` Andrew Morton
2006-05-21 22:23 ` Mel Gorman
2006-05-23 18:01 ` Mel Gorman
2006-04-11 10:39 [PATCH 0/6] [RFC] Sizing zones and holes in an architecture independent manner Mel Gorman
2006-04-11 10:41 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a762e240608312008v3e35b63ay46c95fbb6c3f15ec@mail.gmail.com \
--to=kmannth@gmail.com \
--cc=ak@suse.de \
--cc=akpm@osdl.org \
--cc=bob.picco@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mel@csn.ul.ie \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).