From: Mel Gorman <mel@csn.ul.ie>
To: Keith Mannthey <kmannth@gmail.com>
Cc: akpm@osdl.org, tony.luck@intel.com, linuxppc-dev@ozlabs.org,
ak@suse.de, bob.picco@hp.com,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
Date: Fri, 1 Sep 2006 09:33:19 +0100 (IST) [thread overview]
Message-ID: <Pine.LNX.4.64.0609010928010.25057@skynet.skynet.ie> (raw)
In-Reply-To: <a762e240608312008v3e35b63ay46c95fbb6c3f15ec@mail.gmail.com>
On Thu, 31 Aug 2006, Keith Mannthey wrote:
> On 8/31/06, Mel Gorman <mel@csn.ul.ie> wrote:
>> On Thu, 31 Aug 2006, Keith Mannthey wrote:
>> > On 8/31/06, Mel Gorman <mel@skynet.ie> wrote:
>> >> On (30/08/06 13:57), Keith Mannthey didst pronounce:
>> >> > On 8/21/06, Mel Gorman <mel@csn.ul.ie> wrote:
>> >> > >
>
>> Can you confirm that happens by applying the patch I sent to you and
>> checking the output? When the reserve fails, it should print out what
>> range it actually checked. I want to be sure it's not checking the
>> addresses 0->0x1070000000
>
> See below
>
Perfect, thanks a lot. I should have enough to reproduce without a test
machine what is going on and develop the required patches.
>> >> > >@@ -329,6 +330,8 @@ acpi_numa_memory_affinity_init(struct ac
>> >> > >
>> >> > > printk(KERN_INFO "SRAT: Node %u PXM %u %Lx-%Lx\n", node, pxm,
>> >> > > nd->start, nd->end);
>> >> > >+ e820_register_active_regions(node, nd->start >> PAGE_SHIFT,
>> >> > >+ nd->end >>
>> PAGE_SHIFT);
>> >> >
>> >> > A node chunk in this section of code may be a hot-pluggable zone. With
>> >> > MEMORY_HOTPLUG_SPARSE we don't want to register these regions.
>> >> >
>> >>
>> >> The ranges should not get registered as active memory by
>> >> e820_register_active_regions() unless they are marked E820_RAM. My
>> >> understanding is that the regions for hotadd would be marked "reserved"
>> >> in the e820 map. Is that wrong?
>> >
>> > This is wrong. In a mult-node system that last node add area will not
>> > be marked reserved by the e820. The e820 only defines memory <
>> > end_pfn. the last node add area is > end_pfn.
>> >
>>
>> ok, that should still be fine. As long as the ranges are not marked
>> "usable", add_active_range() will not be called and the holes should be
>> counted correctly with the patch I sent you.
>>
>> > With RESERVE based add-memory you want the add-areas repored by the
>> > srat to be setup during boot like all the other pages.
>> >
>>
>> So, do you actally expect a lot of unused mem_map to be allocated with
>> struct pages that are inactive until memory is hot-added in an
>> x86_64-specific manner? The arch-independent stuff currently will not do
>> that. It sets up memmap for where memory really exists. If that is not
>> what you expect, it will hit issues at hotadd time which is not the
>> current issue but one that can be fixed.
>
> Yes. RESERVED based is a big waste of mem_map space.
Right, it's all very clear now. At some point in the future, I'd like to
visit why SPARSEMEM-based hot-add is not always used but it's a separate
issue.
> The add areas
> are marked as RESERVED during boot and then later onlined during add.
That explains the reserve_bootmem_node()
> It might be ok. I will play with tomorrow. I might just need to
> call add_active_range in the right spot :)
>
I'll play with this from the opposite perspective - what is required for
any arch using this API to have spare memmap for reserve-based hot-add.
>> >> > > if (ma->flags.hot_pluggable && !reserve_hotadd(node, start,
>> end)
>> >> <
>> >> > > 0) {
>> >> > > /* Ignore hotadd region. Undo damage */
>> >> >
>> >> > I have but the e820_register_active_regions as a else to this
>> >> > statment the absent pages check fails.
>> >> >
>> >>
>> >> The patch below omits this change because I think
>> >> e820_register_active_regions() will still have got called by the time
>> >> you encounter a hotplug area.
>> >
>> > called but then removed in setup arch.
>>
>> By "removed", I assume you mean the active regions removed by the call
>> to remove_all_active_regions() in setup_arch(). Before reserve_hotadd() is
>> called, e820_register_active_regions() will have reregistered the active
>> regions with the NUMA node id.
>
> I see e820_register_active_regions is acting as a filter against the e820
>
Yes. A range of pfn's in given to e820_register_active_regions() and it
reads the e820 for E820_RAM sections within that range.
>> >> > Also nodes_cover_memory and alot of these check were based against
>> >> > comparing the srat data against the e820. Now all this code is
>> >> > comparing SRAT against SRAT....
>> >> >
>> >>
>> >> I don't see why. The SRAT table passes a range to
>> >> e820_register_active_regions() so should be comparing SRAT to e820
>> >
>> > let me go off and look at e820_register_active_regions() some more.
> Things get clear :)
>
> Should be ok.
>
nice.
>> > Sure thing. It is just the hot-add area I am guessing it is an off by
>> > one error of some sort.
>> >
> See below. I do my e820_register_active_area as an else to to if
> (hotplug.....!reserve) and the prink is easy to sort out.
>
yep, it should be easy to put this into a test program.
> I see your pfn are in base 10. Looks like it considers the last
> addres to be a present page. (off by one thing).
>
Probably. I'll start poking now. Thanks a lot.
> Thanks,
> Keith
>
> Output below
> disabling early console
> Linux version 2.6.18-rc4-mm3-smp (root@elm3a153) (gcc version 4.1.0
> (SUSE Linux)) #6 SMP Thu Aug 31 22:06:00 EDT 2006
> Command line: root=/dev/sda3
> ip=9.47.66.153:9.47.66.169:9.47.66.1:255.255.255.0 resume=/dev/sda2
> showopts earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0
> debug numa=hotadd=100
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 0000000000098400 (usable)
> BIOS-e820: 0000000000098400 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000007ff85e00 (usable)
> BIOS-e820: 000000007ff85e00 - 000000007ff98880 (ACPI data)
> BIOS-e820: 000000007ff98880 - 0000000080000000 (reserved)
> BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 0000000470000000 (usable)
> BIOS-e820: 0000001070000000 - 0000001160000000 (usable)
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> Entering add_active_range(0, 17235968, 18219008) 3 entries of 3200 used
> end_pfn_map = 18219008
> DMI 2.3 present.
> ACPI: RSDP (v000 IBM ) @ 0x00000000000fdcf0
> ACPI: RSDT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98800
> ACPI: FADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98780
> ACPI: MADT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98600
> ACPI: SRAT (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff983c0
> ACPI: HPET (v001 IBM EXA01ZEU 0x00001000 IBM 0x45444f43) @
> 0x000000007ff98380
> ACPI: SSDT (v001 IBM VIGSSDT0 0x00001000 INTL 0x20030122) @
> 0x000000007ff90780
> ACPI: SSDT (v001 IBM VIGSSDT1 0x00001000 INTL 0x20030122) @
> 0x000000007ff88bc0
> ACPI: DSDT (v001 IBM EXA01ZEU 0x00001000 INTL 0x20030122) @
> 0x0000000000000000
> SRAT: PXM 0 -> APIC 0 -> Node 0
> SRAT: PXM 0 -> APIC 1 -> Node 0
> SRAT: PXM 0 -> APIC 2 -> Node 0
> SRAT: PXM 0 -> APIC 3 -> Node 0
> SRAT: PXM 0 -> APIC 38 -> Node 0
> SRAT: PXM 0 -> APIC 39 -> Node 0
> SRAT: PXM 0 -> APIC 36 -> Node 0
> SRAT: PXM 0 -> APIC 37 -> Node 0
> SRAT: PXM 1 -> APIC 64 -> Node 1
> SRAT: PXM 1 -> APIC 65 -> Node 1
> SRAT: PXM 1 -> APIC 66 -> Node 1
> SRAT: PXM 1 -> APIC 67 -> Node 1
> SRAT: PXM 1 -> APIC 102 -> Node 1
> SRAT: PXM 1 -> APIC 103 -> Node 1
> SRAT: PXM 1 -> APIC 100 -> Node 1
> SRAT: PXM 1 -> APIC 101 -> Node 1
> SRAT: Node 0 PXM 0 0-80000000
> Entering add_active_range(0, 0, 152) 0 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 1 entries of 3200 used
> SRAT: Node 0 PXM 0 0-470000000
> Entering add_active_range(0, 0, 152) 2 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 2 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 2 entries of 3200 used
> SRAT: Node 0 PXM 0 0-1070000000
> reserve_hotadd called with node 0 sart 470000000 end 1070000000
> SRAT: Hotplug area has existing memory
> Entering add_active_range(0, 0, 152) 3 entries of 3200 used
> Entering add_active_range(0, 256, 524165) 3 entries of 3200 used
> Entering add_active_range(0, 1048576, 4653056) 3 entries of 3200 used
> SRAT: Node 1 PXM 1 1070000000-1160000000
> Entering add_active_range(1, 17235968, 18219008) 3 entries of 3200 used
> SRAT: Node 1 PXM 1 1070000000-3200000000
> reserve_hotadd called with node 1 sart 1160000000 end 3200000000
> SRAT: Hotplug area has existing memory
> Entering add_active_range(1, 17235968, 18219008) 4 entries of 3200 used
> NUMA: Using 28 for the hash shift.
> Bootmem setup node 0 0000000000000000-0000001070000000
> Bootmem setup node 1 0000001070000000-0000001160000000
> Zone PFN ranges:
> DMA 0 -> 4096
> DMA32 4096 -> 1048576
> Normal 1048576 -> 18219008
> early_node_map[4] active PFN ranges
> 0: 0 -> 152
> 0: 256 -> 524165
> 0: 1048576 -> 4653056
> 1: 17235968 -> 18219008
> On node 0 totalpages: 4128541
> 0 pages used for SPARSE memmap
> 1149 pages DMA reserved
> DMA zone: 2843 pages, LIFO batch:0
> 0 pages used for SPARSE memmap
> DMA32 zone: 520069 pages, LIFO batch:31
> 0 pages used for SPARSE memmap
> Normal zone: 3604480 pages, LIFO batch:31
> On node 1 totalpages: 983040
> 0 pages used for SPARSE memmap
> 0 pages used for SPARSE memmap
> 0 pages used for SPARSE memmap
> Normal zone: 983040 pages, LIFO batch:31
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
next prev parent reply other threads:[~2006-09-01 8:33 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-21 13:45 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Mel Gorman
2006-08-21 13:45 ` [PATCH 1/6] Introduce mechanism for registering active regions of memory Mel Gorman
2006-08-21 13:45 ` [PATCH 2/6] Have Power use add_active_range() and free_area_init_nodes() Mel Gorman
2006-08-21 13:46 ` [PATCH 3/6] Have x86 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:46 ` [PATCH 4/6] Have x86_64 " Mel Gorman
2006-08-30 20:57 ` Keith Mannthey
2006-08-31 15:49 ` Mel Gorman
2006-08-31 17:52 ` Keith Mannthey
2006-08-31 18:40 ` Mel Gorman
2006-09-01 3:08 ` Keith Mannthey
2006-09-01 8:33 ` Mel Gorman [this message]
2006-09-01 8:46 ` Mika Penttilä
2006-09-04 15:36 ` Mel Gorman
2006-09-04 15:38 ` Account for holes that are outside the range of physical memory Mel Gorman
2006-09-04 15:39 ` Allow an arch to expand node boundaries Mel Gorman
2006-08-21 13:46 ` [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-08-21 13:47 ` [PATCH 6/6] Account for memmap and optionally the kernel image as holes Mel Gorman
2006-08-21 18:52 ` [PATCH 0/6] Sizing zones and holes in an architecture independent manner V9 Keith Mannthey
2006-08-22 8:38 ` Mel Gorman
-- strict thread matches above, loose matches on Subject: below --
2006-07-08 11:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V8 Mel Gorman
2006-07-08 11:12 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-08 14:10 [PATCH 0/6] Sizing zones and holes in an architecture independent manner V6 Mel Gorman
2006-05-08 14:11 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
2006-05-20 20:59 ` Andrew Morton
2006-05-20 21:27 ` Andi Kleen
2006-05-20 21:40 ` Andrew Morton
2006-05-20 22:17 ` Andi Kleen
2006-05-20 22:54 ` Andrew Morton
2006-05-21 16:20 ` Mel Gorman
2006-05-21 15:50 ` Mel Gorman
2006-05-21 19:08 ` Andrew Morton
2006-05-21 22:23 ` Mel Gorman
2006-05-23 18:01 ` Mel Gorman
2006-04-11 10:39 [PATCH 0/6] [RFC] Sizing zones and holes in an architecture independent manner Mel Gorman
2006-04-11 10:41 ` [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0609010928010.25057@skynet.skynet.ie \
--to=mel@csn.ul.ie \
--cc=ak@suse.de \
--cc=akpm@osdl.org \
--cc=bob.picco@hp.com \
--cc=kmannth@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).