From: Robert Picco <Robert.Picco@hp.com>
To: "Martin J. Bligh" <mbligh@aracnet.com>
Cc: Jesse Barnes <jbarnes@sgi.com>,
linux-kernel@vger.kernel.org, colpatch@us.ibm.com,
haveblue@us.ibm.com
Subject: Re: boot time node and memory limit options
Date: Wed, 17 Mar 2004 15:01:21 -0500
Message-ID: <4058AE91.8000909@hp.com>
In-Reply-To: <2611830000.1079552673@[10.10.2.4]>

Martin J. Bligh wrote:
>>I agree about the sizing issues for hash tables at boot. I've seen them all recover: when an allocation
>>based on num_physpages fails, they iterate at smaller allocation sizes until one succeeds. All the primary
>>initialization allocations recover, but probably not all drivers. You could have similar failure scenarios
>>for any boot line parameter implementation that reduces memory.
>>
>>
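The retry behaviour quoted above can be sketched in user-space C. This is only a rough illustration under stated assumptions: try_alloc(), alloc_hash_table() and the avail_bytes budget are hypothetical stand-ins invented for the example, not kernel interfaces. The real boot code sizes its tables from num_physpages and allocates pages, not malloc() memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: fails for any request larger than avail_bytes. */
static void *try_alloc(size_t bytes, size_t avail_bytes)
{
    if (bytes > avail_bytes)
        return NULL;
    return malloc(bytes);
}

/*
 * Start at the order derived from the memory size and step down one
 * order (half the size) at a time until an allocation succeeds.
 * On success the order actually used is stored in *order_out.
 * Returns NULL if even a single page cannot be allocated.
 */
static void *alloc_hash_table(unsigned int start_order, size_t page_size,
                              size_t avail_bytes, unsigned int *order_out)
{
    unsigned int order = start_order;

    assert(order_out != NULL);
    for (;;) {
        void *table = try_alloc(page_size << order, avail_bytes);

        if (table != NULL) {
            *order_out = order;
            return table;
        }
        if (order == 0)
            return NULL;
        order--;
    }
}
```

A caller that cannot cope with a smaller table than it asked for (the driver case mentioned above) would have to check the returned order instead of assuming the starting one.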
>>>Don't we have the same arch dependent issue with the current mem= anyway?
>>>Can we come up with something where the arch code calls back into a generic
>>>function to derive limitations, and thereby at least get the parsing done
>>>in a common routine for consistency? There aren't *that* many NUMA arches
>>>to change anyway ...
>>>
>>Well, this is heading in the direction Dave has proposed and is probably 2.7 material. This would really solve the problem differently than my proposed patch.
>>
>>
>
>Yes ... that's looking very 2.7-ish to reorganise all that stuff. However,
>for now, I still think we need to restrict memory very early on, before
>anything else can allocate bootmem. Are you the absolute first thing that
>ever runs in the boot allocator?
>
>M.
>
>
All the machine-dependent initialization code could have allocated
and/or reserved bootmem before the patch claims additional memory
based on the boot line parameters. The patch is called just before
mem_init, so there aren't any pages on the freelist yet. So no, I'm
not the first thing that ever runs in the boot allocator. I'm not sure
whether that answers your question.
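
To make the ordering concrete, here is a toy user-space model of what happens when a limit is applied after earlier init code has already taken pages. Everything in it is an assumption made for illustration: the bool array stands in for the boot-time page map, and claim_above_limit() is a hypothetical name, not a function from the patch or from the bootmem API. It also assumes the limit counts total pages and that the excess is claimed from the top of memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: reserved[i] is true once page i has been allocated or
 * reserved. Claims still-free pages, highest first, until the excess
 * over limit_pages has been taken. Pages that earlier init code already
 * owns are skipped, so they stay in use whatever the limit says.
 * Returns the number of pages claimed by this call.
 */
static size_t claim_above_limit(bool *reserved, size_t npages,
                                size_t limit_pages)
{
    size_t claimed = 0;
    size_t excess;
    size_t i;

    assert(reserved != NULL);
    if (npages <= limit_pages)
        return 0;
    excess = npages - limit_pages;

    for (i = npages; i > 0 && claimed < excess; i--) {
        if (!reserved[i - 1]) {
            reserved[i - 1] = true;
            claimed++;
        }
    }
    return claimed;
}
```

In this model a page reserved early near the top of memory is simply skipped, and the claim moves further down to find free pages. That is the consequence of not being first in the boot allocator.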
Bob
>
>
>>thanks,
>>
>>Bob
>>
>>
>>
>>>M.
>>>
>>>
>>>
>>>
>>>
>>>>Bob
>>>>Martin J. Bligh wrote:
>>>>
>>>>>--On Tuesday, March 16, 2004 09:43:29 -0800 Jesse Barnes <jbarnes@sgi.com> wrote:
>>>>>
>>>>>>On Tue, Mar 16, 2004 at 12:28:10PM -0500, Robert Picco wrote:
>>>>>>
>>>>>>>This patch supports three boot line options. mem_limit limits the
>>>>>>>amount of physical memory. node_mem_limit limits the amount of
>>>>>>>physical memory per node on a NUMA machine. nodes_limit reduces the
>>>>>>>number of NUMA nodes to the value specified. On a NUMA machine an
>>>>>>>eliminated node's CPU(s) are removed from the cpu_possible_map.
>>>>>>>
>>>>>>>The patch has been tested on an IA64 NUMA machine and uniprocessor X86
>>>>>>>machine.
>>>>>>>
>>>>>>I think this patch will be really useful. Matt and Martin, does it look
>>>>>>ok to you? Given that discontiguous support is pretty platform specific
>>>>>>right now, I thought it might be less code if it was done in arch/, but
>>>>>>a platform independent version is awfully nice...
>>>>>>
>>>>>I haven't looked at your code yet, but I've had a similar patch in my tree
>>>>>from Dave Hansen for a while you might want to look at:
>>>>>
>>>>>diff -purN -X /home/mbligh/.diff.exclude 320-kcg/arch/i386/kernel/numaq.c 330-numa_mem_equals/arch/i386/kernel/numaq.c
>>>>>--- 320-kcg/arch/i386/kernel/numaq.c 2003-10-01 11:47:33.000000000 -0700
>>>>>+++ 330-numa_mem_equals/arch/i386/kernel/numaq.c 2004-03-14 09:54:00.000000000 -0800
>>>>>@@ -42,6 +42,10 @@ extern long node_start_pfn[], node_end_p
>>>>>* function also increments numnodes with the number of nodes (quads)
>>>>>* present.
>>>>>*/
>>>>>+extern unsigned long max_pages_per_node;
>>>>>+extern int limit_mem_per_node;
>>>>>+
>>>>>+#define node_size_pages(n) (node_end_pfn[n] - node_start_pfn[n])
>>>>>static void __init smp_dump_qct(void)
>>>>>{
>>>>> int node;
>>>>>@@ -60,6 +64,8 @@ static void __init smp_dump_qct(void)
>>>>> eq->hi_shrd_mem_start - eq->priv_mem_size);
>>>>> node_end_pfn[node] = MB_TO_PAGES(
>>>>> eq->hi_shrd_mem_start + eq->hi_shrd_mem_size);
>>>>>+ if (node_size_pages(node) > max_pages_per_node)
>>>>>+ node_end_pfn[node] = node_start_pfn[node] + max_pages_per_node;
>>>>> }
>>>>> }
>>>>>}
>>>>>diff -purN -X /home/mbligh/.diff.exclude 320-kcg/arch/i386/kernel/setup.c 330-numa_mem_equals/arch/i386/kernel/setup.c
>>>>>--- 320-kcg/arch/i386/kernel/setup.c 2004-03-11 14:33:36.000000000 -0800
>>>>>+++ 330-numa_mem_equals/arch/i386/kernel/setup.c 2004-03-14 09:54:00.000000000 -0800
>>>>>@@ -142,7 +142,7 @@ static void __init probe_roms(void)
>>>>> probe_extension_roms(roms);
>>>>>}
>>>>>
>>>>>-static void __init limit_regions(unsigned long long size)
>>>>>+void __init limit_regions(unsigned long long size)
>>>>>{
>>>>> unsigned long long current_addr = 0;
>>>>> int i;
>>>>>@@ -478,6 +478,7 @@ static void __init setup_memory_region(v
>>>>> print_memory_map(who);
>>>>>} /* setup_memory_region */
>>>>>
>>>>>+unsigned long max_pages_per_node = 0xFFFFFFFF;
>>>>>
>>>>>static void __init parse_cmdline_early (char ** cmdline_p)
>>>>>{
>>>>>@@ -521,6 +522,14 @@ static void __init parse_cmdline_early (
>>>>> userdef=1;
>>>>> }
>>>>> }
>>>>>+
>>>>>+ if (c == ' ' && !memcmp(from, "memnode=", 8)) {
>>>>>+ unsigned long long node_size_bytes;
>>>>>+ if (to != command_line)
>>>>>+ to--;
>>>>>+ node_size_bytes = memparse(from+8, &from);
>>>>>+ max_pages_per_node = node_size_bytes >> PAGE_SHIFT;
>>>>>+ }
>>>>>
>>>>> if (c == ' ' && !memcmp(from, "memmap=", 7)) {
>>>>> if (to != command_line)
>>>>>diff -purN -X /home/mbligh/.diff.exclude 320-kcg/arch/i386/kernel/srat.c 330-numa_mem_equals/arch/i386/kernel/srat.c
>>>>>--- 320-kcg/arch/i386/kernel/srat.c 2003-10-01 11:47:33.000000000 -0700
>>>>>+++ 330-numa_mem_equals/arch/i386/kernel/srat.c 2004-03-14 09:54:01.000000000 -0800
>>>>>@@ -53,6 +53,10 @@ struct node_memory_chunk_s {
>>>>>};
>>>>>static struct node_memory_chunk_s node_memory_chunk[MAXCHUNKS];
>>>>>
>>>>>+#define chunk_start(i) (node_memory_chunk[i].start_pfn)
>>>>>+#define chunk_end(i) (node_memory_chunk[i].end_pfn)
>>>>>+#define chunk_size(i) (chunk_end(i)-chunk_start(i))
>>>>>+
>>>>>static int num_memory_chunks; /* total number of memory chunks */
>>>>>static int zholes_size_init;
>>>>>static unsigned long zholes_size[MAX_NUMNODES * MAX_NR_ZONES];
>>>>>@@ -198,6 +202,9 @@ static void __init initialize_physnode_m
>>>>> }
>>>>>}
>>>>>
>>>>>+extern unsigned long max_pages_per_node;
>>>>>+extern int limit_mem_per_node;
>>>>>+
>>>>>/* Parse the ACPI Static Resource Affinity Table */
>>>>>static int __init acpi20_parse_srat(struct acpi_table_srat *sratp)
>>>>>{
>>>>>@@ -281,23 +288,27 @@ static int __init acpi20_parse_srat(stru
>>>>> node_memory_chunk[j].start_pfn,
>>>>> node_memory_chunk[j].end_pfn);
>>>>> }
>>>>>-
>>>>>+
>>>>> /*calculate node_start_pfn/node_end_pfn arrays*/
>>>>> for (nid = 0; nid < numnodes; nid++) {
>>>>>- int been_here_before = 0;
>>>>>+ unsigned long node_present_pages = 0;
>>>>>
>>>>>+ node_start_pfn[nid] = -1;
>>>>> for (j = 0; j < num_memory_chunks; j++){
>>>>>- if (node_memory_chunk[j].nid == nid) {
>>>>>- if (been_here_before == 0) {
>>>>>- node_start_pfn[nid] = node_memory_chunk[j].start_pfn;
>>>>>- node_end_pfn[nid] = node_memory_chunk[j].end_pfn;
>>>>>- been_here_before = 1;
>>>>>- } else { /* We've found another chunk of memory for the node */
>>>>>- if (node_start_pfn[nid] < node_memory_chunk[j].start_pfn) {
>>>>>- node_end_pfn[nid] = node_memory_chunk[j].end_pfn;
>>>>>- }
>>>>>- }
>>>>>- }
>>>>>+ unsigned long proposed_size;
>>>>>+
>>>>>+ if (node_memory_chunk[j].nid != nid)
>>>>>+ continue;
>>>>>+
>>>>>+ proposed_size = node_present_pages + chunk_size(j);
>>>>>+ if (proposed_size > max_pages_per_node)
>>>>>+ chunk_end(j) = chunk_start(j) +
>>>>>+ max_pages_per_node - node_present_pages;
>>>>>+ node_present_pages += chunk_size(j);
>>>>>+
>>>>>+ if (node_start_pfn[nid] == -1)
>>>>>+ node_start_pfn[nid] = chunk_start(j);
>>>>>+ node_end_pfn[nid] = chunk_end(j);
>>>>> }
>>>>> }
>>>>> return 1;
>>>>>
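For anyone reading the srat.c hunk above, its per-node clamping loop can be tried out in user space with a sketch like the following. The struct and function names (struct chunk, struct node_span, clamp_node) are hypothetical and exist only for this example; the logic is meant to mirror the quoted loop, with the node_start_pfn/node_end_pfn arrays replaced by a returned struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of node_memory_chunk[]. */
struct chunk {
    int nid;
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Result for one node: its pfn span and the pages actually present. */
struct node_span {
    unsigned long start_pfn;
    unsigned long end_pfn;
    unsigned long present_pages;
};

/*
 * Walk the chunks belonging to node nid in order and trim the chunk
 * that would push the node past max_pages. Chunks are modified in
 * place. If the node has no chunks, start_pfn stays at (unsigned long)-1.
 */
static struct node_span clamp_node(struct chunk *chunks, size_t nchunks,
                                   int nid, unsigned long max_pages)
{
    struct node_span span = { (unsigned long)-1, 0, 0 };
    size_t j;

    assert(chunks != NULL || nchunks == 0);
    for (j = 0; j < nchunks; j++) {
        unsigned long size;

        if (chunks[j].nid != nid)
            continue;

        size = chunks[j].end_pfn - chunks[j].start_pfn;
        if (span.present_pages + size > max_pages)
            chunks[j].end_pfn = chunks[j].start_pfn +
                                max_pages - span.present_pages;
        span.present_pages += chunks[j].end_pfn - chunks[j].start_pfn;

        if (span.start_pfn == (unsigned long)-1)
            span.start_pfn = chunks[j].start_pfn;
        span.end_pfn = chunks[j].end_pfn;
    }
    return span;
}
```

With chunks [0,100) and [200,300) on node 0 and a limit of 150 pages, the second chunk is trimmed to [200,250) and the node spans pfns 0 to 250 with 150 pages present. In this sketch a chunk that lies entirely beyond the limit is trimmed to zero size but still moves end_pfn to its own start.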
>>>>>-
>>>>>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>>>the body of a message to majordomo@vger.kernel.org
>>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>Please read the FAQ at http://www.tux.org/lkml/