From: Tejun Heo <tj@kernel.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>, Ingo Molnar <mingo@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] Reverting NUMA-affine page table allocation
Date: Wed, 2 Mar 2011 19:22:34 +0100
Message-ID: <20110302182234.GA28266@mtj.dyndns.org>
In-Reply-To: <20110302180827.GA13693@elte.hu>

Hello,

On Wed, Mar 02, 2011 at 07:08:27PM +0100, Ingo Molnar wrote:
> > I tried to clean up the page table allocation code but the necessary
> > changes felt a bit too large at this stage, so IMO that's best left to
> > the next cycle.
>
> Do you plan to implement it more cleanly?

Yeah, that's the plan. I want the page table allocation code cleaned
up before doing this.

I also want to take a dumber/simpler approach, at the cost of some
disadvantage to machines with interleaved NUMA nodes, which then
can't use 1GiB mappings. If that scenario is a real concern, which I
doubt but can't rule out, we can bring back the callback-walking
approach, but I'd first want to know that it's an actual concern we
need to address.

> > To me, it seems complicated for not good enough reasons. I'll defer
> > the decision to x86 maintainers. Ingo, hpa, Thomas, what do you guys
> > think?
>
> Would be nice to see an actual patch that does the revert.

Here it is.

Thanks.

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 97e6007..bce688d 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -54,8 +54,6 @@ static inline phys_addr_t get_max_mapped(void)
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
-void init_memory_mapping_high(void);
-
 extern void initmem_init(void);
 extern void free_initmem(void);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46e684f..c3a606c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -963,6 +963,14 @@ void __init setup_arch(char **cmdline_p)
 	max_low_pfn_mapped = init_memory_mapping(0, max_low_pfn<<PAGE_SHIFT);
 	max_pfn_mapped = max_low_pfn_mapped;
 
+#ifdef CONFIG_X86_64
+	if (max_pfn > max_low_pfn) {
+		max_pfn_mapped = init_memory_mapping(1UL<<32,
+						     max_pfn<<PAGE_SHIFT);
+		/* can we preserve max_low_pfn? */
+		max_low_pfn = max_pfn;
+	}
+#endif
 	memblock.current_limit = get_max_mapped();
 
 	/*
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 470cc47..c8813aa 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -606,63 +606,9 @@ kernel_physical_mapping_init(unsigned long start,
 void __init initmem_init(void)
 {
 	memblock_x86_register_active_regions(0, 0, max_pfn);
-	init_memory_mapping_high();
 }
 #endif
 
-struct mapping_work_data {
-	unsigned long start;
-	unsigned long end;
-	unsigned long pfn_mapped;
-};
-
-static int __init_refok
-mapping_work_fn(unsigned long start_pfn, unsigned long end_pfn, void *datax)
-{
-	struct mapping_work_data *data = datax;
-	unsigned long pfn_mapped;
-	unsigned long final_start, final_end;
-
-	final_start = max_t(unsigned long, start_pfn<<PAGE_SHIFT, data->start);
-	final_end = min_t(unsigned long, end_pfn<<PAGE_SHIFT, data->end);
-
-	if (final_end <= final_start)
-		return 0;
-
-	pfn_mapped = init_memory_mapping(final_start, final_end);
-
-	if (pfn_mapped > data->pfn_mapped)
-		data->pfn_mapped = pfn_mapped;
-
-	return 0;
-}
-
-static unsigned long __init_refok
-init_memory_mapping_active_regions(unsigned long start, unsigned long end)
-{
-	struct mapping_work_data data;
-
-	data.start = start;
-	data.end = end;
-	data.pfn_mapped = 0;
-
-	work_with_active_regions(MAX_NUMNODES, mapping_work_fn, &data);
-
-	return data.pfn_mapped;
-}
-
-void __init_refok init_memory_mapping_high(void)
-{
-	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping_active_regions(1UL<<32,
-						 max_pfn<<PAGE_SHIFT);
-		/* can we preserve max_low_pfn ? */
-		max_low_pfn = max_pfn;
-
-		memblock.current_limit = get_max_mapped();
-	}
-}
-
 void __init paging_init(void)
 {
 	unsigned long max_zone_pfns[MAX_NR_ZONES];
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 74064e8..86491ba 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -543,8 +543,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	init_memory_mapping_high();
-
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
 		u64 start = (u64)max_pfn << PAGE_SHIFT;
--
tejun