* [PATCH] x86_64: Fix page table building regression
@ 2011-03-26 6:43 Eric W. Biederman
2011-03-26 9:19 ` Yinghai Lu
0 siblings, 1 reply; 2+ messages in thread
From: Eric W. Biederman @ 2011-03-26 6:43 UTC (permalink / raw)
To: H. Peter Anvin, Ingo Molnar, Thomas Gleixner; +Cc: linux-kernel, Yinghai Lu
Recently I had cause to enable DEBUG_PAGEALLOC and I discovered my
kdump kernel would not boot. After some investigation it turns out that
commit 80989ce064 "x86: clean up and and print out initial max_pfn_mapped"
unnecessarily applied a limitation of the 32bit page table setup to the
64bit code. The initial 64bit page table setup code is careful to map in
its initial page table pages and unmap them when done, so they can live
anywhere in memory and we don't need to limit ourselves to using pages
that are already mapped into memory.
In my case I hit this because the first 512M was not usable by the
kdump kernel.
Allocating the page tables higher should improve the reliability of
kdump kernels. As it stands today, with the recommended 128M reserved
for a kdump kernel, the reserved area will frequently be allocated
above 512M, and the kdump kernel will only be able to allocate its
page tables from the low 1M of RAM. Strictly speaking that memory is
available, but it is the one piece of memory for which we have no 100%
guarantee that there was no on-going DMA before the kdump kernel
starts.
Allowing the page tables to come from outside the low 512M will also
allow kernels built with DEBUG_PAGEALLOC to boot on systems with 256G
of RAM.
Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
arch/x86/mm/init.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 947f42a..52460a1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -33,7 +33,7 @@ int direct_gbpages
 static void __init find_early_table_space(unsigned long end, int use_pse,
 					  int use_gbpages)
 {
-	unsigned long puds, pmds, ptes, tables, start;
+	unsigned long puds, pmds, ptes, tables, start, stop;
 	phys_addr_t base;
 	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
@@ -74,11 +74,13 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 	 */
 #ifdef CONFIG_X86_32
 	start = 0x7000;
+	/* The 32bit kernel_physical_mapping_init is limited */
+	stop = max_pfn_mapped<<PAGE_SHIFT;
 #else
 	start = 0x8000;
+	stop = end;
 #endif
-	base = memblock_find_in_range(start, max_pfn_mapped<<PAGE_SHIFT,
-					tables, PAGE_SIZE);
+	base = memblock_find_in_range(start, stop, tables, PAGE_SIZE);
 	if (base == MEMBLOCK_ERROR)
 		panic("Cannot find space for the kernel page tables");
--
1.7.4
* Re: [PATCH] x86_64: Fix page table building regression
2011-03-26 6:43 [PATCH] x86_64: Fix page table building regression Eric W. Biederman
@ 2011-03-26 9:19 ` Yinghai Lu
0 siblings, 0 replies; 2+ messages in thread
From: Yinghai Lu @ 2011-03-26 9:19 UTC (permalink / raw)
To: Eric W. Biederman, greg, stable
Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, linux-kernel
On Fri, Mar 25, 2011 at 11:43 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Recently I had cause to enable DEBUG_PAGEALLOC and I discovered my
> kdump kernel would not boot. After some investigation it turns out that
> in commit 80989ce064 "x86: clean up and and print out initial max_pfn_mapped"
> that a limitation of the 32bit page table setup was unnecessarily
> applied to the 64bit code. The initial 64bit page table setup code is
> careful to map in its initial page table pages and unmap them when done
> so they can live anywhere in memory, so we don't need to limit ourselves
> to using pages that are already mapped into memory.
>
> In my case I hit this because the first 512M was not usable by the
> kdump kernel.
We already have a complete fix in mainline, and we need that patch in
the 2.6.38 stable tree. It was meant for 2.6.38 but was delayed to 2.6.39.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4b239f458c229de044d6905c2b0f9fe16ed9e01e
x86-64, mm: Put early page table high
While debugging kdump, I found that the current kernel has a problem with
crashkernel=512M. It turns out that the initial mapping covers up to 512M,
so the later mapping up to 4G (actually 2040M on my platform) puts its page
table near 512M, and the mapping up to 128g then puts its page table near 2g.
before this patch:
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[ 0.000000] 0000000000 - 007f600000 page 2M
[ 0.000000] 007f600000 - 007f750000 page 4k
[ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff]
[ 0.000000] 0100000000 - 2080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff]
[ 0.000000] memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff] PGTABLE
[ 0.000000] RAMDISK: 7bc84000 - 7f745000
[ 0.000000] crashkernel reservation failed - No suitable area found.
after patch:
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[ 0.000000] 0000000000 - 007f600000 page 2M
[ 0.000000] 007f600000 - 007f750000 page 4k
[ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
[ 0.000000] memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff]
[ 0.000000] 0100000000 - 2080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff] PGTABLE
[ 0.000000] RAMDISK: 7bc84000 - 7f745000
[ 0.000000] memblock_x86_reserve_range: [0x17000000-0x36ffffff] CRASH KERNEL
[ 0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB)
With the patch, the page table for [0, 2g) lands near 2g instead of
under 512M, and the page table for [4g, 128g) lands near 128g instead
of under 2g.
That is good: if we have lots of memory above 4g, such as 1024g, 2048g
or 16T, the related page tables will not be put under 2g. Otherwise
they could fill up the range under 2g when 1G or 2M pages are not
used.
The code change adds map_low_page() and updates unmap_low_page() for
64bit, and uses them to access the corresponding high memory when
setting up the page tables.
...