linux-kernel.vger.kernel.org archive mirror
From: Yinghai Lu <yinghai@kernel.org>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Maxim Uvarov <muvarov@gmail.com>,
	linux-kernel@vger.kernel.org, Neil Horman <nhorman@redhat.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: kdump broken on 2.6.37-rc4
Date: Thu, 16 Dec 2010 16:39:43 -0800
Message-ID: <4D0AB14F.9050900@kernel.org>
In-Reply-To: <4D0AA592.3060805@kernel.org>

On 12/16/2010 03:49 PM, Yinghai Lu wrote:
> On 12/16/2010 03:30 PM, Yinghai Lu wrote:
>> On 12/16/2010 11:58 AM, H. Peter Anvin wrote:
>>> On 12/16/2010 09:28 AM, Yinghai Lu wrote:
>>>>
>>>> brk complains if I change that to
>>>>
>>>>  	if (end > ((-__PAGE_OFFSET-(128 <<20)-1) & 0x7fffffff))
>>>>  		error("Destination address too large");
>>>>
>>>> brk complains when trying to get more for dmi ...
>>>> ...
>>>> I'm in purgatory
>>>> bootconsole [uart0] enabled
>>>> Kernel Layout:
>>>>   .text: [0x2e000000-0x2e3f08ca]
>>>> .rodata: [0x2e3f2000-0x2e5a2fff]
>>>>   .data: [0x2e5a3000-0x2e5f6467]
>>>>   .init: [0x2e5f7000-0x2e670fff]
>>>>    .bss: [0x2e675000-0x2e76ffff]
>>>>    .brk: [0x2e770000-0x2e894fff]
>>>>     memblock_x86_reserve_range: [0x00001000-0x00001fff]    EX TRAMPOLINE
>>>>     memblock_x86_reserve_range: [0x2e000000-0x2e76ffff]    TEXT DATA BSS
>>>>     memblock_x86_reserve_range: [0x35bdd000-0x35f49fff]          RAMDISK
>>>>     memblock_x86_reserve_range: [0x0009c800-0x000fffff]  * BIOS reserved
>>>> Initializing cgroup subsys cpuset
>>>> Initializing cgroup subsys cpu
>>>> Linux version 2.6.37-rc5-tip+ (root@mpk12-3214-189-181) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #4 SMP Wed Dec 15 11:04:32 PST 2010
>>>> KERNEL supported cpus:
>>>>   Intel GenuineIntel
>>>>   AMD AuthenticAMD
>>>>   NSC Geode by NSC
>>>>   Cyrix CyrixInstead
>>>>   Centaur CentaurHauls
>>>>   Transmeta GenuineTMx86
>>>>   Transmeta TransmetaCPU
>>>>   UMC UMC UMC UMC
>>>> BIOS-provided physical RAM map:
>>>>  BIOS-e820: [0x00000000000100-0x0000000009c7ff] (usable)
>>>>  BIOS-e820: [0x0000000009c800-0x0000000009ffff] (reserved)
>>>>  BIOS-e820: [0x000000000e0000-0x000000000fffff] (reserved)
>>>>  BIOS-e820: [0x00000000100000-0x0000007ff9ffff] (usable)
>>>>  BIOS-e820: [0x0000007ffae000-0x0000007ffaffff] (usable)
>>>>  BIOS-e820: [0x0000007ffb0000-0x0000007ffbdfff] (ACPI data)
>>>>  BIOS-e820: [0x0000007ffbe000-0x0000007ffeffff] (ACPI NVS)
>>>>  BIOS-e820: [0x0000007fff0000-0x0000007fffffff] (reserved)
>>>>  BIOS-e820: [0x000000e0000000-0x000000efffffff] (reserved)
>>>>  BIOS-e820: [0x000000fec00000-0x000000fec00fff] (reserved)
>>>>  BIOS-e820: [0x000000fee00000-0x000000feefffff] (reserved)
>>>>  BIOS-e820: [0x000000ff700000-0x000000ffffffff] (reserved)
>>>> last_pfn = 0x7ffb0 max_arch_pfn = 0x1000000
>>>> NX (Execute Disable) protection: active
>>>> user-defined physical RAM map:
>>>>  user: [0x00000000000000-0x0000000009ffff] (usable)
>>>>  user: [0x0000002e000000-0x00000035f59fff] (usable)
>>>>  user: [0x0000007ffb0000-0x0000007ffeffff] (ACPI data)
>>>> DMI present.
>>>> BUG: Int 6: CR2   (null)
>>>>      EDI 00000019  ESI ff940c18  EBP   (null)  ESP ee5a5e84
>>>>      EBX ee5cfb68  EDX 00000006  ECX 00000019  EAX ee8e6019
>>>>      err   (null)  EIP ee5fb4dd   CS 00000060  flg 00010002
>>>> Stack: 00000019 ee62bf45 ff942000 00000563 00000001 ff940c00 000018c7 ee62bf83
>>>>        ff940c00 ee62c063 80000000 ee3e6f2f ee50a3c0 ee5a5ed4 ff940c00 ff940c43
>>>>        000018c7   (null) ee3173d4 000018c8 0000007f ff940c00 ff90b1bf ee5a5f18
>>>> Pid: 0, comm: swapper Not tainted 2.6.37-rc5-tip+ #4
>>>> Call Trace:
>>>>  [<ee3dd1d5>] ? hlt_loop+0x0/0x3
>>>>  [<ee5fb4dd>] ? extend_brk+0x31/0x44
>>>
>>> I'm assuming it bails due to:
>>>
>>> 	BUG_ON((char *)(_brk_end + size) > __brk_limit);
>>>
>>> ... could you find out what _brk_end and __brk_limit are?
>>
>> void __init print_kernel_layout(void)
>> {
>>         printk("Kernel Layout:\n");
>>         printk("  .text: [%#010lx-%#010lx]\n", __pa_symbol(&_text), __pa_symbol(&_etext) - 1);
>>         printk(".rodata: [%#010lx-%#010lx]\n", __pa_symbol(&__start_rodata), __pa_symbol(&__end_rodata) - 1);
>>         printk("  .data: [%#010lx-%#010lx]\n", __pa_symbol(&_sdata), __pa_symbol(&_edata) - 1);
>>         printk("  .init: [%#010lx-%#010lx]\n", __pa_symbol(&__init_begin), __pa_symbol(&__init_end) - 1);
>>         printk("   .bss: [%#010lx-%#010lx]\n", __pa_symbol(&__bss_start), __pa_symbol(&__bss_stop) - 1);
>>         printk("   .brk: [%#010lx-%#010lx]\n", __pa_symbol(&__brk_base), __pa_symbol(&__brk_limit) - 1);
>> }
>>
>>>> Kernel Layout:
>>>>   .text: [0x2e000000-0x2e3f08ca]
>>>> .rodata: [0x2e3f2000-0x2e5a2fff]
>>>>   .data: [0x2e5a3000-0x2e5f6467]
>>>>   .init: [0x2e5f7000-0x2e670fff]
>>>>    .bss: [0x2e675000-0x2e76ffff]
>>>>    .brk: [0x2e770000-0x2e894fff]
>>
>> DMI present.
>> _brk_end: ee8e6000, __brk_limit: ee895000 
>>
> 
> It looks like arch/x86/kernel/head_32.S puts the page tables in _brk.
> 
> If the whole range is this high, it will use more buffer in _brk for ...
> 
> So the brk pre-calculation could be wrong and too small.

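For reference, the size of the overrun follows directly from the two values quoted above. A minimal sketch in plain C; the helper names are made up for illustration (they are not kernel functions) and 4 KiB pages are assumed:

```c
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as on x86-32 */

/*
 * Bytes by which the brk pointer has run past the reserved limit,
 * or 0 if it is still inside the reserved area.
 */
uint32_t brk_overrun_bytes(uint32_t brk_end, uint32_t brk_limit)
{
	return brk_end > brk_limit ? brk_end - brk_limit : 0;
}

/* The same overrun expressed in whole pages, rounded up. */
uint32_t brk_overrun_pages(uint32_t brk_end, uint32_t brk_limit)
{
	return (brk_overrun_bytes(brk_end, brk_limit) + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

With _brk_end = 0xee8e6000 and __brk_limit = 0xee895000 this gives 0x51000 bytes, i.e. 81 pages past the end of the reserved .brk area.
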
32-bit assumes KERNEL_IMAGE_SIZE is 512M:
arch/x86/include/asm/page_32_types.h:#define KERNEL_IMAGE_SIZE  (512 * 1024 * 1024)
arch/x86/include/asm/page_64_types.h:#define KERNEL_IMAGE_SIZE  (512 * 1024 * 1024)
arch/x86/kernel/head64.c:       BUILD_BUG_ON(MODULES_VADDR-KERNEL_IMAGE_START < KERNEL_IMAGE_SIZE);
arch/x86/kernel/head64.c:       BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
arch/x86/kernel/head64.c:       max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;
arch/x86/kernel/head_32.S: *     (KERNEL_IMAGE_SIZE/4096) / 1024 pages (worst case, non PAE)
arch/x86/kernel/head_32.S: *     (KERNEL_IMAGE_SIZE/4096) / 512 + 4 pages (worst case for PAE)
arch/x86/kernel/head_32.S: * KERNEL_IMAGE_SIZE should be greater than pa(_end)
arch/x86/kernel/head_32.S:KERNEL_PAGES = (KERNEL_IMAGE_SIZE + MAPPING_BEYOND_END)>>PAGE_SHIFT 

and uses that to estimate the BRK size.

So we could either change the BRK calculation code to handle 896M, or just limit crashkernel for 32-bit to 512M...
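To get a feel for the cost of the first option, the worst-case formulas from the head_32.S comment quoted above can be evaluated for both sizes. This is only a rough sketch: it ignores MAPPING_BEYOND_END, and the function names are hypothetical, not anything in the kernel tree:

```c
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Non-PAE worst case from the comment: (size/4096) / 1024 pages. */
uint32_t pt_pages_nonpae(uint32_t map_mb)
{
	uint32_t pages = (map_mb << 20) / PAGE_SIZE;

	return pages / 1024;
}

/* PAE worst case from the comment: (size/4096) / 512 + 4 pages. */
uint32_t pt_pages_pae(uint32_t map_mb)
{
	uint32_t pages = (map_mb << 20) / PAGE_SIZE;

	return pages / 512 + 4;
}
```

For 512M that is 128 pages (non-PAE) or 260 pages (PAE); for 896M it is 224 or 452 pages, so the extra reservation is under 1M either way.
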

Handling the 896M case:

---
 arch/x86/boot/compressed/misc.c |    2 +-
 arch/x86/kernel/head_32.S       |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/x86/boot/compressed/misc.c
===================================================================
--- linux-2.6.orig/arch/x86/boot/compressed/misc.c
+++ linux-2.6/arch/x86/boot/compressed/misc.c
@@ -365,7 +365,7 @@ asmlinkage void decompress_kernel(void *
 	if (heap > 0x3fffffffffffUL)
 		error("Destination address too large");
 #else
-	if (heap > ((-__PAGE_OFFSET-(512<<20)-1) & 0x7fffffff))
+	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
 		error("Destination address too large");
 #endif
 #ifndef CONFIG_RELOCATABLE
Index: linux-2.6/arch/x86/kernel/head_32.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head_32.S
+++ linux-2.6/arch/x86/kernel/head_32.S
@@ -68,8 +68,10 @@ MAPPING_BEYOND_END = \
  * Worst-case size of the kernel mapping we need to make:
  * the worst-case size of the kernel itself, plus the extra we need
  * to map for the linear map.
+ * To let a crashkernel bzImage stay high, map up to 896M here;
+ * the unused part is reclaimed once brk is concluded, so nothing is wasted.
  */
-KERNEL_PAGES = (KERNEL_IMAGE_SIZE + MAPPING_BEYOND_END)>>PAGE_SHIFT
+KERNEL_PAGES = (KERNEL_IMAGE_SIZE + (384<<20) + MAPPING_BEYOND_END)>>PAGE_SHIFT
 
 INIT_MAP_SIZE = PAGE_TABLE_SIZE(KERNEL_PAGES) * PAGE_SIZE_asm
 RESERVE_BRK(pagetables, INIT_MAP_SIZE)
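
The effect of the misc.c hunk can be checked with plain arithmetic. A minimal sketch, assuming the default 3G/1G split (__PAGE_OFFSET = 0xC0000000) and doing the computation in explicit 32-bit unsigned arithmetic; decompress_limit() is a made-up helper for illustration, not a function in the decompressor:

```c
#include <stdint.h>

/*
 * Highest address accepted by the 32-bit "Destination address too
 * large" check. reserve_mb is the constant subtracted in the check:
 * 512 before the patch, 128 after it.
 */
uint32_t decompress_limit(uint32_t page_offset, uint32_t reserve_mb)
{
	uint32_t limit = 0u - page_offset;	/* -__PAGE_OFFSET mod 2^32 */

	limit -= reserve_mb << 20;
	limit -= 1u;
	return limit & 0x7fffffffu;
}
```

Under that assumption the old limit is 0x1fffffff (512M - 1) and the new one is 0x37ffffff (896M - 1). An address around 0x2e000000, which is where the log above places the crash kernel, is rejected by the old check and accepted by the new one.
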
