From: Vivek Goyal <vgoyal@in.ibm.com>
To: Mel Gorman <mel@skynet.ie>
Cc: Steve Fox <drfickle@us.ibm.com>, Andi Kleen <ak@suse.de>,
Badari Pulavarty <pbadari@us.ibm.com>,
Martin Bligh <mbligh@mbligh.org>, Andrew Morton <akpm@osdl.org>,
lkml <linux-kernel@vger.kernel.org>,
netdev@vger.kernel.org, kmannth@us.ibm.com,
Andy Whitcroft <apw@shadowen.org>
Subject: Re: 2.6.18-mm2 boot failure on x86-64
Date: Fri, 6 Oct 2006 13:34:41 -0400
Message-ID: <20061006173441.GC19756@in.ibm.com>
In-Reply-To: <20061006171105.GC9881@skynet.ie>
On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > > Linux version 2.6.18-git22 (root@elm3b239) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > > Command line: root=/dev/sda1 vga=791 ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > > BIOS-provided physical RAM map:
> > > > BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
> > > > BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
> > > > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > > > BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
> > > > BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
> > > > BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
> > > > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > > > BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
> > >
> > > I continued what Steve was doing this morning to see could this be
> > > pinned down. After placing 'CHECK;' in a few places as suggested by
> > > Andi's check, the problem code was identified as that following in
> > > mm/bootmem.c#init_bootmem_core()
> > >
> > > mapsize = get_mapsize(bdata);
> > > memset(bdata->node_bootmem_map, 0xff, mapsize);
> > >
> > > That explains the value in the array at least. A few more printfs around
> > > this point printed out the following in the boot log
> > >
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0xFFFF810000775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > >
> > > where;
> > >
> > > 1909 == mapstart
> > > 0 == start
> > > 12582912 == end
> > > 1572864 == mapsize
> > >
> > > mapstart, start and end being the parameters being passed to
> > > init_bootmem_core(). This means we are calling memset for the physical
> > > range 0x775000 -> 0x8F5000 which is in a usable range according to the
> > > BIOS-e820 map it appears.
> > >
> >
> > Hi Mel,
> >
>
> Hi.
>
> > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > from System.map will tell us. That will confirm that above memset step is
> > stomping over bss. Then we have to just find that somewhere probably
> > we allocated wrong physical memory area for bootmem allocator map.
> >
>
> BSS is at 0x643000 -> 0x777BC4
> init_bootmem wipes from 0x777000 -> 0x8F7000
>
> So the BSS bytes from 0x777000 -> 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> fix is below. It adds a check in bad_addr() to see if the BSS section is
> about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> the source of the problem even if it's not the 100% correct fix.
>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c
> --- linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c 2006-10-05 20:42:07.000000000 +0100
> +++ linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c 2006-10-06 17:39:51.000000000 +0100
> @@ -51,6 +51,7 @@ extern struct resource code_resource, da
>  static inline int bad_addr(unsigned long *addrp, unsigned long size)
>  {
>  	unsigned long addr = *addrp, last = addr + size;
> +	unsigned long bss_start, bss_end;
>
>  	/* various gunk below that needed for SMP startup */
>  	if (addr < 0x8000) {
> @@ -77,6 +78,14 @@ static inline int bad_addr(unsigned long
>  		*addrp = __pa_symbol(&_end);
>  		return 1;
>  	}
> +
> +	/* bss section */
> +	bss_start = __pa_symbol(&__bss_start);
> +	bss_end = PAGE_ALIGN(__pa_symbol(&__bss_stop));
> +	if (addr >= bss_start && addr < bss_end) {
> +		*addrp = bss_end;
> +		return 1;
> +	}
>

Surprising. The kernel code check just before this should have taken care
of it.

	/* kernel code */
	if (last >= __pa_symbol(&_text) && last < __pa_symbol(&_end)) {
		*addrp = __pa_symbol(&_end);
		return 1;
	}

Maybe it can be changed to

	if (last >= __pa_symbol(&_text) && last < PAGE_ALIGN(__pa_symbol(&_end))) {

But all this seems to be a stopgap fix. The real puzzle is still exactly
where it slipped through, and that is where it should be fixed.

Maybe some more printks will help us.
Thanks
Vivek