netdev.vger.kernel.org archive mirror
From: Vivek Goyal <vgoyal@in.ibm.com>
To: Mel Gorman <mel@skynet.ie>
Cc: Steve Fox <drfickle@us.ibm.com>, Andi Kleen <ak@suse.de>,
	Badari Pulavarty <pbadari@us.ibm.com>,
	Martin Bligh <mbligh@mbligh.org>, Andrew Morton <akpm@osdl.org>,
	lkml <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, kmannth@us.ibm.com,
	Andy Whitcroft <apw@shadowen.org>
Subject: Re: 2.6.18-mm2 boot failure on x86-64
Date: Fri, 6 Oct 2006 13:59:50 -0400
Message-ID: <20061006175950.GD19756@in.ibm.com>
In-Reply-To: <20061006171105.GC9881@skynet.ie>

On Fri, Oct 06, 2006 at 06:11:05PM +0100, Mel Gorman wrote:
> On (06/10/06 11:36), Vivek Goyal didst pronounce:
> > On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > > Linux version 2.6.18-git22 (root@elm3b239) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > > Command line: root=/dev/sda1 vga=791  ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > > BIOS-provided physical RAM map:
> > > >  BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
> > > >  BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
> > > >  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > > >  BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
> > > >  BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
> > > >  BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
> > > >  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > > >  BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
> > > 
> > > I continued what Steve was doing this morning to see could this be
> > > pinned down. After placing 'CHECK;' in a few places as suggested by
> > > Andi's check, the problem code was identified as that following in
> > > mm/bootmem.c#init_bootmem_core()
> > > 
> > >         mapsize = get_mapsize(bdata);
> > >         memset(bdata->node_bootmem_map, 0xff, mapsize);
> > > 
> > > That explains the value in the array at least. A few more printfs around
> > > this point printed out the following in the boot log
> > > 
> > > init_bootmem_core(0, 1909, 0, 12582912)
> > > init_bootmem_core: Calling memset(0xFFFF810000775000, 1572864)
> > > AAGH: afinfo corrupted at mm/bootmem.c:121
> > > 
> > > where;
> > > 
> > > 1909 == mapstart
> > > 0 == start
> > > 12582912 == end
> > > 1572864 == mapsize
> > > 
> > > mapstart, start and end being the parameters being passed to
> > > init_bootmem_core(). This means we are calling memset for the physical
> > > range 0x775000 -> 0x8F5000 which is in a usable range according to the
> > > BIOS-e820 map it appears.
> > > 
> > 
> > Hi Mel,
> > 
> 
> Hi.
> 
> > Where is bss placed in physical memory? I guess bss_start and bss_stop
> > from System.map will tell us. That will confirm that above memset step is
> > stomping over bss. Then we have to just find that somewhere probably
> > we allocated wrong physical memory area for bootmem allocator map.
> > 
> 
> BSS is at 0x643000 -> 0x777BC4
> init_bootmem wipes from 0x777000 -> 0x8F7000
> 
> So the BSS bytes from 0x777000 -> 0x777BC4 (which looks very suspiciously
> like a page alignment of addr & PAGE_MASK) get set to 0xFF. One possible
> fix is below. It adds a check in bad_addr() to see if the BSS section is
> about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
> the source of the problem even if it's not the 100% correct fix.
> 

Ok, it looks like the code assumes that the memory area returned by
find_e820_area() is page aligned. I found two such instances, and that
assumption is what leads to the problem.

        bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
                                         bootmap_start >> PAGE_SHIFT,
                                         start_pfn, end_pfn);

Here bootmap_start is not page aligned; I guess it currently contains
the value 0x777BC4 (just beyond _end). But the moment I do
bootmap_start >> PAGE_SHIFT, the offset within the page is dropped and
I start stomping on bss.

The case is similar here.

        bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size);
        if (bootmap == -1L)
                panic("Cannot find bootmem map of size %ld\n",bootmap_size);
        bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);

So maybe we should return a page-aligned address from find_e820_area().
Maybe we can change bad_addr() to set *addrp to the next page-aligned
boundary for every check?

 	*addrp = PAGE_ALIGN(__pa_symbol(&_end));

Thanks
Vivek
