From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: BUG: unable to handle kernel paging request - balloon_init - xen-4.1.0 - 2.6.32.39
Date: Thu, 28 Apr 2011 14:30:19 -0400
Message-ID: <20110428183019.GA9852@dumpdata.com>
References: <4DB60C04.6050802@sce.pridelands.org> <20110426031545.GB20779@dumpdata.com> <4DB6522A.9000304@sce.pridelands.org> <20110427200937.GA19853@dumpdata.com> <4DB8AAA6.4050808@sce.pridelands.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DB8AAA6.4050808@sce.pridelands.org>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Scott Garron
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On Wed, Apr 27, 2011 at 07:45:42PM -0400, Scott Garron wrote:
> On 04/27/2011 04:09 PM, Konrad Rzeszutek Wilk wrote:
> >Duh! I meant this one:
> >
> >[    0.316665] RIP: e030:[] []
> >balloon_init+0x20b/0x25e
> >
> >Sorry about that. Can you also run your kernel with 'initcall_debug
> >loglevel=8' please?
>
> Ok, I've put what I came up with here:
>
> http://www.pridelands.org/~simba/xen-debug/debugnotes.txt
>
> I also added a few pr_info() lines around the offending code to try
> to get more of a handle of how far it is getting and what it's working
> on at the time of failure:

This looks quite odd. We had a flurry of issues like these before, where we
"forgot" to set the P2M table correctly, so that during

[    0.000000] init_memory_mapping: 0000000100000000-00000001d9ff0000

it would crash, because for PFNs above the 'dom0_mem' parameter we would
return an INVALID value and the machine would crash, but only if the value
was not aligned (git commit f06e457cb729d58430d1385014fab367b2d4e7c2).

But that isn't the case here (dom0_mem=512M). And you say it boots fine
as a DomU, so I think there is some P2M/E820 funkiness happening here.
Have you tried booting the kernel as Dom0 with different sizes of dom0_mem
(e.g. "dom0_mem=max:2GB"), or without the dom0_mem parameter at all?
What is your CONFIG_XEN_MAX_DOMAIN_MEMORY set to?

> ********
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index a065fda..b5f0650 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -488,10 +488,13 @@ static int __init balloon_init(void)
>  	 */
>  	extra_pfn_end = min(min(max_pfn, e820_end_of_ram_pfn()),
>  			    (unsigned long)PFN_DOWN(xen_extra_mem_start
> + xen_ex
> +	pr_info("extra_pfn_end: 0x%x", extra_pfn_end); /* debug */
>  	for (pfn = PFN_UP(xen_extra_mem_start);
>  	     pfn < extra_pfn_end;
>  	     pfn += balloon_npages) {
> +		pr_info("pfn: 0x%x", pfn); /* debug */
>  		page = pfn_to_page(pfn);
> +		pr_info("page: 0x%p", page); /* debug */
>  		/* totalram_pages doesn't include the boot-time
>  		   balloon extension, so don't subtract from it. */
>  		__balloon_append(page);
>
> ********
>
> The new serial console output, with "initcall_debug loglevel=8" and
> the pr_info() additions to the code can be found here:
>
> http://www.pridelands.org/~simba/xen-debug/hailstorm-fullserial20110427.txt
>
> ... but I'll paste the part closest to the crash here for your convenience:
>
> [    1.016663] calling balloon_init+0x0/0x280 @ 1
> [    1.016663] xen_balloon: Initialising balloon driver with page order 0.
> [    1.033446] last_pfn = 0x1d9ff0 max_arch_pfn = 0x400000000
> [    1.036663] extra_pfn_end: 0x1d9ff0
> [    1.036663] pfn: 0x100000
> [    1.036663] page: 0xffffea0003800000
> [    1.036663] BUG: unable to handle kernel paging request at
> ffffea0003800028
> [    1.036663] IP: [] balloon_init+0x240/0x280
> [    1.036663] PGD 18402067 PUD 18403067 PMD 0
>
>
> So the crash is happening within the first iteration of that for()
> loop, presumably while calling __balloon_append(page). That's as far as
> I dove into it so far, but I figured I'd give you an update as to what
> I've found and tried.
>
> Just for more information sake, I also tried booting this kernel as
> a paravirt domU under the Debian Stable 2.6.32-5-xen-amd64 stock kernel
> and Xen 4.1.0. It booted without incident (aside from a ridiculously
> long spew of printk's from my additions to that for() loop), so the
> failure is specific to the kernel booting as a dom0. That probably
> doesn't narrow down much, but I figured it was noteworthy.
>
> --
> Scott Garron