From: Konrad Rzeszutek Wilk
Subject: Re: Xen dom0 crash: "d0:v0: unhandled page fault (ec=0000)"
Date: Mon, 1 Nov 2010 13:46:02 -0400
Message-ID: <20101101174602.GA6227@dumpdata.com>
In-Reply-To: <20101101173940.GA6068@dumpdata.com>
To: Jeremy Fitzhardinge
Cc: "Alan J. Wylie", "xen-devel@lists.xensource.com", Gianni Tedesco, Stefan Kuhne, sven, Andreas Kinzler

On Mon, Nov 01, 2010 at 01:39:40PM -0400, Konrad Rzeszutek Wilk wrote:
> > >> http://pastebin.com/3m0DpDdW - 2.6.32.24-gd0054d6-dirty - broken
> .. snip..
> > The way this is supposed to work is:
> >
> > 1. Xen gives the domain N pages
> > 2. There's an E820 which describes M pages (M > N)
> > 3. The kernel traverses the existing E820 and finds holes and adds
> >    the memory to a new E820_RAM region beyond M
> > 4. Set up P2M for pages up to N
> > 5. When the kernel maps all "RAM", the region from N-M is not
> >    present, and has no valid P2M mapping; in that case, xen_make_pte
> >    will return a non-present pte.
>
> Right, and somehow his machine/kernel is not doing this. His 'N' ends up being 'M' so
> the region N-M is added to the "RAM", and xen_make_pte I _think_ returns a non-present pte
> (or maybe it does return a present pte?) In the previous kernel (2.6.32.18), it
> does exactly what you described.

Not that I am actually sure what is causing this.

The interesting part is that he sees this twice:

[    0.000000] last_pfn = 0x2d0699 max_arch_pfn = 0x400000000
[    0.000000] last_pfn = 0x2f000 max_arch_pfn = 0x400000000

And he mentioned to me on IRC that this was not due to any debugging patches.
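The five steps quoted above can be sketched as a toy Python model, assuming 4 KiB pages, an E820 map held as non-overlapping (start_pfn, end_pfn, type) tuples, and a P2M that is just a dict. All helper names (add_hole_memory, build_p2m, make_pte, count_ptes, pfn_to_mib), the made-up MFNs and the example sizes are hypothetical; this only illustrates the sequence and is not the real Xen or Linux code. make_pte merely plays the role described for xen_make_pte, and pfn_to_mib converts a printed last_pfn into a size (0x2f000 pages of 4 KiB is 752 MiB).

```python
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 4096        # assumption: 4 KiB x86 pages
E820_RAM = 1            # type tags for this toy only, not the kernel's values
E820_RESERVED = 2

# (start_pfn, end_pfn exclusive, type); entries assumed not to overlap
Region = Tuple[int, int, int]


def pfn_to_mib(pfn: int) -> float:
    """Size in MiB of the memory below a given page frame number."""
    return pfn * PAGE_SIZE / (1024 * 1024)


def add_hole_memory(e820: List[Region], m_pages: int) -> List[Region]:
    """Step 3 (hypothetical helper): count pages below M that no E820_RAM
    entry covers and append an E820_RAM region of that size starting at M."""
    ram_below_m = sum(min(end, m_pages) - start
                      for start, end, kind in e820
                      if kind == E820_RAM and start < m_pages)
    hole_pages = m_pages - ram_below_m
    if hole_pages <= 0:
        return list(e820)
    return list(e820) + [(m_pages, m_pages + hole_pages, E820_RAM)]


def build_p2m(n_pages: int, first_mfn: int = 0x100000) -> Dict[int, int]:
    """Step 4 (hypothetical): only PFNs below N get a machine frame.
    The MFN values are invented for the example."""
    return {pfn: first_mfn + pfn for pfn in range(n_pages)}


def make_pte(pfn: int, p2m: Dict[int, int]) -> Optional[int]:
    """Step 5 (stand-in for xen_make_pte's role): the backing MFN,
    or None to represent a non-present PTE."""
    return p2m.get(pfn)


def count_ptes(e820: List[Region], p2m: Dict[int, int]) -> Tuple[int, int]:
    """Map every E820_RAM page and count (present, non-present) PTEs."""
    present = missing = 0
    for start, end, kind in e820:
        if kind != E820_RAM:
            continue
        for pfn in range(start, end):
            if make_pte(pfn, p2m) is None:
                missing += 1
            else:
                present += 1
    return present, missing


if __name__ == "__main__":
    N, M = 900, 1024    # hypothetical sizes: Xen gives N pages, E820 spans M
    e820 = [(0, 160, E820_RAM), (160, 256, E820_RESERVED), (256, M, E820_RAM)]
    e820 = add_hole_memory(e820, M)               # step 3
    p2m = build_p2m(N)                            # steps 1 and 4
    print(e820[-1])                               # (1024, 1120, 1)
    print(count_ptes(e820, p2m))                  # (804, 220)
    print(f"{pfn_to_mib(0x2f000):.0f} MiB")       # 752 MiB
```

In this toy, every RAM page from N upward (including the 96 pages relocated beyond M) comes back non-present. Setting N equal to M, the situation suspected above, leaves only the 96 relocated pages without a P2M entry.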