From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: Crash on boot with 2.6.37-rc8-git3
Date: Thu, 20 Jan 2011 14:24:34 -0500
Message-ID: <20110120192434.GA10001@dumpdata.com>
References: <alpine.LFD.2.02.1101072034080.9613@vega4.dur.ac.uk>
	<20110107212359.GA22976@dumpdata.com>
	<alpine.LFD.2.02.1101080007020.8723@vega3.dur.ac.uk>
	<20110110184225.GB9837@dumpdata.com>
	<alpine.LFD.2.02.1101180027120.16611@vega1.dur.ac.uk>
	<alpine.LFD.2.02.1101192249200.30335@vega4.dur.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path: <xen-devel-bounces@lists.xensource.com>
Content-Disposition: inline
In-Reply-To: <alpine.LFD.2.02.1101192249200.30335@vega4.dur.ac.uk>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: M A Young <m.a.young@durham.ac.uk>
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Wed, Jan 19, 2011 at 10:54:00PM +0000, M A Young wrote:
> On Tue, 18 Jan 2011, M A Young wrote:
> 
> >My next theory is that the issue is an
> >alignment issue. The NODE DATA is put in the range
> >00000000df659800 to 00000000df66d7ff (the top end of the second
> >"usable" chunk) and the problem comes when it tries to write to the
> >final 2K piece (00000000df66d000 to 00000000df66d800 -
> >00000000df66d000 occurs on the stack) which hasn't been
> >initialized properly because it isn't a 4K piece.
> >Does this sound plausible?
> 
> Further experiments confirm that it is this 2K piece causing the
> problem - if I reserve the 2K chunk in the same way that NODE DATA
> is reserved (though without zeroing it) the system boots; if I
> reduce this to reserving only 1K then it doesn't.

I think my math is off here. The reserve call is made on
df659800 -> df66d7ff, which would be 20 pages' worth of data. Does it
die on the last PFN, df66d, b/c there is no PTE set for it?
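
To double-check that arithmetic, here is a rough sketch in Python. It
assumes 4K pages (PAGE_SHIFT = 12); the variable names are only
illustrative and are not taken from the kernel code.

```python
# Sanity check of the reserved range, assuming 4K pages.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

start = 0xdf659800          # first byte of the NODE DATA range
end_inclusive = 0xdf66d7ff  # last byte of the NODE DATA range

size = end_inclusive + 1 - start       # 0x14000 bytes
pages_worth = size // PAGE_SIZE        # 20 pages' worth of data

first_pfn = start >> PAGE_SHIFT        # 0xdf659
last_pfn = end_inclusive >> PAGE_SHIFT # 0xdf66d
pfns_touched = last_pfn - first_pfn + 1

# Bytes used in the last PFN (the "2K piece" at df66d000)
tail_bytes = (end_inclusive + 1) & (PAGE_SIZE - 1)

print(hex(size), pages_worth, hex(first_pfn), hex(last_pfn),
      pfns_touched, tail_bytes)
```

So the range holds 20 pages' worth of bytes, but because it starts in
the middle of a page it touches 21 PFNs, with only 2K used in the first
and in the last one.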

What happens if you fudge the code so that it allocates those pages
page-aligned, i.e. df65a000->df66e000? That way we skip the region
df659800->df659fff and start on a new PFN (and PTE).
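
The aligned range I have in mind can be sketched like this (Python
again, assuming 4K pages; page_align_up is a made-up helper for
illustration, not a kernel function):

```python
# Round the start of the range up to the next page boundary,
# assuming 4K pages, and keep the same size.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

def page_align_up(addr):
    """Round addr up to the next multiple of PAGE_SIZE (hypothetical helper)."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

start = 0xdf659800
size = 0x14000                            # 20 pages, same as before

aligned_start = page_align_up(start)      # 0xdf65a000
aligned_end = aligned_start + size        # 0xdf66e000 (exclusive)
skipped = aligned_start - start           # bytes skipped: df659800->df659fff

first_pfn = aligned_start >> PAGE_SHIFT
last_pfn = (aligned_end - 1) >> PAGE_SHIFT
pfns_touched = last_pfn - first_pfn + 1   # exactly 20 whole pages

print(hex(aligned_start), hex(aligned_end), skipped, pfns_touched)
```

With the start rounded up, the data covers exactly 20 whole PFNs
(df65a through df66d) and there is no partial page at either end.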