From: Konrad Rzeszutek Wilk
Subject: Re: acpidump crashes on some machines
Date: Thu, 23 Aug 2012 10:10:55 -0400
Message-ID: <20120823141055.GH26098@phenom.dumpdata.com>
References: <4FE1C423.6070001@amd.com>
 <20120620145127.GD12787@phenom.dumpdata.com>
 <4FE32DDA.10204@amd.com>
 <20120630014825.GA7003@phenom.dumpdata.com>
 <20120630021936.GA27100@phenom.dumpdata.com>
 <50114002.2030700@amd.com>
 <20120817205207.GA3002@phenom.dumpdata.com>
 <503602A0.9040908@amd.com>
 <50360465.3090602@citrix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <50360465.3090602@citrix.com>
To: David Vrabel
Cc: Andre Przywara, Jeremy Fitzhardinge, xen-devel, Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On Thu, Aug 23, 2012 at 11:22:29AM +0100, David Vrabel wrote:
> On 23/08/12 11:14, Andre Przywara wrote:
> > On 08/17/2012 10:52 PM, Konrad Rzeszutek Wilk wrote:
> >> On Thu, Jul 26, 2012 at 03:02:58PM +0200, Andre Przywara wrote:
> >>> On 06/30/2012 04:19 AM, Konrad Rzeszutek Wilk wrote:
> >>>
> >>> Konrad, David,
> >>>
> >>> back on track for this issue. Thanks for your input, I could do some
> >>> more debugging (see below for a refresh):
> >>>
> >>> It seems like it affects only the first page of the 1:1 mapping. I
> >>> didn't have any issues with the last PFN or the page behind it (which
> >>> failed properly).
> >>>
> >>> David, thanks for the hint about varying the dom0_mem parameter. I
> >>> thought I had already checked this, but I did it once again and it
> >>> turned out that it is only an issue if dom0_mem is smaller than the
> >>> ACPI area, which generates a hole in the memory map. So we have
> >>> (simplified):
> >>> * 1:1 mapping up to 1 MB
> >>> * normal mapping up to dom0_mem
> >>> * unmapped area up to the ACPI E820 area
> >>> * ACPI E820 1:1 mapping
> >>>
> >>> As far as I could chase it down, the 1:1 mapping itself looks OK; I
> >>> couldn't find any off-by-one bugs there. So maybe it is code that
> >>> later on invalidates areas between the normal guest mapping and the
> >>> ACPI memory?
> >>
> >> I think I found it. Can you try this please [and if you can't find
> >> early_to_phys.. just use the __set_phys_to call]?
> >
> > Yes, that works, at least after a quick test on my test box. Both the
> > test module and acpidump work as expected. If I replace the "<" in your
> > patch with the original "<=", I get the warning (and due to the
> > "continue" it also works).
>
> Note that the balloon driver could subsequently overwrite the p2m entry.

Hmm, I am not seeing how... the region that is passed in goes right up
to the PFN (I believe). And I did run with this patch over a couple of
days with ballooning up and down. But maybe I missed something?

Let me prep a patch that adds some more checks in the balloon driver
just in case we do hit this.

> I don't think it is worth redoing the patch to adjust the region passed
> to the balloon driver to avoid this, though.
>
> > I also successfully tested the minimal fix (just replacing <= with <).
> > I will feed it to the testers here to cover more machines.
> >
> > Do you want to keep the warnings in (which exceed 80 characters, btw)?
>
> I think we do.
>
> David
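To make the off-by-one under discussion concrete, here is a minimal,
purely illustrative sketch; the function name and overall structure are
made up for the example and are not Konrad's patch. Only the p2m
helpers get_phys_to_machine(), __set_phys_to_machine(), IDENTITY_FRAME()
and INVALID_P2M_ENTRY (from asm/xen/page.h) are real. The idea: the
hole between dom0_mem and the ACPI E820 region is handed over for
ballooning, and with the original inclusive bound ("<=") the loop also
touches end_pfn, i.e. the first PFN of the following 1:1 ACPI mapping,
matching the symptom Andre reported:

/* Sketch only; assumes <linux/kernel.h> and <asm/xen/page.h>. */
static void __init mark_hole_for_balloon(unsigned long start_pfn,
					 unsigned long end_pfn)
{
	unsigned long pfn;

	/* Exclusive bound; the original "<=" also clobbered end_pfn. */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/*
		 * The warn-and-continue guard Andre refers to: an entry
		 * that is already identity mapped is left alone.
		 */
		if (get_phys_to_machine(pfn) == IDENTITY_FRAME(pfn)) {
			WARN(1, "PFN %lx is already 1:1 mapped, not handing it to the balloon driver\n", pfn);
			continue;
		}
		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
	}
}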
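And a sketch of the kind of extra check in the balloon driver alluded
to above (again hypothetical, not the eventual patch): before the
balloon driver wires a freshly populated frame into the p2m, verify
that the slot is actually unpopulated, so an identity entry such as the
ACPI 1:1 mapping cannot be silently overwritten.

/*
 * Hypothetical helper: returns true if the p2m slot for @pfn is free.
 * A slot that already holds a mapping (identity or otherwise) would be
 * silently clobbered by a subsequent set_phys_to_machine().
 * Assumes <linux/kernel.h> and <asm/xen/page.h>.
 */
static bool balloon_p2m_slot_free(unsigned long pfn)
{
	unsigned long cur = get_phys_to_machine(pfn);

	return !WARN_ONCE(cur != INVALID_P2M_ENTRY,
			  "balloon: PFN %lx already has p2m entry %lx\n",
			  pfn, cur);
}

The balloon driver's increase-reservation path could call this on
page_to_pfn(page) right before it installs the new p2m entry and skip
the page when it returns false.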