From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753017AbcBBNen (ORCPT <rfc822;w@1wt.eu>);
	Tue, 2 Feb 2016 08:34:43 -0500
Received: from smtp02.citrix.com ([66.165.176.63]:1100 "EHLO SMTP02.CITRIX.COM"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750717AbcBBNem (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Tue, 2 Feb 2016 08:34:42 -0500
X-IronPort-AV: E=Sophos;i="5.22,384,1449532800"; 
   d="scan'208";a="335528107"
Subject: Re: [Xen-devel] dom0 show call trace and failed to boot on HSW-EX
 platform
To: "Li, Liang Z" <liang.z.li@intel.com>,
        David Vrabel <david.vrabel@citrix.com>,
        Andrew Cooper <andrew.cooper3@citrix.com>,
        "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
References: <F2CBF3009FA73547804AE4C663CAB28E0373E0AC@SHSMSX101.ccr.corp.intel.com>
 <56B080C7.9070704@citrix.com> <56B08385.2060009@citrix.com>
 <F2CBF3009FA73547804AE4C663CAB28E03742372@shsmsx102.ccr.corp.intel.com>
CC: Daniel Kiper <daniel.kiper@oracle.com>, Tim Deegan <tim@xen.org>,
        "Jan Beulich" <JBeulich@suse.com>
From: David Vrabel <david.vrabel@citrix.com>
Message-ID: <56B0B06E.4010502@citrix.com>
Date: Tue, 2 Feb 2016 13:34:38 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Icedove/38.5.0
MIME-Version: 1.0
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E03742372@shsmsx102.ccr.corp.intel.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-DLP: MIA2
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/02/16 13:15, Li, Liang Z wrote:
>>>> We found that dom0 crashes when booting on an HSW-EX server; the dom0
>>>> kernel version is v4.4. By debugging I found that your patch
>>>> 'x86/xen: discard RAM regions above the maximum reservation',
>> commit ID f5775e0b6116b7e2425ccf535243b21, caused the regression.
>> The debug message is listed below:
>>>>
>> ==========================================================
>>>>  (XEN) mm.c:884:d0v14 pg_owner 0 l1e_owner 0, but real_pg_owner -1
>>>>  (XEN) mm.c:955:d0v14 Error getting mfn 1080000 (pfn
>>>> ffffffffffffffff) from L1
>>>>  (XEN) mm.c:1269:d0v14 Failure in alloc_l1_table: entry 0
>>>>  (XEN) mm.c:2175:d0v14 Error while validating mfn 188d903 (pfn
>>>> 17a7cc) for type
>>>>  (XEN) mm.c:3101:d0v14 Error -16 while pinning mfn 188d903
>>>>  [   33.768792] ------------[ cut here ]------------
>>>> WARNING: CPU: 14 PID: 1 at arch/x86/xen/multicalls.c:129 xen_mc_
>>>>  [   33.783809] Modules linked in:
>>>>  [   33.787304] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #1
>>>>  [   33.793991] Hardware name: Intel Corporation BRICKLAND/BRICKLAND,
>> BIOS
>>>>  [   33.805624]  0000000000000081 ffff88017d2537c8 ffffffff812ff954
>> 000000000000
>>>>  [   33.813961]  0000000000000000 0000000000000081 0000000000000000
>> ffff88017d25
>>>>  [   33.822300]  ffffffff810ca120 ffffffff81cb7f00 ffff8801879ca280
>> 000000000000
>>>>  [   33.830639] Call Trace:
>>>>  [   33.833457]  [<ffffffff812ff954>] dump_stack+0x48/0x64
>>>>  [   33.839277]  [<ffffffff810ca120>] warn_slowpath_common+0x90/0xd0
>>>>  [   33.846058]  [<ffffffff810ca175>] warn_slowpath_null+0x15/0x20
>>>>  [   33.852659]  [<ffffffff81060133>] xen_mc_flush+0x1c3/0x1d0
>>>>  [   33.858858]  [<ffffffff8106449f>] xen_alloc_pte+0x20f/0x300
>>>>  [   33.865158]  [<ffffffff810beef5>] ? update_page_count+0x45/0x60
>>>>  [   33.871855]  [<ffffffff817a1194>] ? phys_pte_init+0x170/0x183
>>>>  [   33.878345]  [<ffffffff817a148d>] phys_pmd_init+0x2e6/0x389
>>>>  [   33.884649]  [<ffffffff817a17dd>] phys_pud_init+0x2ad/0x3dc
>>>>  [   33.890954]  [<ffffffff817a290d>]
>> kernel_physical_mapping_init+0xec/0x211
>>>>  [   33.898613]  [<ffffffff8179df8d>] init_memory_mapping+0x17d/0x2f0
>>>>  [   33.905496]  [<ffffffff81104f11>] ?
>> __raw_callee_save___pv_queued_spin_unloc
>>>>  [   33.914516]  [<ffffffff813643f7>] ?
>> acpi_os_signal_semaphore+0x2e/0x32
>>>>  [   33.921889]  [<ffffffff810ba7b8>] arch_add_memory+0x48/0xf0
>>>>  [   33.928186]  [<ffffffff8179eb80>] add_memory_resource+0x80/0x110
>>>>  [   33.934967]  [<ffffffff8179ec8d>] add_memory+0x7d/0xc0
>>>>  [   33.940787]  [<ffffffff81399538>]
>> acpi_memory_device_add+0x14f/0x237
>>
>> We shouldn't be adding memory based on the ACPI tables.
>>
>> David
> 
> What is your suggestion for solving this issue: simply revert the patch, or use a workaround?

Memory hotplug is not supported and needs to be disabled.  You can either
pass "acpi_no_memhotplug" on the dom0 kernel command line or build the
dom0 kernel with CONFIG_ACPI_HOTPLUG_MEMORY disabled.
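
To check which of the two applies on a given dom0, something like the
following may help. It is only a rough sketch: the helper names has_param
and config_enabled are made up for illustration, and the /proc/cmdline and
/boot/config-* locations in the usage comments are the usual ones but
depend on the distribution.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Xen or kernel tool.

# Return 0 if the kernel command line string $1 contains the bare parameter $2.
has_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Return 0 if the kernel config text $1 has option $2 enabled (=y or =m).
config_enabled() {
    printf '%s\n' "$1" | grep -Eq "^$2=(y|m)\$"
}

# Example usage on a running dom0 (paths are assumptions, see above):
#   has_param "$(cat /proc/cmdline)" acpi_no_memhotplug \
#       && echo "memory hotplug disabled on the command line"
#   config_enabled "$(cat "/boot/config-$(uname -r)")" CONFIG_ACPI_HOTPLUG_MEMORY \
#       && echo "CONFIG_ACPI_HOTPLUG_MEMORY is enabled in this kernel"
```

If the second check reports the option as enabled and the first does not
find the parameter, the kernel will still try to add memory from the ACPI
tables as in the trace above.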

David