From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751460AbcEKIJW (ORCPT ); Wed, 11 May 2016 04:09:22 -0400
Received: from youngberry.canonical.com ([91.189.89.112]:46864 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751308AbcEKIJQ (ORCPT ); Wed, 11 May 2016 04:09:16 -0400
Subject: Re: [Xen-devel] bad page flags booting 32bit dom0 on 64bit hypervisor using dom0_mem (kernel >=4.2)
To: xen-devel , Linux Kernel Mailing List
References: <57273050.6060300@canonical.com> <57273CDE.10300@suse.com> <5727632C.1020209@canonical.com>
From: Stefan Bader
X-Enigmail-Draft-Status: N1110
Cc: Juergen Gross , David Vrabel , Mel Gorman , Nathan Zimmer
Message-ID: <5732E89A.1040501@canonical.com>
Date: Wed, 11 May 2016 10:08:58 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <5727632C.1020209@canonical.com>
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="xuhiK4gpfq0QFlrLuvuQicOHcvetN3ccp"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--xuhiK4gpfq0QFlrLuvuQicOHcvetN3ccp
Content-Type: multipart/mixed; boundary="------------030301020703040008000309"

This is a multi-part message in MIME format.
--------------030301020703040008000309
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 02.05.2016 16:24, Stefan Bader wrote:
> On 02.05.2016 13:41, Juergen Gross wrote:
>> On 02/05/16 12:47, Stefan Bader wrote:
>>> I recently tried to boot 32bit dom0 on 64bit Xen host which I configured to run
>>> with a limited, fixed amount of memory for dom0.
>>> It seems that somewhere between
>>> kernel versions 3.19 and 4.2 (sorry that is still a wide range) the Linux kernel
>>> would report bad page flags for a range of pages (which seem to be around the
>>> end of the guest pfn range). For a 4.2 kernel that was easily missed as the boot
>>> finished ok and dom0 was accessible. However starting with 4.4 (tested 4.5 and a
>>> 4.6-rc) the serial console output freezes after some of those bad page flag
>>> messages and then (unfortunately without any further helpful output) the host
>>> reboots (I assume there is a panic that triggers a reset).
>>>
>>> I suspect the problem is more a kernel side one. It is just possible to
>>> influence things by variation of dom0_mem=#,max:#. 512M seems ok, 1024M, 2048M,
>>> and 3072M cause bad page flags starting around kernel 4.2 and reboots around
>>> 4.4. Then 4096M and not clamping dom0 memory seem to be ok again (though not
>>> limiting dom0 memory seems to cause trouble on 32bit dom0 later when a domU
>>> tries to balloon memory, but I think that is a different problem).
>>>
>>> I have not seen this on a 64bit dom0. Below is an example of those bad page
>>> errors. Somehow it looks to be a page marked as reserved. Initially I wondered
>>> whether this could be a problem of not clearing page flags when moving mappings
>>> to match the e820. But I never looked into i386 memory setup in that detail. So
>>> I am posting this, hoping that someone may have an idea from the detail about
>>> where to look next. PAE is enabled there. Usually it's bpf init that gets hit but
>>> that likely is just because that is doing the first vmallocs.
>>
>> Could you please post the kernel config, Xen and dom0 boot parameters?
>> I'm quite sure this is no common problem as there are standard tests
>> running for each kernel version including 32 bit dom0 with limited
>> memory size.
>
> Hi Jürgen,
>
> sure.
> Though by doing that I realized where I actually messed the whole thing
> up. I got the max limit syntax completely wrong. :( Instead of the correct
> "dom0_mem=1024M,max:1024M" I am using "dom0_mem=1024M:max=1024M" which I guess
> is like not having max set at all. Not sure whether that is a valid use case.
>
> When I actually do the dom0_mem argument right, there are no bad page flag
> errors even in 4.4 with 1024M limit. I was at least consistent in my
> mis-configuration, so doing the same stupid thing on 64bit seems to be handled
> more gracefully.
>
> Likely false alarm. But at least cut&pasting the config into mail made me spot
> the problem...
>

Ok, thinking that "dom0_mem=x" (without a max or min) still is a valid case, I
went ahead and did a bisect for when the bad page flag issue started. I ended
up at:

  92923ca "mm: meminit: only set page reserved in the memblock region"

And with a few more printks in the new functions I finally realized why this
goes wrong. The new reserve_bootmem_region is using unsigned long for start and
end addresses which just isn't working too well for 32bit.

For Xen dom0 the problem with that can just be more easily triggered. When dom0
memory is limited to a small size but allowed to balloon for more, the
additional system memory is put into reserved regions.
In my case, a host with 8G memory and say 1G initial dom0 memory, this created
(among others) one reserved region which started at 4GB and covered the
remaining 4G of host memory. Which reserve_bootmem_region() got as 0-4G due to
the unsigned long conversion. This basically marked *all* memory below 4G as
reserved.

The fix is relatively simple, just use phys_addr_t for start and end. I tested
this on 4.2 and 4.4 kernels. Both now boot without errors and neither does the
4.4 kernel crash.
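To illustrate the truncation described above, here is a minimal user-space
sketch. It is only an approximation under stated assumptions: a 32bit PAE
build is modelled by using uint32_t in place of the i386 unsigned long and
uint64_t in place of phys_addr_t, and all helper names are made up for this
example (they are not kernel functions).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4K pages, as on i386 */

/*
 * Hypothetical helper: the value an address has after being passed
 * through a 32bit 'unsigned long' parameter. Bits 32 and up are lost.
 */
static uint32_t as_i386_ulong(uint64_t phys)
{
	return (uint32_t)phys;
}

/* Page frame number computed from the truncated address (old prototype). */
static uint32_t pfn_down_truncated(uint64_t phys)
{
	return as_i386_ulong(phys) >> SKETCH_PAGE_SHIFT;
}

/* Page frame number computed from the full address (phys_addr_t prototype). */
static uint64_t pfn_down_full(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/*
 * The pfn itself is still kept in an unsigned long, so this checks
 * whether the pfn of a given address fits into 32 bits.
 */
static bool pfn_fits_i386_ulong(uint64_t phys)
{
	return pfn_down_full(phys) <= UINT32_MAX;
}
```

For a region starting at 4GB (0x100000000), pfn_down_truncated() yields pfn 0
while pfn_down_full() yields pfn 0x100000, which is the difference between
marking low memory and marking the intended range.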
Maybe still not 100% safe when running on very large memory systems (if I did
not get the math wrong, 16T) but at least some improvement...

-Stefan

--------------030301020703040008000309
Content-Type: text/x-diff; name="0001-mm-Use-phys_addr_t-for-reserve_bootmem_region-argume.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename*0="0001-mm-Use-phys_addr_t-for-reserve_bootmem_region-argume.pa"; filename*1="tch"

From 1588a8b3983f63f8e690b91e99fe631902e38805 Mon Sep 17 00:00:00 2001
From: Stefan Bader
Date: Tue, 10 May 2016 19:05:16 +0200
Subject: [PATCH] mm: Use phys_addr_t for reserve_bootmem_region arguments

Since 92923ca the reserved bit is set on reserved memblock regions.
However start and end address are passed as unsigned long. This is
only 32bit on i386, so it can end up marking the wrong pages reserved
for ranges at 4GB and above.

This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing to balloon in memory
(dom0_mem=1024M for example). This would define a reserved bootmem
region for the additional memory (for example on a 8GB system there was
a reserved region covering the 4GB-8GB range). But since the addresses
were passed on as unsigned long, this was actually marking all pages
from 0 to 4GB as reserved.

Fixes: 92923ca "mm: meminit: only set page reserved in the memblock region"
Signed-off-by: Stefan Bader
Cc: # 4.2+
---
 include/linux/mm.h | 2 +-
 mm/page_alloc.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b56ff72..4c1ff62 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1715,7 +1715,7 @@ extern void free_highmem_page(struct page *page);
 extern void adjust_managed_page_count(struct page *page, long count);
 extern void mem_init_print_info(const char *str);
 
-extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
 
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c69531a..eb66f89 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -951,7 +951,7 @@ static inline void init_reserved_page(unsigned long pfn)
  * marks the pages PageReserved. The remaining valid pages are later
  * sent to the buddy page allocator.
  */
-void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
+void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long end_pfn = PFN_UP(end);
-- 
1.9.1

--------------030301020703040008000309--

--xuhiK4gpfq0QFlrLuvuQicOHcvetN3ccp
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQIcBAEBCgAGBQJXMuihAAoJEOhnXe7L7s6j0BQP/iEpTvcgAP/w+elW9OE4Zof1
wqqs3czSUK+qP5J1zOBLUdb3/zHuWQbY1XsJkYYaqRcu20jKpHH2ooK6C6oAGFxD
ec5TGf6JtWb3Y/FkqJvnkmJopIZR+eo9o8GGSs0LvEpiIAs/2nSicruIAJRBJPFr
dyTmHuB/bM4+7lCpJcnn2OMijUBz0lzgPybF1zJzMXTak6BFihLjxlbKyU4QGvif
jqICc6ZwtJRLxbdhEWxqu9w7QhKik7j2Lmuf+he7OKRCTvegYCOpd5z+ccNAOsRz
2QndeN7mLSvh0iqOkoUTSadxialkSCI92qlP1Y6/2setVOlkt1lPjFbys7ouo5Kj
UHqu8/clyrGfGnZYxjMOOOrLnIDg5qx7TMxOrNdjLHgZ7FkgkEwcftOrhPn0pUS5
XAM1i0nr4rCJDN4MHjl175ged76NHxsRMdOkX/4OeFwyLSSnXFWw0BQecl2HclKh
6j1MKDwjrhwjUaZF8D07S1VNqLOdnzXwL6mOLgBjcbTEayDbibqWibC/FGtFAcAG
3NV+aa1D/Hdx5Q55CdRFTBbB4/8z9zU+VCHxY1CpjQbnwvHhIgbhnXu52EJB6FuC
X0FwEiBoBrrJAZ1AJ0ENrHdz5taZ8Au9F/ncW+22hL+ZQsmqJuRe9rW0Pt51rn/D
901TEmM1SorvCCOEOoTT
=H0bE
-----END PGP SIGNATURE-----

--xuhiK4gpfq0QFlrLuvuQicOHcvetN3ccp--