Date: Fri, 15 Jul 2016 18:31:36 +0200
From: Greg Kurz
Message-ID: <20160715183136.1a57c8d2@bahia.lan>
References: <1468570225-14101-1-git-send-email-thuth@redhat.com> <20160715083530.GA14615@voom.fritz.box> <5a9731af-6636-7031-4d50-20815cdfb5e0@redhat.com> <20160715171829.0a9dfd16@bahia.lan>
Subject: Re: [Qemu-devel] [PATCH] ppc: Yet another fix for the huge page support detection mechanism
To: Thomas Huth
Cc: David Gibson, qemu-ppc@nongnu.org, qemu-devel@nongnu.org

On Fri, 15 Jul 2016 17:54:44 +0200
Thomas Huth wrote:

> On 15.07.2016 17:18, Greg Kurz wrote:
> > On Fri, 15 Jul 2016 14:28:44 +0200
> > Thomas Huth wrote:
> >
> >> On 15.07.2016 10:35, David Gibson wrote:
> >>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> >>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >>>> for main RAM") already
made sure that huge page support is not announced
> >>>> to the guest if the normal RAM of non-NUMA configurations is not backed
> >>>> by a huge page filesystem. However, there is one more case that can go
> >>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes is not configured
> >>>> with huge page support (and only the memory of a DIMM is configured with
> >>>> it). When QEMU is started with the following command line for example,
> >>>> the Linux guest currently crashes because it is trying to use huge pages
> >>>> on a memory region that does not support huge pages:
> >>>>
> >>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
> >>>>    memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>>>    -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>>>    -numa node,nodeid=0 -numa node,nodeid=1
> >>>>
> >>>> To fix this issue, we've got to make sure to disable huge page support,
> >>>> too, when there is a NUMA node that is not using a memory backend with
> >>>> huge page support.
> >>>>
> >>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >>>> Signed-off-by: Thomas Huth
> >>>> ---
> >>>>  target-ppc/kvm.c | 10 +++++++---
> >>>>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >>>> index 884d564..7a8f555 100644
> >>>> --- a/target-ppc/kvm.c
> >>>> +++ b/target-ppc/kvm.c
> >>>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
> >>>>
> >>>>      object_child_foreach(memdev_root, find_max_supported_pagesize, &hpsize);
> >>>>
> >>>> -    if (hpsize == LONG_MAX) {
> >>>> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> >>>>          return getpagesize();
> >>>>      }
> >>>>
> >>>> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> >>>> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages!
 */
> >>>> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> >>>> +     * memory-backend, then there is at least one node using "normal"
> >>>> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> >>>> +     * (what we've checked earlier here already), we can not use huge pages!
> >>>> +     */
> >>>> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> >>>
> >>> Is that second clause sufficient, or do you need to loop through and
> >>> check the memdev of every node?
> >>
> >> Checking the first entry should be sufficient. QEMU forces you to
> >> specify either a memory backend for all NUMA nodes (which we should have
> >> looked at during the object_child_foreach() some lines earlier), or you
> >> must not specify a memory backend for any NUMA node at all. You cannot
> >> mix the settings, so checking numa_info[0] is enough.
> >
> > And what happens if we specify a hugepage memdev backend to one of the
> > nodes and a regular RAM memdev backend to the other?
>
> I think that should be handled with the object_child_foreach() logic in
> that function ... unless I completely misunderstood the code ;-)
>

You're right. The loop always catches the smallest page size. :)

So this patch indeed fixes the case you describe in the changelog.

Reviewed-by: Greg Kurz
Tested-by: Greg Kurz

> > I actually wanted to try that but I hit an assertion, which isn't
> > related to this patch I think:
> >
> > qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common:
> > Assertion `!subregion->container' failed.
>
> I just tried that, too, and I did not get that assertion:
>

I tried with the master branch (commit 14c7d99333e4) + your patch...
I'll investigate that.

> qemu-system-ppc64 -enable-kvm ...
-m 2G,slots=4,maxmem=32G \
>   -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
>   -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
>   -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
>   -numa node,nodeid=1,memdev=mem-mem2
>
> And the guest was starting fine, with huge pages disabled.
>
> > So I tried to trick the logic you are trying to fix the other way
> > round:
> >
> >  -mem-path /dev/hugepages \
> >  -m 1G,slots=4,maxmem=32G \
> >  -object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
> >  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> >  -smp 2 \
> >  -numa node,nodeid=0 -numa node,nodeid=1
> >
> > The guest fails the same way as before your patch: the huge page size is
> > advertised to the guest, but the NUMA node is associated with regular RAM.
>
> You're right, this is still an issue here! ... so we need yet another
> fix for this case :-/
>

Maybe check the memory backend objects first if we have some, else return
gethugepagesize() if we have mem-path, else return getpagesize()?

> Thanks for the testing!
>
> Thomas

You're welcome.

--
Greg