From: Stefan Bader
Subject: Re: 2nd level lockups using VMX nesting on 3.11 based host kernel
Date: Tue, 10 Sep 2013 09:52:23 +0200
Message-ID: <522ECFB7.3000302@canonical.com>
References: <5225E1DF.9040109@canonical.com> <20130903181333.GA28283@redhat.com>
In-Reply-To: <20130903181333.GA28283@redhat.com>
To: Gleb Natapov
Cc: kvm@vger.kernel.org

On 03.09.2013 20:13, Gleb Natapov wrote:
> On Tue, Sep 03, 2013 at 03:19:27PM +0200, Stefan Bader wrote:
>> With current 3.11 kernels we got reports of nested qemu failing in weird ways. I
>> believe 3.10 also had issues before. Not sure whether those were the same.
>> With 3.8 based kernels (close to current stable) I found no such issues.
> Try to bisect it.

It took a while to bisect, though I am not sure the result helps much. Starting from v3.9, the first broken commit is:

commit 5f3d5799974b89100268ba813cec8db7bd0693fb
KVM: nVMX: Rework event injection and recovery

This sounds plausible, as the patch changes event injection between nested levels. However, starting with this patch I am unable to start any second-level guest at all. Very soon after the second-level guest starts, the first level (and with it the second level) locks up completely without any visible messages.
This goes on until

commit 5a2892ce72e010e3cb96b438d7cdddce0c88e0e6
KVM: nVMX: Skip PF interception check when queuing during nested run

In between there was also a period where the first level did not lock up but would either seem not to schedule the second-level guest or displayed internal error messages when starting the second level.

Given that, it seems the double faults currently seen in the second level might be one of the issues introduced by the injection rework, one that remains to this day, while the other issues were fixed from the second commit on.

I am not deeply familiar with the nVMX code, just trying to make sense of my observations. The double fault always seems to originate from the cmos_interrupt function in the second-level guest. It is not immediate and sometimes took several repeated runs to trigger (during the bisect I required 10 successful test runs before marking a commit good). So could it be some event or interrupt (CMOS related?) that accidentally gets injected into the wrong guest level? Or maybe the same event occurring at the same time on more than one level and messing things up?
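Because the failure is intermittent, a single passing run says little about a commit. The "10 successful runs before marking good" rule can be automated with `git bisect run`, which treats exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad. Below is a minimal Python sketch of that policy. The test command `./run-nested-test.sh` and the 600 second timeout are hypothetical placeholders (not from this thread) for whatever boots the second-level guest and reports a lockup or double fault through a non-zero exit status.

```python
import subprocess
from typing import Callable, List

# Exit codes understood by `git bisect run`: 0 marks the commit good,
# 1..127 (except 125, which means "skip") mark it bad.
GOOD, BAD = 0, 1

# Hypothetical test command: stands in for a script that starts the nested
# guest and exits non-zero on a lockup or double fault.
TEST_CMD: List[str] = ["./run-nested-test.sh"]


def classify(run_once: Callable[[], bool], required: int = 10) -> int:
    """Return GOOD only if `run_once` succeeds `required` times in a row.

    The first failure stops the loop and marks the commit BAD, since an
    intermittent bug may need several runs before it shows up.
    """
    for _ in range(required):
        if not run_once():
            return BAD
    return GOOD


def run_test_once(cmd: List[str] = TEST_CMD, timeout_s: int = 600) -> bool:
    """Run the test command once; a timeout (e.g. a hung guest) counts as failure."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Placed in a script ending with `sys.exit(classify(run_test_once))` (after `import sys`), this could be driven by something like `git bisect run python3 bisect_check.py`, where `bisect_check.py` is again only an illustrative name.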
-Stefan