From: Stefan Bader
Subject: 2nd level lockups using VMX nesting on 3.11 based host kernel
Date: Tue, 03 Sep 2013 15:19:27 +0200
Message-ID: <5225E1DF.9040109@canonical.com>
To: kvm@vger.kernel.org

With current 3.11 kernels we got reports of nested qemu failing in weird ways. I believe 3.10 also had issues before, but I am not sure whether those were the same. With 3.8-based kernels (close to current stable) I found no such issues.

It is possible to reproduce things with the following setup:

  Host:       64bit user-space, 3.8 or 3.11 based kernel, 64bit hypervisor,
              Haswell CPU (masked to core2duo for the 1st level), 4 cores,
              8G memory
  1st level:  32 or 64bit user-space, 3.8 or 3.11 based kernel, virtio net
              and block devices, swap, 64bit hypervisor, 2 vCPUs, 2G memory
  2nd level:  32bit user-space, 3.8 based kernel, user network stack with
              virtio, virtio block device, 2 vCPUs, 1G memory

The test is basically to start the 2nd-level guest from a base raw image file, then perform some package updates and install some new packages through ssh inside it.
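Before reproducing, it may help to confirm that nesting is actually in effect at each level of the setup above. The following is a minimal sketch (Python chosen only for illustration; it is not part of the original report) that reads the kvm_intel "nested" module parameter and checks whether the "vmx" flag is listed in /proc/cpuinfo. It assumes a Linux system with the standard procfs/sysfs paths; the helper names nested_enabled, has_vmx and read_or_none are hypothetical.

```python
"""Hypothetical pre-flight check for a nested KVM (Intel VMX) setup.

Assumes Linux; the sysfs path is only present while kvm_intel is loaded.
"""

# kvm_intel module parameter; its contents read "Y"/"N" (or "1"/"0").
NESTED_PARAM = "/sys/module/kvm_intel/parameters/nested"
CPUINFO = "/proc/cpuinfo"


def nested_enabled(param_text: str) -> bool:
    """Interpret the contents of the kvm_intel 'nested' parameter."""
    return param_text.strip().upper() in ("Y", "1")


def has_vmx(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo lists 'vmx'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key "flags" (not e.g. "vmx flags") and a whole word.
        if sep and key.strip() == "flags" and "vmx" in value.split():
            return True
    return False


def read_or_none(path: str):
    """Return the file's contents, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    param = read_or_none(NESTED_PARAM)
    if param is None:
        print("kvm_intel 'nested' parameter not found (module not loaded?)")
    else:
        print("nested VMX enabled in this kernel:", nested_enabled(param))
    cpuinfo = read_or_none(CPUINFO)
    if cpuinfo is not None:
        print("vmx flag visible to this kernel:", has_vmx(cpuinfo))
```

Run on the host, the first line shows whether kvm_intel allows nesting; run inside the 1st-level guest, the second line shows whether the masked vCPU still exposes VMX, which the 2nd-level qemu needs in order to use KVM acceleration at all.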
With a 3.8 kernel running on the host, the host logs some attempted (and likely ignored) MSR accesses (caused by masking the vCPU to core2duo), but the install in the 2nd level succeeds, except when the 1st level runs a 32bit user-space. In that case I could observe the 1st-level qemu process using a lot of CPU time while the 2nd level showed signs of a soft lockup. Maybe at least one of the 2nd-level vCPUs is no longer getting scheduled?

Switching the host kernel to 3.11 (about -rc4), the 2nd-level install fails with various symptoms. For a 32bit user-space in the 1st level it looks like the previously described lockup, though I observed NMI reason 21 and 31 messages as well. With a 3.8 kernel and 64bit user-space in the 1st level, the NMI messages appeared again, but the 2nd level crashed with a double fault, which seemed to have the cmos_interrupt function at the start of the stack. Using a 3.11-rc4 kernel with 64bit user-space in the 1st level produced only the double fault, without the NMI messages.

The symptoms could vary, but the ones described above were the most likely for a given combination. I also tried 3.11 with 64bit user-space on both the host and the 1st level without any CPU masking (so the 1st level sees a qemu 64bit vCPU). This got rid of the MSR messages, but otherwise the 2nd level would still get stuck without any messages, sometimes with the 1st level busy and sometimes not. Except for very rare cases where things went really bad (in one case taking down the host), the 2nd-level guest can be killed with Ctrl-C from the 1st-level guest.

I am not sure how best to debug this further. Has anybody seen similar things, or can anyone give me some advice on how to get more information?

Thanks,
Stefan

Please cc me on replies as I am not subscribed.