From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kashyap Chamarthy
Subject: Re: nVMX regression v3.13+, bisected
Date: Thu, 27 Feb 2014 17:40:56 +0530
Message-ID: <20140227121056.GD25995@tesla.redhat.com>
References: <530E43EC.7000600@canonical.com> <530E4DB9.5050001@redhat.com>
 <530E4E25.4050508@canonical.com> <20140226204423.GA25995@tesla.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Paolo Bonzini, kvm@vger.kernel.org, Anthoine Bourgeois
To: Stefan Bader
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:44740 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750813AbaB0MLH
 (ORCPT ); Thu, 27 Feb 2014 07:11:07 -0500
Content-Disposition: inline
In-Reply-To: <20140226204423.GA25995@tesla.redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Thu, Feb 27, 2014 at 02:14:23AM +0530, Kashyap Chamarthy wrote:
> On Wed, Feb 26, 2014 at 09:27:17PM +0100, Stefan Bader wrote:
> > On 26.02.2014 21:25, Paolo Bonzini wrote:
>
> [. . .]
>
> > >>
> > >> I bisected this and ended up on the following commit which, when reverted made
> > >> the launch work again:
> > >>
> > >> Author: Anthoine Bourgeois
> > >> Date: Wed Nov 13 11:45:37 2013 +0100
> > >>
> > >>     kvm, vmx: Fix lazy FPU on nested guest
> > >>
> > >>     If a nested guest does a NM fault but its CR0 doesn't contain the TS
> > >>     flag (because it was already cleared by the guest with L1 aid) then we
> > >>     have to activate FPU ourselves in L0 and then continue to L2. If TS flag
> > >>     is set then we fallback on the previous behavior, forward the fault to
> > >>     L1 if it asked for.
> > >>
> > >>     Signed-off-by: Anthoine Bourgeois
> > >>     Signed-off-by: Paolo Bonzini
> > >>
> > >> The condition to exit to L0 seems to be according to what the description says.
> > >> Could it be that the handling in L0 is doing something wrong?
> > >
> > > Thanks, I'll look at it tomorrow or Friday.
> > >
> > > Paolo
> > >
> > Great thanks.
> > And maybe it helps if I actually add the link to the bug report as
> > I had intended... :-P
> >
> > [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1278531
>
> Yes, I'm seeing something similar[*] in a consistent manner with minimal
> Fedora installs on L0, L1 and L2

Ok, I just tried to debug an L2 guest (a libguestfs appliance) via gdb
following this method[1]. This is how far I got:

From a shell on L1, launch the libguestfs appliance (note: here libguestfs
is compiled with gdb debugging enabled, so QEMU won't start running the
appliance):

    $ ./run libguestfs-test-tool
    [. . .]
    checking modpath /lib/modules/3.14.0-0.rc2.git0.1.fc21.x86_64 is a directory
    picked kernel vmlinuz-3.14.0-0.rc2.git0.1.fc21.x86_64
    supermin helper [00000ms] finished creating kernel
    [. . .]
    libguestfs: warning: qemu debugging is enabled, connect gdb to tcp::1234 to begin
    [. . .]

From a different shell, I invoke gdb like this:

    (gdb) symbol-file /usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux
    Reading symbols from /usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux...done.
    (gdb) target remote tcp::1234
    Remote debugging using tcp::1234
    0x0000fff0 in ftrace_stack ()
    (gdb) bt
    #0  0x00000997 in irq_stack_union ()
    #1  0x00000000 in ?? ()
    (gdb)
    (gdb) c
    Continuing.

Again, back to libguestfs-test-tool, it's just hung attempting to boot
from ROM:

    [. . .]
    SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@) Wed Aug 14 23:57:08 UTC 2013
    Term: 80x24
    4 0
    SeaBIOS (version 1.7.4-20140106_154858-)
    Booting from ROM...

Back to gdb, to find out _what_ file the above function is being executed
from:

    (gdb) c
    Continuing.
    ^C
    Program received signal SIGINT, Interrupt.
    0x00000997 in irq_stack_union ()
    (gdb) bt
    #0  0x00000997 in irq_stack_union ()
    #1  0x00000000 in ?? ()
    (gdb) list
    1    /*
    2     * Copyright 2002, 2003 Andi Kleen, SuSE Labs.
    3     *
    4     * This file is subject to the terms and conditions of the GNU General Public
    5     * License.
           See the file COPYING in the main directory of this archive
    6     * for more details. No warranty for anything given at all.
    7     */
    8    #include
    9    #include
    10   #include
    (gdb)
    [. . .]
    (gdb)
    241  ENDPROC(csum_partial_copy_generic)
    (gdb)
    Line number 242 out of range; arch/x86/lib/csum-copy_64.S has 241 lines.
    (gdb)

PS: Paolo, I'll try to test with your new patch soon. Thanks.

[1] https://github.com/libguestfs/libguestfs/blob/master/src/launch-direct.c#L404

>
> [*] https://bugzilla.kernel.org/show_bug.cgi?id=69491#c7
>
>

-- 
/kashyap
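
A side note on the gdb session above: libguestfs-test-tool picked
vmlinuz-3.14.0-0.rc2.git0.1.fc21.x86_64, while the symbol file loaded into
gdb comes from 3.14.0-0.rc4.git0.1.fc21.x86_64. Symbols from a different
build need not line up with the running kernel, so it may be worth checking
that the two releases agree before trusting the names in a backtrace. Below
is a minimal sketch in Python; the helper names and the release regex are
made up for illustration and are not part of libguestfs or gdb.

```python
import re

# Assumed pattern for Fedora-style kernel release strings such as
# 3.14.0-0.rc2.git0.1.fc21.x86_64 (illustrative, not an official format spec).
KERNEL_RELEASE = re.compile(r"\d+\.\d+\.\d+-[\w.]+")


def kernel_release(text):
    """Return the first kernel release string found in text, or None."""
    match = KERNEL_RELEASE.search(text)
    return match.group(0) if match else None


def symbols_match(appliance_kernel, vmlinux_path):
    """True only if both strings carry the same kernel release."""
    appliance = kernel_release(appliance_kernel)
    symbols = kernel_release(vmlinux_path)
    return appliance is not None and appliance == symbols


if __name__ == "__main__":
    # Both strings are copied from the session in this mail.
    picked = "vmlinuz-3.14.0-0.rc2.git0.1.fc21.x86_64"
    vmlinux = "/usr/lib/debug/lib/modules/3.14.0-0.rc4.git0.1.fc21.x86_64/vmlinux"
    print("appliance kernel:", kernel_release(picked))
    print("debug symbols:   ", kernel_release(vmlinux))
    print("match:", symbols_match(picked, vmlinux))
```

This only compares release strings; it says nothing about whether the guest
was already executing kernel code when gdb was interrupted.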