From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kiszka
Subject: Re: qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
Date: Fri, 17 Aug 2012 16:41:04 +0200
Message-ID: <502E5800.5060609@siemens.com>
References: <4FEC56B2.6050502@dlhnet.de> <502E42E9.2020402@siemens.com> <502E56D3.6060607@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Peter Lieven, "qemu-devel@nongnu.org", "kvm@vger.kernel.org", Avi Kivity, Paolo Bonzini
To: Stefan Hajnoczi
Return-path:
Received: from david.siemens.de ([192.35.17.14]:30277 "EHLO david.siemens.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758034Ab2HQOlK (ORCPT ); Fri, 17 Aug 2012 10:41:10 -0400
In-Reply-To: <502E56D3.6060607@siemens.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 2012-08-17 16:36, Jan Kiszka wrote:
> On 2012-08-17 15:11, Jan Kiszka wrote:
>> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven wrote:
>>>> I debugged my initial problem further and found out that the problem happens
>>>> to be that
>>>> the main thread is stuck in pause_all_vcpus() on reset or quit commands in
>>>> the monitor
>>>> if one cpu is stuck in the do-while loop kvm_cpu_exec. If I modify the
>>>> condition from while (ret == 0)
>>>> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
>>>> "Quit" command seems to work, but on "Reset" the VM enters pause state.
>>>
>>> I think I'm hitting something similar. I installed a F17 amd64 guest
>>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>>> The guest seemed unresponsive so I switched to the monitor, which also
>>> froze shortly afterwards. The VNC screen ended up being all black.
>>>
>>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>>> Linux 3.2.0-3-amd64 from Debian testing
>>>
>>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>>
>>> (gdb) thread apply all bt
>>>
>>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>>> #0 0x00007f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>>> #1 0x00007f80137b92c9 in kvm_vcpu_ioctl
>>>     (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>>>     at /home/stefanha/qemu-kvm/kvm-all.c:1619
>>> #2 0x00007f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>>>     at /home/stefanha/qemu-kvm/kvm-all.c:1506
>>> #3 0x00007f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>>>     at /home/stefanha/qemu-kvm/cpus.c:756
>>> #4 0x00007f800fb4db50 in start_thread (arg=) at
>>> pthread_create.c:304
>>> #5 0x00007f800f8986dd in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6 0x0000000000000000 in ?? ()
>>>
>>> This vcpu is still executing guest code and I've seen it successfully
>>> dispatching I/O. The problem is it's missing the exit_request...
>>>
>>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>>> #0 pthread_cond_wait@@GLIBC_2.3.2 ()
>>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> #1 0x00007f801372b229 in qemu_cond_wait (cond=,
>>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>> #2 0x00007f8013766eff in qemu_kvm_wait_io_event (env=)
>>>     at /home/stefanha/qemu-kvm/cpus.c:724
>>> #3 qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>>> /home/stefanha/qemu-kvm/cpus.c:761
>>> #4 0x00007f800fb4db50 in start_thread (arg=) at
>>> pthread_create.c:304
>>> #5 0x00007f800f8986dd in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6 0x0000000000000000 in ?? ()
>>>
>>> No problems here.
>>>
>>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>>> #0 pthread_cond_wait@@GLIBC_2.3.2 ()
>>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> #1 0x00007f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>> #2 0x00007f8013768949 in pause_all_vcpus () at
>>> /home/stefanha/qemu-kvm/cpus.c:962
>>> #3 0x00007f80136028c8 in main (argc=, argv=,
>>>     envp=) at /home/stefanha/qemu-kvm/vl.c:3695
>>>
>>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>>
>>> Here are the vcpus:
>>>
>>> (gdb) p first_cpu
>>> $6 = (struct CPUX86State *) 0x7f8015b49640
>>> (gdb) p first_cpu->next_cpu
>>> $7 = (struct CPUX86State *) 0x7f8015b67450
>>> (gdb) p first_cpu->next_cpu->next_cpu
>>> $8 = (struct CPUX86State *) 0x0
>>>
>>> (gdb) p first_cpu->stop
>>> $9 = 1
>>> (gdb) p first_cpu->stopped
>>> $10 = 0
>>> (gdb) p first_cpu->exit_request
>>> $11 = 0
>>
>> CPUState::exit_request is only set on specific synchronous events, see
>> target-i386/kvm.c.
>>
>> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
>> will skip the kicking via a signal. Maybe there is some race. Let me
>> think about such possibilities again...
>
> diff --git a/cpus.c b/cpus.c
> index e476a3c..30f3228 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>      }
>
>      qemu_kvm_eat_signals(env);
> +    /* Ensure that checking env->stop cannot overtake signal processing so
> +     * that we lose the latter without stopping. */
> +    smp_rmb();

rmb is nonsense. Should be a plain barrier() - if at all.

>      qemu_wait_io_event_common(env);
>  }
>
> Can anyone imagine that such a barrier may actually be required?
> If it is currently possible that env->stop is evaluated before we called into
> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
> signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
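
For readers following the barrier question, here is a minimal sketch of the ordering under discussion. It is not QEMU code: struct vcpu_sketch, block_kick_signal(), eat_signals() and wait_io_event_sketch() are hypothetical names, SIGUSR1 merely stands in for whatever signal kicks the vcpu thread, and barrier() is defined locally as the usual GCC/Clang compiler-only barrier. The point is only that the pending kick is consumed via sigtimedwait() first and the stop flag is read afterwards.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigtimedwait() */
#endif
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Compiler-only barrier (GCC/Clang syntax): stops the compiler from moving
 * memory accesses across this point; it emits no CPU fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the env->stop flag; not QEMU's real structure. */
struct vcpu_sketch {
    int stop;
};

/* Block `sig` so that a kick stays pending until it is explicitly consumed.
 * A threaded program would use pthread_sigmask() instead. Returns 0 on success. */
static int block_kick_signal(int sig)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Consume every pending instance of `sig` without sleeping (zero timeout).
 * Returns the number of signals eaten. */
static int eat_signals(int sig)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };
    int eaten = 0;

    sigemptyset(&set);
    sigaddset(&set, sig);
    while (sigtimedwait(&set, NULL, &zero) == sig) {
        eaten++;
    }
    return eaten;
}

/* Eat the kick first, then look at the stop flag. The barrier keeps the
 * compiler from hoisting the load of cpu->stop above the signal handling.
 * Returns true if a stop request was observed. */
static bool wait_io_event_sketch(struct vcpu_sketch *cpu, int sig)
{
    eat_signals(sig);
    barrier();
    return cpu->stop != 0;
}
```

With the signal blocked, setting stop and then raising the signal before calling wait_io_event_sketch() should report the stop request and leave nothing pending. Since sigtimedwait() is reached through an opaque library call, the compiler may well be unable to hoist the load anyway, which would fit the "if at all" above.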