* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Avi Kivity @ 2012-07-01 8:19 UTC
To: Peter Lieven; +Cc: Jan Kiszka, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 06/28/2012 10:27 PM, Peter Lieven wrote:
>
> On 28.06.2012 at 18:32, Avi Kivity wrote:
>
>> On 06/28/2012 07:29 PM, Peter Lieven wrote:
>>>> Yes.  A signal is sent, and KVM returns from the guest to userspace on
>>>> pending signals.
>>
>>> Is there a description available of how exactly this process works?
>>
>> The kernel part is in vcpu_enter_guest(), see the check for
>> signal_pending().  But this hasn't seen changes for quite a long while.
>
> Thank you, I will have a look. I noticed a few patches that were submitted
> during the last year, maybe one of them is related:
>
> Switch SIG_IPI to SIGUSR1
> Fix signal handling of SIG_IPI when io-thread is enabled
>
> The first commit mentions a "32-on-64-bit Linux kernel bug".
> Is there any reference to that?

http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
running 32-on-64?

-- 
error compiling committee.c: too many arguments to function
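
[Editorial sketch, not part of the original mail.] The mechanism quoted
above ("a signal is sent, and KVM returns from the guest to userspace on
pending signals") can be illustrated in plain user space. This is only an
analogy, not QEMU or KVM code: sigsuspend() stands in for the blocking
KVM_RUN ioctl, and the names kick_handler, vcpu_thread and kick_demo are
hypothetical. It assumes the kick signal stays blocked outside the
blocking call, so a kick sent early remains pending instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Empty handler: its only job is to make the blocking call return EINTR. */
static void kick_handler(int sig)
{
    (void)sig;
}

/* Hypothetical stand-in for a vcpu thread. sigsuspend() plays the role of
 * the blocking "run guest" call. */
static void *vcpu_thread(void *arg)
{
    int *interrupted = arg;
    sigset_t run_mask;

    /* Start from the inherited mask (SIGUSR1 blocked) and unblock the kick
     * signal only for the duration of the blocking call. */
    pthread_sigmask(SIG_SETMASK, NULL, &run_mask);
    sigdelset(&run_mask, SIGUSR1);

    int rc = sigsuspend(&run_mask);
    *interrupted = (rc == -1 && errno == EINTR);
    return NULL;
}

/* Returns 0 if the kick interrupted the blocking call, -1 otherwise. */
int kick_demo(void)
{
    struct sigaction sa;
    sigset_t block, old;
    pthread_t tid;
    int interrupted = 0;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    /* Block the kick signal before creating the thread; the new thread
     * inherits this mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);

    rc = pthread_create(&tid, NULL, vcpu_thread, &interrupted);
    assert(rc == 0);
    (void)rc;

    /* The kick: even if it arrives before the thread blocks, it stays
     * pending and is delivered as soon as sigsuspend() unblocks it. */
    pthread_kill(tid, SIGUSR1);
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return interrupted ? 0 : -1;
}
```

Calling kick_demo() from a test program should return 0. The point of the
sketch is the ordering: because the signal is only unblocked inside the
blocking call, there is no window in which a kick can be consumed without
the thread noticing.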
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Peter Lieven @ 2012-07-01 19:18 UTC
To: Avi Kivity; +Cc: Jan Kiszka, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 01.07.2012 at 10:19, Avi Kivity wrote:
>> The first commit mentions a "32-on-64-bit Linux kernel bug".
>> Is there any reference to that?
>
> http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
> running 32-on-64?

I think the issue occurs when running a 32-bit guest on a 64-bit system.
AFAIK, the isolinux loader where I see the race is 32-bit, although it is
a 64-bit Ubuntu LTS CD image. The second case where I have seen the race
is on shutdown of a Windows 2000 Server, which is also 32-bit.

Peter
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-07-02 7:05 UTC
To: Peter Lieven; +Cc: Avi Kivity, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 2012-07-01 21:18, Peter Lieven wrote:
>
> On 01.07.2012 at 10:19, Avi Kivity wrote:
>
>> http://web.archiveorange.com/archive/v/1XS1vwGSFLyYygwTXg1K.  Are you
>> running 32-on-64?
>
> I think the issue occurs when running a 32-bit guest on a 64-bit system.
> AFAIK, the isolinux loader where I see the race is 32-bit, although it is
> a 64-bit Ubuntu LTS CD image. The second case where I have seen the race
> is on shutdown of a Windows 2000 Server, which is also 32-bit.

"32-on-64" particularly means using a 32-bit QEMU[-kvm] binary on a
64-bit host kernel. What does "file qemu-system-x86_64" report about yours?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Peter Lieven @ 2012-07-02 8:12 UTC
To: Jan Kiszka; +Cc: Avi Kivity, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 02.07.2012 09:05, Jan Kiszka wrote:
> "32-on-64" particularly means using a 32-bit QEMU[-kvm] binary on a
> 64-bit host kernel. What does "file qemu-system-x86_64" report about yours?

It's a custom build on 64-bit Linux, built as a 64-bit application. I will
keep trying to find out today what's going wrong.

Any help or hints appreciated ;-)

Thanks,
Peter
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Stefan Hajnoczi @ 2012-08-06 15:11 UTC
To: Peter Lieven; +Cc: Jan Kiszka, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven <pl@dlhnet.de> wrote:
> I debugged my initial problem further and found out that the main thread
> is stuck in pause_all_vcpus() on reset or quit commands in the monitor
> if one cpu is stuck in the do-while loop in kvm_cpu_exec. If I modify the
> condition from while (ret == 0)
> to while ((ret == 0) && !env->stop); it works, but is this the right fix?
> The "Quit" command seems to work, but on "Reset" the VM enters pause state.

I think I'm hitting something similar.  I installed a F17 amd64 guest
(3.5 kernel) but before booting entered the GRUB boot menu edit mode.
The guest seemed unresponsive so I switched to the monitor, which also
froze shortly afterwards.  The VNC screen ended up being all black.

qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
Linux 3.2.0-3-amd64 from Debian testing

$ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
if=virtio,cache=none,file=f17.img,aio=native -serial stdio

(gdb) thread apply all bt

Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
#0  0x00007f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f80137b92c9 in kvm_vcpu_ioctl
    (env=env@entry=0x7f8015b49640, type=type@entry=44672)
    at /home/stefanha/qemu-kvm/kvm-all.c:1619
#2  0x00007f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
    at /home/stefanha/qemu-kvm/kvm-all.c:1506
#3  0x00007f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
    at /home/stefanha/qemu-kvm/cpus.c:756
#4  0x00007f800fb4db50 in start_thread (arg=<optimized out>)
    at pthread_create.c:304
#5  0x00007f800f8986dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

This vcpu is still executing guest code and I've seen it successfully
dispatching I/O.  The problem is it's missing the exit_request...

Thread 2 (Thread 0x7f8008622700 (LWP 368)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f801372b229 in qemu_cond_wait (cond=<optimized out>,
    mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x00007f8013766eff in qemu_kvm_wait_io_event (env=<optimized out>)
    at /home/stefanha/qemu-kvm/cpus.c:724
#3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450)
    at /home/stefanha/qemu-kvm/cpus.c:761
#4  0x00007f800fb4db50 in start_thread (arg=<optimized out>)
    at pthread_create.c:304
#5  0x00007f800f8986dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

No problems here.

Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
    mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
#2  0x00007f8013768949 in pause_all_vcpus ()
    at /home/stefanha/qemu-kvm/cpus.c:962
#3  0x00007f80136028c8 in main (argc=<optimized out>, argv=<optimized out>,
    envp=<optimized out>) at /home/stefanha/qemu-kvm/vl.c:3695

We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.

Here are the vcpus:

(gdb) p first_cpu
$6 = (struct CPUX86State *) 0x7f8015b49640
(gdb) p first_cpu->next_cpu
$7 = (struct CPUX86State *) 0x7f8015b67450
(gdb) p first_cpu->next_cpu->next_cpu
$8 = (struct CPUX86State *) 0x0

(gdb) p first_cpu->stop
$9 = 1
(gdb) p first_cpu->stopped
$10 = 0
(gdb) p first_cpu->exit_request
$11 = 0

:(

This isn't easy to reproduce.  I tried entering the GRUB boot menu
again and there was no deadlock.

Stefan
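
[Editorial sketch, not part of the original mail.] The state reported
above (stop=1, stopped=0, main thread waiting on a condition variable) is
the middle of a stop/stopped handshake. A minimal sketch of such a
handshake follows, under loud assumptions: this is not the QEMU code, the
names big_lock, pause_cond, struct vcpu, vcpu_thread and pause_demo are
hypothetical, and the worker polls the flag instead of sitting in a
blocking ioctl. It shows why the requester waits forever if the worker
never gets back to the point where it checks stop.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* One global lock, standing in for the big lock the thread calls the BQL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pause_cond = PTHREAD_COND_INITIALIZER;

/* Hypothetical per-vcpu state; only the two handshake flags are modelled. */
struct vcpu {
    bool stop;      /* set by the requester: "please pause" */
    bool stopped;   /* set by the vcpu thread: "I have paused" */
};

static void *vcpu_thread(void *arg)
{
    struct vcpu *cpu = arg;

    pthread_mutex_lock(&big_lock);
    while (!cpu->stop) {
        /* "Run guest code" with the lock dropped, then re-check the flag.
         * If this point were never reached again, stop would stay 1 and
         * stopped would stay 0, as in the gdb output above. */
        pthread_mutex_unlock(&big_lock);
        sched_yield();
        pthread_mutex_lock(&big_lock);
    }
    cpu->stop = false;
    cpu->stopped = true;
    pthread_cond_signal(&pause_cond);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Returns 0 once the worker has acknowledged the pause request. */
int pause_demo(void)
{
    struct vcpu cpu = { false, false };
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, vcpu_thread, &cpu);

    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&big_lock);
    cpu.stop = true;
    /* A real implementation would also kick the vcpu thread here. */
    while (!cpu.stopped) {
        pthread_cond_wait(&pause_cond, &big_lock);
    }
    pthread_mutex_unlock(&big_lock);

    pthread_join(tid, NULL);
    return (cpu.stopped && !cpu.stop) ? 0 : -1;
}
```

In this toy version the worker always returns to the flag check, so
pause_demo() terminates. The deadlock described in the mail corresponds
to the case where the worker stays inside the "run guest" step.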
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-08-17 13:11 UTC
To: Stefan Hajnoczi; +Cc: Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

On 2012-08-06 17:11, Stefan Hajnoczi wrote:
> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>
> Here are the vcpus:
>
> (gdb) p first_cpu
> $6 = (struct CPUX86State *) 0x7f8015b49640
> (gdb) p first_cpu->next_cpu
> $7 = (struct CPUX86State *) 0x7f8015b67450
> (gdb) p first_cpu->next_cpu->next_cpu
> $8 = (struct CPUX86State *) 0x0
>
> (gdb) p first_cpu->stop
> $9 = 1
> (gdb) p first_cpu->stopped
> $10 = 0
> (gdb) p first_cpu->exit_request
> $11 = 0

CPUState::exit_request is only set on specific synchronous events, see
target-i386/kvm.c.

More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
will skip the kicking via a signal. Maybe there is some race. Let me
think about such possibilities again...

Jan

>
> :(
>
> This isn't easy to reproduce.  I tried entering the GRUB boot menu
> again and there was no deadlock.
>
> Stefan
>

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
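
[Editorial sketch, not part of the original mail.] The concern about
thread_kicked can be made concrete with a small model. Everything here is
an assumption for illustration: struct vcpu, vcpu_kick, vcpu_eat_signals
and kick_flag_demo are hypothetical names, not QEMU functions, and the
model runs in a single thread that kicks itself. It shows the two
properties under discussion: a set flag suppresses further signals, and a
flag that is stuck at true (for whatever reason) means no kick is ever
sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the per-vcpu kick bookkeeping. */
struct vcpu {
    pthread_t thread;
    bool thread_kicked;   /* true while a kick is considered outstanding */
    int signals_sent;     /* instrumentation for this sketch only */
};

/* Send the kick signal unless one is already considered outstanding. */
static void vcpu_kick(struct vcpu *cpu)
{
    if (!cpu->thread_kicked) {
        pthread_kill(cpu->thread, SIGUSR1);
        cpu->signals_sent++;
        cpu->thread_kicked = true;
    }
}

/* Consume pending kick signals without blocking, then re-arm kicking.
 * Returns the number of signals consumed. */
static int vcpu_eat_signals(struct vcpu *cpu)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };
    int eaten = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    while (sigtimedwait(&set, NULL, &zero) == SIGUSR1) {
        eaten++;
    }
    cpu->thread_kicked = false;
    return eaten;
}

/* Returns 0 if the model behaves as described in the lead-in. */
int kick_flag_demo(void)
{
    sigset_t block, old;
    struct vcpu cpu = { pthread_self(), false, 0 };
    bool ok;

    /* Keep SIGUSR1 blocked so it can only be consumed by sigtimedwait(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);

    vcpu_kick(&cpu);
    vcpu_kick(&cpu);                       /* suppressed by the flag */
    ok = (cpu.signals_sent == 1);
    ok = ok && (vcpu_eat_signals(&cpu) == 1);
    assert(!cpu.thread_kicked);

    cpu.thread_kicked = true;              /* simulate a stale flag */
    vcpu_kick(&cpu);                       /* nothing is sent */
    ok = ok && (cpu.signals_sent == 1);
    ok = ok && (vcpu_eat_signals(&cpu) == 0);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return ok ? 0 : -1;
}
```

The stale-flag case at the end is the scenario speculated about later in
the thread: once the flag is wrongly non-zero, the stop request is set but
the target thread is never interrupted.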
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-08-17 14:36 UTC
To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

On 2012-08-17 15:11, Jan Kiszka wrote:
> CPUState::exit_request is only set on specific synchronous events, see
> target-i386/kvm.c.
>
> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
> will skip the kicking via a signal. Maybe there is some race. Let me
> think about such possibilities again...

diff --git a/cpus.c b/cpus.c
index e476a3c..30f3228 100644
--- a/cpus.c
+++ b/cpus.c
@@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
     }
 
     qemu_kvm_eat_signals(env);
+    /* Ensure that checking env->stop cannot overtake signal processing so
+     * that we lose the latter without stopping. */
+    smp_rmb();
     qemu_wait_io_event_common(env);
 }

Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-08-17 14:41 UTC
To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

On 2012-08-17 16:36, Jan Kiszka wrote:
> diff --git a/cpus.c b/cpus.c
> index e476a3c..30f3228 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>      }
>
>      qemu_kvm_eat_signals(env);
> +    /* Ensure that checking env->stop cannot overtake signal processing so
> +     * that we lose the latter without stopping. */
> +    smp_rmb();

rmb is nonsense. Should be a plain barrier() - if at all.

>      qemu_wait_io_event_common(env);
>  }
>
> Can anyone imagine that such a barrier may actually be required? If it
> is currently possible that env->stop is evaluated before we called into
> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
> signal without properly processing its reason (stop).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-08-17 15:04 UTC
To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Avi Kivity

On 2012-08-17 16:41, Jan Kiszka wrote:
> On 2012-08-17 16:36, Jan Kiszka wrote:
>>      qemu_kvm_eat_signals(env);
>> +    /* Ensure that checking env->stop cannot overtake signal processing so
>> +     * that we lose the latter without stopping. */
>> +    smp_rmb();
>
> rmb is nonsense. Should be a plain barrier() - if at all.
>
>>      qemu_wait_io_event_common(env);
>>  }
>>
>> Can anyone imagine that such a barrier may actually be required? If it
>> is currently possible that env->stop is evaluated before we called into
>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>> signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happens under the BQL, thus the ordering
must not make a difference here.

Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Avi Kivity @ 2012-08-19 9:42 UTC
To: Jan Kiszka
Cc: Stefan Hajnoczi, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Paolo Bonzini

On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>
>>> Can anyone imagine that such a barrier may actually be required? If it
>>> is currently possible that env->stop is evaluated before we called into
>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>> signal without properly processing its reason (stop).
>
> Should not be required (TM): Both signal eating / stop checking and stop
> setting / signal generation happen under the BQL, thus the ordering
> must not make a difference here.

Agree.

> Don't see where we could lose a signal. Maybe due to a subtle memory
> corruption that sets thread_kicked to non-zero, preventing the kicking
> this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?

--
error compiling committee.c: too many arguments to function
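The thread_kicked hypothesis discussed above can be made concrete with a simplified model. This is a sketch under assumptions, not the QEMU implementation: `struct vcpu_sketch`, `sketch_cpu_kick()` and `sketch_eat_signal()` are hypothetical names, and a counter stands in for actually sending SIG_IPI. It only models the behaviour described in the thread, namely that a kick is skipped while the flag is set.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-vcpu state in the thread. */
struct vcpu_sketch {
    bool thread_kicked; /* true while a kick is believed to be in flight */
    int signals_sent;   /* counts simulated kick signals */
};

/* Simulated kick: send a signal only if none is believed to be pending. */
void sketch_cpu_kick(struct vcpu_sketch *cpu)
{
    if (!cpu->thread_kicked) {
        cpu->signals_sent++; /* stands in for sending SIG_IPI to the vcpu */
        cpu->thread_kicked = true;
    }
}

/* Simulated signal consumption on the vcpu side: re-arm future kicks. */
void sketch_eat_signal(struct vcpu_sketch *cpu)
{
    cpu->thread_kicked = false;
}
```

If the flag is non-zero without a signal actually being pending, for example through the memory corruption Jan speculates about, every later call to `sketch_cpu_kick()` is a no-op and the vcpu is never interrupted. In this model that matches the observed state of `stop` being 1 while `stopped` stays 0.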
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Jan Kiszka @ 2012-08-21 7:21 UTC
To: Avi Kivity
Cc: Stefan Hajnoczi, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Paolo Bonzini

On 2012-08-19 11:42, Avi Kivity wrote:
> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>
>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>> is currently possible that env->stop is evaluated before we called into
>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>> signal without properly processing its reason (stop).
>>
>> Should not be required (TM): Both signal eating / stop checking and stop
>> setting / signal generation happen under the BQL, thus the ordering
>> must not make a difference here.
>
> Agree.
>
>
>> Don't see where we could lose a signal. Maybe due to a subtle memory
>> corruption that sets thread_kicked to non-zero, preventing the kicking
>> this way.
>
> Cannot be ruled out, yet too much of a coincidence.
>
> Could be a kernel bug (either in kvm or elsewhere), we've had several
> before in this area.
>
> Is this reproducible?

Not for me. Stefan only hit it very rarely, Peter obviously more easily.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Stefan Hajnoczi @ 2012-08-21 8:23 UTC
To: Avi Kivity
Cc: Paolo Bonzini, Peter Lieven, qemu-devel@nongnu.org, kvm@vger.kernel.org, Jan Kiszka

On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-08-19 11:42, Avi Kivity wrote:
>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>
>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>> is currently possible that env->stop is evaluated before we called into
>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>> signal without properly processing its reason (stop).
>>>
>>> Should not be required (TM): Both signal eating / stop checking and stop
>>> setting / signal generation happen under the BQL, thus the ordering
>>> must not make a difference here.
>>
>> Agree.
>>
>>
>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>> this way.
>>
>> Cannot be ruled out, yet too much of a coincidence.
>>
>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>> before in this area.
>>
>> Is this reproducible?
>
> Not for me. Stefan only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

Stefan
* Re: [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop
From: Peter Lieven @ 2012-08-22 12:52 UTC
To: Stefan Hajnoczi
Cc: Paolo Bonzini, Jan Kiszka, Avi Kivity, kvm@vger.kernel.org, qemu-devel@nongnu.org

On 08/21/12 10:23, Stefan Hajnoczi wrote:
> On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-08-19 11:42, Avi Kivity wrote:
>>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>>> is currently possible that env->stop is evaluated before we called into
>>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>>> signal without properly processing its reason (stop).
>>>> Should not be required (TM): Both signal eating / stop checking and stop
>>>> setting / signal generation happen under the BQL, thus the ordering
>>>> must not make a difference here.
>>> Agree.
>>>
>>>
>>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>>> this way.
>>> Cannot be ruled out, yet too much of a coincidence.
>>>
>>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>>> before in this area.
>>>
>>> Is this reproducible?
>> Not for me. Stefan only hit it very rarely, Peter obviously more easily.
> I have only hit this once and was not able to reproduce it.

For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter

> Stefan
end of thread, other threads:[~2012-08-22 12:52 UTC | newest]

Thread overview: 13+ messages
[not found] <4FEC56B2.6050502@dlhnet.de>
[not found] ` <4FEC5B5A.4060302@siemens.com>
[not found] ` <4FEC7214.2020900@dlhnet.de>
[not found] ` <4FEC76A8.6060100@siemens.com>
[not found] ` <4FEC866F.5000402@dlhnet.de>
[not found] ` <4FEC8722.7070301@redhat.com>
[not found] ` <7C6F41F3-D0BC-4753-853D-E68B2AAAAADB@dlhnet.de>
2012-07-01  8:19 ` [Qemu-devel] qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop Avi Kivity
2012-07-01 19:18 ` Peter Lieven
2012-07-02  7:05 ` Jan Kiszka
2012-07-02  8:12 ` Peter Lieven
2012-08-06 15:11 ` Stefan Hajnoczi
2012-08-17 13:11 ` Jan Kiszka
2012-08-17 14:36 ` Jan Kiszka
2012-08-17 14:41 ` Jan Kiszka
2012-08-17 15:04 ` Jan Kiszka
2012-08-19  9:42 ` Avi Kivity
2012-08-21  7:21 ` Jan Kiszka
2012-08-21  8:23 ` Stefan Hajnoczi
2012-08-22 12:52 ` Peter Lieven