From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Lieven Subject: Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1 Date: Thu, 28 Jun 2012 12:34:54 +0200 Message-ID: <4FEC334E.7080802@dlhnet.de> References: <4FEB2945.1030607@dlhnet.de> <4FEB3AC6.6010206@web.de> <4FEC1FC9.7050103@dlhnet.de> <4FEC2210.1030005@siemens.com> <4FEC2475.4030202@dlhnet.de> <4FEC263C.5030504@siemens.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "qemu-devel@nongnu.org" , "kvm@vger.kernel.org" , Gleb Natapov To: Jan Kiszka Return-path: Received: from ssl.dlhnet.de ([91.198.192.8]:54372 "EHLO ssl.dlh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750726Ab2F1Ke4 (ORCPT ); Thu, 28 Jun 2012 06:34:56 -0400 In-Reply-To: <4FEC263C.5030504@siemens.com> Sender: kvm-owner@vger.kernel.org List-ID: On 28.06.2012 11:39, Jan Kiszka wrote: > On 2012-06-28 11:31, Peter Lieven wrote: >> On 28.06.2012 11:21, Jan Kiszka wrote: >>> On 2012-06-28 11:11, Peter Lieven wrote: >>>> On 27.06.2012 18:54, Jan Kiszka wrote: >>>>> On 2012-06-27 17:39, Peter Lieven wrote: >>>>>> Hi all, >>>>>> >>>>>> i debugged this further and found out that kvm-kmod-3.0 is working with >>>>>> qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is >>>>>> working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). >>>>>> Has anyone a clue which new KVM feature could cause this if a vcpu is in >>>>>> an infinite loop? >>>>> Before accusing kvm-kmod ;), can you check if the effect is visible with >>>>> an original Linux 3.3.x or 3.4.x kernel as well? >>>> sorry, i should have been more specific. maybe I also misunderstood sth. >>>> I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel >>>> 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if >>>> I use >>>> a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. >>>> however, maybe we don't have to dig to deep - see below. >>> kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 >>> working on an older kernel. This step may introduce bugs of its own. >>> Therefore my suggestion to use a "real" 3.x kernel to exclude that risk >>> first of all. >>> >>>>> Then, bisection the change in qemu-kvm that apparently resolved the >>>>> issue would be interesting. >>>>> >>>>> If we have to dig deeper, tracing [1] the lockup would likely be helpful >>>>> (all events of the qemu process, not just KVM related ones: trace-cmd >>>>> record -e all qemu-system-x86_64 ...). >>>> that here is bascially whats going on: >>>> >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read >>>> len 3 gpa 0xa0000 val 0x10ff >>>> qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva >>>> 0xa0000 gpa 0xa0000 Read GPA >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> unsatisfied-read len 1 gpa 0xa0000 val 0x0 >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason >>>> KVM_EXIT_MMIO (6) >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> read len 3 gpa 0xa0000 val 0x10ff >>>> qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva >>>> 0xa0000 gpa 0xa0000 Read GPA >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> unsatisfied-read len 1 gpa 0xa0000 val 0x0 >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason >>>> KVM_EXIT_MMIO (6) >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> read len 3 gpa 0xa0000 val 0x10ff >>>> qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva >>>> 0xa0000 gpa 0xa0000 Read GPA >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> unsatisfied-read len 1 gpa 0xa0000 val 0x0 >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason >>>> KVM_EXIT_MMIO (6) >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> read len 3 gpa 0xa0000 val 0x10ff >>>> qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva >>>> 0xa0000 gpa 0xa0000 Read GPA >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio >>>> unsatisfied-read len 1 gpa 0xa0000 val 0x0 >>>> qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason >>>> KVM_EXIT_MMIO (6) >>>> >>>> its doing that forever. this is tracing the kvm module. doing the >>>> qemu-system-x86_64 trace is a bit compilcated, but >>>> maybe this is already sufficient. otherwise i will of course gather this >>>> info as well. >>> That's only tracing KVM event, and it's tracing when things went wrong >>> already. We may need a full trace (-e all) specifically for the period >>> when this pattern above started. >> i will do that. maybe i should explain that the vcpu is executing >> garbage when this above starts. its basically booting from an empty >> harddisk. >> >> if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); >> >> maybe the time to handle the monitor/qmp connection is just to short. >> if i understand furhter correctly, it can only handle monitor connections >> while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i >> wrong here? the time spend in this state might be rather short. > Unless you played with priorities and affinities, the Linux scheduler > should provide the required time to the iothread. > >> my concern is not that the machine hangs, just the the hypervisor is >> unresponsive >> and its impossible to reset or quit gracefully. the only way to get the >> hypervisor >> ended is via SIGKILL. > Right. Even if the guest runs wild, you must be able to control the vm > via the monitor etc. If not, that's a bug. what i observed just know is that the monitor is working up to the point i try to quit the hypervisor or try to reset the cpu. so we where looking at a completely wrong place... it seems that in this short excerpt, that the deadlock appears not on excution but when the vcpus shall be paused. Program received signal SIGINT, Interrupt. 0x00007fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 (gdb) thread apply all bt Thread 4 (Thread 0x7fc8eb5a9700 (LWP 731)): #0 0x00007fc8ec0b7747 in ioctl () from /lib/libc.so.6 #1 0x00007fc8ed92eedc in kvm_vcpu_ioctl (env=0x7fc8ef6c4b80, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 #2 0x00007fc8ed92e936 in kvm_cpu_exec (env=0x7fc8ef6c4b80) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 #3 0x00007fc8ed8fd2b4 in qemu_kvm_cpu_thread_fn (arg=0x7fc8ef6c4b80) at /usr/src/qemu-kvm-1.0.1/cpus.c:740 #4 0x00007fc8ec3629ca in start_thread () from /lib/libpthread.so.0 #5 0x00007fc8ec0bfcdd in clone () from /lib/libc.so.6 #6 0x0000000000000000 in ?? () Thread 3 (Thread 0x7fc8eada8700 (LWP 732)): #0 0x00007fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x00007fc8ed882189 in qemu_cond_wait (cond=0x7fc8ef6ead00, mutex=0x7fc8ee4f2940) at qemu-thread-posix.c:113 #2 0x00007fc8ed8fd1b7 in qemu_kvm_wait_io_event (env=0x7fc8ef6dcae0) at /usr/src/qemu-kvm-1.0.1/cpus.c:710 #3 0x00007fc8ed8fd2d8 in qemu_kvm_cpu_thread_fn (arg=0x7fc8ef6dcae0) at /usr/src/qemu-kvm-1.0.1/cpus.c:745 #4 0x00007fc8ec3629ca in start_thread () from /lib/libpthread.so.0 #5 0x00007fc8ec0bfcdd in clone () from /lib/libc.so.6 #6 0x0000000000000000 in ?? () Thread 2 (Thread 0x7fc8a8fae700 (LWP 733)): #0 0x00007fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x00007fc8ed882189 in qemu_cond_wait (cond=0x7fc8efb54550, mutex=0x7fc8efb54580) at qemu-thread-posix.c:113 #2 0x00007fc8ed8b7d49 in vnc_worker_thread_loop (queue=0x7fc8efb54550) at ui/vnc-jobs-async.c:222 #3 0x00007fc8ed8b8227 in vnc_worker_thread (arg=0x7fc8efb54550) at ui/vnc-jobs-async.c:318 #4 0x00007fc8ec3629ca in start_thread () from /lib/libpthread.so.0 #5 0x00007fc8ec0bfcdd in clone () from /lib/libc.so.6 #6 0x0000000000000000 in ?? () Thread 1 (Thread 0x7fc8ed718700 (LWP 725)): #0 0x00007fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #1 0x00007fc8ed882189 in qemu_cond_wait (cond=0x7fc8ee16a120, mutex=0x7fc8ee4f2940) at qemu-thread-posix.c:113 #2 0x00007fc8ed8fd67b in pause_all_vcpus () at /usr/src/qemu-kvm-1.0.1/cpus.c:881 #3 0x00007fc8ed8548cd in main (argc=38, argv=0x7fffcde70a48, envp=0x7fffcde70b80) at /usr/src/qemu-kvm-1.0.1/vl.c:3525 (gdb) q Peter