From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Lieven Subject: Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1 Date: Thu, 28 Jun 2012 11:11:37 +0200 Message-ID: <4FEC1FC9.7050103@dlhnet.de> References: <4FEB2945.1030607@dlhnet.de> <4FEB3AC6.6010206@web.de> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, Gleb Natapov To: Jan Kiszka Return-path: Received: from ssl.dlhnet.de ([91.198.192.8]:53643 "EHLO ssl.dlh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932163Ab2F1JLk (ORCPT ); Thu, 28 Jun 2012 05:11:40 -0400 In-Reply-To: <4FEB3AC6.6010206@web.de> Sender: kvm-owner@vger.kernel.org List-ID: On 27.06.2012 18:54, Jan Kiszka wrote: > On 2012-06-27 17:39, Peter Lieven wrote: >> Hi all, >> >> i debugged this further and found out that kvm-kmod-3.0 is working with >> qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is >> working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). >> Has anyone a clue which new KVM feature could cause this if a vcpu is in >> an infinite loop? > Before accusing kvm-kmod ;), can you check if the effect is visible with > an original Linux 3.3.x or 3.4.x kernel as well? sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanialla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works, if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig to deep - see below. > Then, bisection the change in qemu-kvm that apparently resolved the > issue would be interesting. > > If we have to dig deeper, tracing [1] the lockup would likely be helpful > (all events of the qemu process, not just KVM related ones: trace-cmd > record -e all qemu-system-x86_64 ...). that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit compilcated, but maybe this is already sufficient. otherwise i will of course gather this info as well. thanks peter > Jan > > [1] http://www.linux-kvm.org/page/Tracing > >> --- >> >> Hi, >> >> we recently came across multiple VMs racing and stopping working. It >> seems to happen when the system is at 100% cpu. >> One way to reproduce this is: >> qemu-kvm-1.0.1 with vnc-thread enabled >> >> cmdline (or similar): >> /usr/bin/qemu-kvm-1.0.1 -net >> tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net >> nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive >> format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native >> -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor >> tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name >> 02-debug-race -boot order=dc,menu=off -cdrom >> /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile >> /var/run/qemu/vm-221.pid -mem-prealloc -cpu >> host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ >> 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus >> >> it is important that the attached virtio image contains only zeroes. if >> the system boots from cd, select boot from first harddisk. >> the hypervisor then hangs at 100% cpu and neither monitor nor qmp are >> responsive anymore. >> >> i have also seen customers reporting this when a VM is shut down. >> >> if this is connected to the threaded vnc server it might be important to >> connected at this time. >> >> debug backtrace attached. >> >> Thanks, >> Peter >> >> -- >> >> (gdb) file /usr/bin/qemu-kvm-1.0.1 >> Reading symbols from /usr/bin/qemu-kvm-1.0.1...done. >> (gdb) attach 5145 >> Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145 >> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols >> found)...done. >> Loaded symbols for /lib64/ld-linux-x86-64.so.2 >> [Thread debugging using libthread_db enabled] >> [New Thread 0x7f54d08b9700 (LWP 5253)] >> [New Thread 0x7f5552757700 (LWP 5152)] >> [New Thread 0x7f5552f58700 (LWP 5151)] >> 0x00007f5553c6b5a3 in select () from /lib/libc.so.6 >> (gdb) info threads >> 4 Thread 0x7f5552f58700 (LWP 5151) 0x00007f5553c6a747 in ioctl () >> from /lib/libc.so.6 >> 3 Thread 0x7f5552757700 (LWP 5152) 0x00007f5553c6a747 in ioctl () >> from /lib/libc.so.6 >> 2 Thread 0x7f54d08b9700 (LWP 5253) 0x00007f5553f1a85c in >> pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 >> * 1 Thread 0x7f555550d700 (LWP 5145) 0x00007f5553c6b5a3 in select () >> from /lib/libc.so.6 >> (gdb) thread apply all bt >> >> Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): >> #0 0x00007f5553c6a747 in ioctl () from /lib/libc.so.6 >> #1 0x00007f5555727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, >> type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 >> #2 0x00007f555572728a in kvm_cpu_exec (env=0x7f5557652f10) at >> /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 >> #3 0x00007f55556f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at >> /usr/src/qemu-kvm-1.0.1/cpus.c:740 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> #6 0x0000000000000000 in ?? () >> >> Thread 3 (Thread 0x7f5552757700 (LWP 5152)): >> #0 0x00007f5553c6a747 in ioctl () from /lib/libc.so.6 >> #1 0x00007f5555727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, >> type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 >> #2 0x00007f555572728a in kvm_cpu_exec (env=0x7f555766ae60) at >> /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 >> #3 0x00007f55556f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at >> /usr/src/qemu-kvm-1.0.1/cpus.c:740 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> #6 0x0000000000000000 in ?? () >> >> Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)): >> #0 0x00007f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib/libpthread.so.0 >> #1 0x00007f5555679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, >> mutex=0x7f5557ede210) at qemu-thread-posix.c:113 >> #2 0x00007f55556b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) >> at ui/vnc-jobs-async.c:222 >> #3 0x00007f55556b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at >> ui/vnc-jobs-async.c:318 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> #6 0x0000000000000000 in ?? () >> >> Thread 1 (Thread 0x7f555550d700 (LWP 5145)): >> #0 0x00007f5553c6b5a3 in select () from /lib/libc.so.6 >> #1 0x00007f55556516be in main_loop_wait (nonblocking=0) at main-loop.c:456 >> #2 0x00007f5555647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482 >> #3 0x00007f555564c698 in main (argc=38, argv=0x7ffff9d894a8, >> envp=0x7ffff9d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523 >> (gdb) thread apply all bt full >> >> Thread 4 (Thread 0x7f5552f58700 (LWP 5151)): >> #0 0x00007f5553c6a747 in ioctl () from /lib/libc.so.6 >> No symbol table info available. >> #1 0x00007f5555727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, >> type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 >> ret = 32597 >> arg = 0x0 >> ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = >> 0x7f5552f57e50, reg_save_area = 0x7f5552f57d90}} >> #2 0x00007f555572728a in kvm_cpu_exec (env=0x7f5557652f10) at >> /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 >> run = 0x7f55553e2000 >> ret = 0 >> run_ret = 0 >> #3 0x00007f55556f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at >> /usr/src/qemu-kvm-1.0.1/cpus.c:740 >> env = 0x7f5557652f10 >> r = 0 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> No symbol table info available. >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> No symbol table info available. >> #6 0x0000000000000000 in ?? () >> No symbol table info available. >> >> Thread 3 (Thread 0x7f5552757700 (LWP 5152)): >> #0 0x00007f5553c6a747 in ioctl () from /lib/libc.so.6 >> No symbol table info available. >> #1 0x00007f5555727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, >> type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101 >> ret = 0 >> arg = 0x0 >> ap = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = >> 0x7f5552756e50, reg_save_area = 0x7f5552756d90}} >> #2 0x00007f555572728a in kvm_cpu_exec (env=0x7f555766ae60) at >> /usr/src/qemu-kvm-1.0.1/kvm-all.c:987 >> run = 0x7f55553df000 >> ret = 32597 >> run_ret = 1433358864 >> #3 0x00007f55556f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at >> /usr/src/qemu-kvm-1.0.1/cpus.c:740 >> env = 0x7f555766ae60 >> r = 65536 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> No symbol table info available. >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> No symbol table info available. >> #6 0x0000000000000000 in ?? () >> No symbol table info available. >> >> Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)): >> #0 0x00007f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from >> /lib/libpthread.so.0 >> ---Type to continue, or q to quit--- >> No symbol table info available. >> #1 0x00007f5555679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, >> mutex=0x7f5557ede210) at qemu-thread-posix.c:113 >> err = 32597 >> __func__ = "qemu_cond_wait" >> #2 0x00007f55556b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) >> at ui/vnc-jobs-async.c:222 >> job = 0x7f5557edd510 >> entry = 0x0 >> tmp = 0x0 >> vs = {csock = -1, ds = 0x7f5557e8ec40, dirty = {{0, 0, 0} >> }, lossy_rect = 0x7f5557edd570, vd = 0x7f54d08ba010, >> need_update = 0, force_update = 0, >> features = 227, absolute = 0, last_x = 0, last_y = 0, >> client_width = 0, client_height = 0, vnc_encoding = 6, major = 0, minor >> = 0, auth = 0, >> challenge = '\000', info = 0x0, output = >> {capacity = 1230913, offset = 1448, buffer = 0x7f5558176d60 ""}, input = >> {capacity = 0, offset = 0, >> buffer = 0x0}, write_pixels = 0x7f55556b2aaf >> , clientds = {flags = 0 '\000', width = 640, >> height = 480, linesize = 1280, >> data = 0x7f54d0c00000 "B\a", pf = {bits_per_pixel = 32 ' ', >> bytes_per_pixel = 4 '\004', depth = 24 '\030', rmask = 16711680, gmask = >> 65280, bmask = 255, amask = 0, >> rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\000', >> ashift = 0 '\000', rmax = 255 '\377', gmax = 255 '\377', bmax = 255 >> '\377', amax = 0 '\000', >> rbits = 8 '\b', gbits = 8 '\b', bbits = 8 '\b', abits = 0 >> '\000'}}, audio_cap = 0x0, as = {freq = 0, nchannels = 0, fmt = >> AUD_FMT_U8, endianness = 0}, >> read_handler = 0, read_handler_expect = 0, modifiers_state = >> '\000', led = 0x0, abort = false, output_mutex = >> {lock = {__data = {__lock = 0, >> __count = 0, __owner = 0, __nusers = 0, __kind = 0, >> __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' >> , __align = 0}}, >> bh = 0x0, jobs_buffer = {capacity = 0, offset = 0, buffer = >> 0x0}, tight = {type = 0, quality = 255 '\377', compression = 9 '\t', >> pixel24 = 0 '\000', tight = { >> capacity = 0, offset = 0, buffer = 0x0}, tmp = {capacity = >> 0, offset = 0, buffer = 0x0}, zlib = {capacity = 0, offset = 0, buffer = >> 0x0}, gradient = { >> capacity = 0, offset = 0, buffer = 0x0}, levels = {0, 0, >> 0, 0}, stream = {{next_in = 0x0, avail_in = 0, total_in = 0, next_out = >> 0x0, avail_out = 0, >> total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree >> = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = >> 0x0, avail_in = 0, total_in = 0, >> next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, >> state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = >> 0, reserved = 0}, { >> next_in = 0x0, avail_in = 0, total_in = 0, next_out = >> 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, >> zfree = 0, opaque = 0x0, >> data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, >> avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = >> 0, msg = 0x0, state = 0x0, >> zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, >> adler = 0, reserved = 0}}}, zlib = {zlib = {capacity = 1229488, offset = >> 1152000, >> buffer = 0x7f5557fe9320 ""}, tmp = {capacity = 1230913, >> offset = 20, buffer = 0x7f5558176d60 ""}, stream = {next_in = >> 0x7f5558102720 "\030\b ", avail_in = 0, >> total_in = 7658880, next_out = 0x7f5558177308 "", >> avail_out = 1229465, total_out = 49027, msg = 0x0, state = 0x7f55581155e0, >> zalloc = 0x7f55556a622f, zfree = >> 0x7f55556a626c, opaque = 0x7f54d08a6810, data_type = 0, >> adler = 197164569, reserved = 0}, >> level = 9}, hextile = {send_tile = 0x7f55556a1180 >> }, zrle = {type = 0, fb = {capacity = 0, >> offset = 0, buffer = 0x0}, zrle = { >> capacity = 0, offset = 0, buffer = 0x0}, tmp = {capacity = >> 0, offset = 0, buffer = 0x0}, zlib = {capacity = 0, offset = 0, buffer = >> 0x0}, stream = {next_in = 0x0, >> avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, >> total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = >> 0x0, data_type = 0, adler = 0, >> reserved = 0}, palette = {pool = {{idx = 0, color = 0, >> next = {le_next = 0x0, le_prev = 0x0}}}, size = 0, >> max = 0, bpp = 0, table = {{ >> lh_first = 0x0}}}}, zywrle = {buf >> = {0}}, mouse_mode_notifier = {notify = 0, node = >> {tqe_next = 0x0, >> tqe_prev = 0x0}}, next = {tqe_next = 0x0, tqe_prev = 0x0}} >> n_rectangles = 1 >> saved_offset = 2 >> #3 0x00007f55556b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at >> ui/vnc-jobs-async.c:318 >> queue = 0x7f5557ede1e0 >> #4 0x00007f5553f159ca in start_thread () from /lib/libpthread.so.0 >> No symbol table info available. >> #5 0x00007f5553c72cdd in clone () from /lib/libc.so.6 >> No symbol table info available. >> #6 0x0000000000000000 in ?? () >> No symbol table info available. >> >> ---Type to continue, or q to quit--- >> Thread 1 (Thread 0x7f555550d700 (LWP 5145)): >> #0 0x00007f5553c6b5a3 in select () from /lib/libc.so.6 >> No symbol table info available. >> #1 0x00007f55556516be in main_loop_wait (nonblocking=0) at main-loop.c:456 >> rfds = {fds_bits = {14197552, 0}} >> wfds = {fds_bits = {0}} >> xfds = {fds_bits = {0}} >> ret = 32597 >> nfds = 23 >> tv = {tv_sec = 0, tv_usec = 817147} >> timeout = 1000 >> #2 0x00007f5555647ad0 in main_loop () at /usr/src/qemu-kvm-1.0.1/vl.c:1482 >> nonblocking = false >> last_io = 1 >> #3 0x00007f555564c698 in main (argc=38, argv=0x7ffff9d894a8, >> envp=0x7ffff9d895e0) at /usr/src/qemu-kvm-1.0.1/vl.c:3523 >> gdbstub_dev = 0x0 >> i = 64 >> snapshot = 0 >> linux_boot = 0 >> icount_option = 0x0 >> initrd_filename = 0x0 >> kernel_filename = 0x0 >> kernel_cmdline = 0x7f55557d8bef "" >> boot_devices = "dc", '\000' >> ds = 0x7f5557e8ec40 >> dcl = 0x0 >> cyls = 0 >> heads = 0 >> secs = 0 >> translation = 0 >> hda_opts = 0x0 >> opts = 0x7f5557637cb0 >> olist = 0x7ffff9d89268 >> optind = 38 >> optarg = 0x7ffff9d89bed "cirrus" >> loadvm = 0x0 >> machine = 0x7f5555b53500 >> cpu_model = 0x7ffff9d89b6f "host,+x2apic,model_id=Intel(R) >> Xeon(R) CPU", ' ', "L5640 @ 2.27GHz,-tsc" >> pid_file = 0x7ffff9d89b43 "/var/run/qemu/vm-221.pid" >> incoming = 0x0 >> show_vnc_port = 0 >> defconfig = 1 >> log_mask = 0x0 >> log_file = 0x0 >> mem_trace = {malloc = 0x7f5555649081, realloc >> = 0x7f55556490b6, free = 0x7f55556490fa >> , calloc = 0, >> ---Type to continue, or q to quit--- >> try_malloc = 0, try_realloc = 0} >> trace_events = 0x0 >> trace_file = 0x0 >> (gdb) >> >> >> > > >