From: Laszlo Ersek <lersek@redhat.com>
To: "Emilio G. Cota" <cota@braap.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"qemu devel list" <qemu-devel@nongnu.org>,
"Cole Robinson" <crobinso@redhat.com>
Subject: [Qemu-devel] qemu <-> libvirt communication regressed in QEMU commit 5243722376
Date: Wed, 16 Sep 2015 14:13:38 +0200
Message-ID: <55F95CF2.3000401@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3350 bytes --]
Hi Emilio,
Bisection led me to your patch, noted in the subject (please see the
attached bisection log).
I'm on RHEL-7.1. Sometimes I have to work with upstream QEMU, and then I
use it with my preexisting libvirt guests, pulling QEMU somewhat
infrequently. My libvirt-related version numbers are:
libvirtd: 1.2.8-16.el7_1.3.x86_64
libvirt-python: 1.2.8-7.el7_1.1.x86_64
libvirt-g*: 0.1.7-3.el7.x86_64
virt-manager: 1.1.0-12.el7.noarch
The symptom: with your patch built into QEMU, QEMU starts, but hangs as
soon as I click the specific VM's entry in virt-manager's list.
In the process list ("ps"), I can then see two qemu processes, parent
and child. I saved backtraces for both of them while they were hung.
The command lines are also visible in the attached text files. The line
numbers (i.e. the QEMU binary) match the tree as checked out and built
at exactly your patch.
(I double checked: if I build at 5243722376^, then it works.)
The configure command was:
./configure \
  --audio-drv-list=alsa \
  --target-list=x86_64-softmmu,i386-softmmu,aarch64-softmmu \
  --disable-vde \
  --enable-werror \
  --enable-spice \
  --disable-stack-protector \
  --prefix=/opt/qemu-installed \
  --disable-gtk \
  --enable-debug \
  --enable-trace-backends=stderr
I don't think libvirt, or for that matter, any QMP interfaces, have
anything to do with this. I rather believe that libvirt invokes QEMU for
retrieving the capabilities in a way that exposes a possible problem in
your patch. (Hence I provided my libvirt version numbers just to be sure.)
... In fact I'm confused about your patch. rcu_init() makes sure that at
fork(), the parent will first acquire both "rcu_sync_lock" and
"rcu_registry_lock". Meaning, no other thread in the parent can hold
those mutexen when the parent thread calling fork() actually forks.
Then, in the parent, the original thread simply releases both mutexen,
in rcu_init_unlock(). In the child, only the one thread exists that
called fork() in the parent. However, that one child thread does own the
copies of both mutexen. So it is prudent for the child to release both
copies.
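To make the scheme described above concrete, here is a minimal sketch of
the lock-in-prepare, unlock-in-parent-and-child pattern built on
pthread_atfork(). It is not the code in util/rcu.c: all "demo_" names
are hypothetical stand-ins, and the mutexes are plain default-type
mutexes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for "rcu_sync_lock" and "rcu_registry_lock". */
static pthread_mutex_t demo_sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_registry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* "prepare" handler: runs in the forking thread, before fork(). */
static void demo_init_lock(void)
{
    pthread_mutex_lock(&demo_sync_lock);
    pthread_mutex_lock(&demo_registry_lock);
}

/* "parent" and "child" handler: release both locks after fork(). */
static void demo_init_unlock(void)
{
    pthread_mutex_unlock(&demo_registry_lock);
    pthread_mutex_unlock(&demo_sync_lock);
}

static void demo_register(void)
{
    int rc = pthread_atfork(demo_init_lock, demo_init_unlock,
                            demo_init_unlock);
    assert(rc == 0);
    (void)rc;
}

/* Returns 1 if the mutex can be taken right now, 0 otherwise. */
static int demo_is_free(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) != 0) {
        return 0;
    }
    pthread_mutex_unlock(m);
    return 1;
}

/*
 * Forks once. Returns 1 if the child found both locks free,
 * 0 if it did not, and -1 if fork() or waitpid() failed.
 */
int demo_child_sees_free_locks(void)
{
    int status;
    pid_t pid;

    pthread_once(&demo_once, demo_register);
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        _exit(demo_is_free(&demo_sync_lock) &&
              demo_is_free(&demo_registry_lock) ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because both handlers release what the prepare handler acquired, a later
fork() in either process can take the locks again.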
Your patch causes "rcu_registry_lock" to be reinitialized in the child,
rather than released, plus "rcu_sync_lock" remains untouched (ie. locked
by the one thread that exists in the child). Why is that correct?
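For illustration, the following sketch mimics the behavior as I
described it above, not the actual patch: the child handler
reinitializes one lock and leaves the other alone. All "demo_" names are
hypothetical. The child uses pthread_mutex_trylock() so that the sketch
reports the stale lock instead of hanging on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for "rcu_sync_lock" and "rcu_registry_lock". */
static pthread_mutex_t demo_sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_registry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

/* "prepare" handler: take both locks before fork(). */
static void demo_prepare(void)
{
    pthread_mutex_lock(&demo_sync_lock);
    pthread_mutex_lock(&demo_registry_lock);
}

/* "parent" handler: release both locks. */
static void demo_parent(void)
{
    pthread_mutex_unlock(&demo_registry_lock);
    pthread_mutex_unlock(&demo_sync_lock);
}

/* "child" handler: reinitialize one lock, leave the other one locked. */
static void demo_child(void)
{
    pthread_mutex_init(&demo_registry_lock, NULL);
    /* demo_sync_lock is deliberately left untouched here. */
}

static void demo_register(void)
{
    int rc = pthread_atfork(demo_prepare, demo_parent, demo_child);
    assert(rc == 0);
    (void)rc;
}

/*
 * Forks once. Returns 1 if the child saw demo_sync_lock as busy,
 * 0 if it did not, and -1 if fork() or waitpid() failed.
 */
int demo_sync_lock_busy_in_child(void)
{
    int status;
    pid_t pid;

    pthread_once(&demo_once, demo_register);
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* A blocking lock here (or a second fork()) would never return. */
        _exit(pthread_mutex_trylock(&demo_sync_lock) == EBUSY ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the child then calls fork() itself, as the double fork in
os_daemonize() does, the prepare handler blocks on the lock that nobody
can release. That is consistent with the attached child backtrace, which
is stuck in rcu_init_lock() under fork().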
(Side note: we're talking process-private, not process-shared mutexen.)
I could easily be wrong, but I don't understand the commit message, nor
why the patch is correct.
... Hm, I can see the discussion here:
http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360421
Okay... let me see 24fa90499f... "The problem is that releasing
error-checking locks in the child fails under glibc with EPERM". <--
That is a striking surprise to me, but still, the removal of
PTHREAD_MUTEX_ERRORCHECK only justifies why your patch would *not* be
necessary.
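A small probe along these lines can show what a given libc does when the
child unlocks an error-checking mutex that was locked before fork(). This
is only a sketch with a hypothetical function name; it returns whatever
pthread_mutex_unlock() reports in the child, and I make no claim about
the result beyond what the quoted commit message says for glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Locks an error-checking mutex, forks, and lets the child unlock its
 * copy. Returns the child's pthread_mutex_unlock() result (0 or an
 * errno value such as EPERM), or -1 if setup, fork() or waitpid() fails.
 */
int demo_errorcheck_unlock_in_child(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int status;
    pid_t pid;

    if (pthread_mutexattr_init(&attr) != 0) {
        return -1;
    }
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);
    pid = fork();
    if (pid == 0) {
        /* Child: report the unlock result through the exit status. */
        _exit(pthread_mutex_unlock(&mutex));
    }

    /* Parent (or failed fork): the locking thread releases normally. */
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Comparing the return value against EPERM tells whether the local libc
matches the behavior described in 24fa90499f.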
The last paragraph of your email that I linked above talks about a
"possibility of corruption". Maybe I've managed to trigger that. If so,
I hope it won't be hard to fix up.
... Hm, apparently Alex had already raised the concern that I'm raising
now, about ignoring "rcu_sync_lock" in the child, in message
<http://thread.gmane.org/gmane.comp.emulators.qemu/356765/focus=360602>.
Was that concern cleared up eventually?
Thanks!
Laszlo
[-- Attachment #2: bisect.log --]
[-- Type: text/x-log, Size: 1628 bytes --]
git bisect start
# bad: [619622424dba749feef752d76d79ef2569f7f250] Merge remote-tracking branch 'remotes/berrange/tags/vnc-crypto-v9-for-upstream' into staging
git bisect bad 619622424dba749feef752d76d79ef2569f7f250
# good: [2b750d9d261bda7f75b39dfc1e1e5f22502929d5] Merge remote-tracking branch 'remotes/aurel/tags/pull-sh4-next-20150913' into staging
git bisect good 2b750d9d261bda7f75b39dfc1e1e5f22502929d5
# bad: [a2aa09e18186801931763fbd40a751fa39971b18] Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
git bisect bad a2aa09e18186801931763fbd40a751fa39971b18
# bad: [0c71d41e2aa3c7356500ae624166f3bb8c201aee] scripts/dump-guest-memory.py: fix after RAMBlock change
git bisect bad 0c71d41e2aa3c7356500ae624166f3bb8c201aee
# good: [3c9589e180d98cdadb143bd2a792fb9d19d9aec6] Move RAMBlock and ram_list to ram_addr.h
git bisect good 3c9589e180d98cdadb143bd2a792fb9d19d9aec6
# bad: [3904e6bf042391abc749d717465022e96e276fc7] cutils: Add qemu_strtoull() wrapper
git bisect bad 3904e6bf042391abc749d717465022e96e276fc7
# bad: [709037636992e9289ce9147e59d56fb35d90b140] linux-user: call rcu_(un)register_thread on pthread_(exit|create)
git bisect bad 709037636992e9289ce9147e59d56fb35d90b140
# bad: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
git bisect bad 5243722376873a48e9852a58b91f4d4101ee66e4
# good: [12a1ddc160cb6a73e8a6c319f3962a20da2cd22f] Makefile.target: include top level build dir in vpath
git bisect good 12a1ddc160cb6a73e8a6c319f3962a20da2cd22f
# first bad commit: [5243722376873a48e9852a58b91f4d4101ee66e4] rcu: init rcu_registry_lock after fork
[-- Attachment #3: parent.txt --]
[-- Type: text/plain, Size: 2920 bytes --]
UID PID PPID C STIME TTY TIME CMD
qemu 17305 1752 0 13:24 ? 00:00:00 /opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize
(gdb) thread apply all bt full
Thread 2 (Thread 0x7fa9c3db7700 (LWP 17306)):
#0 0x00007fa9c7dda949 in syscall () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fa9cebc0f73 in futex_wait (ev=0x7fa9cf5245a4 <rcu_call_ready_event>, val=4294967295) at util/qemu-thread-posix.c:301
No locals.
#2 0x00007fa9cebc106a in qemu_event_wait (ev=0x7fa9cf5245a4 <rcu_call_ready_event>) at util/qemu-thread-posix.c:408
value = 1
#3 0x00007fa9cebd4666 in call_rcu_thread (opaque=0x0) at util/rcu.c:254
tries = 0
n = 0
node = 0x7fa9ce712990
#4 0x00007fa9cd2fedf5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5 0x00007fa9c7de01ad in clone () from /lib64/libc.so.6
No symbol table info available.
Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17305)):
#0 0x00007fa9cd30525d in read () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007fa9ce915c7c in os_daemonize () at os-posix.c:223
status = 0 '\000'
len = 140733260032912
pid = 17307
fds = {4, 5}
#2 0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, envp=0x7fff03f8f040) at vl.c:4034
i = 0
snapshot = 0
linux_boot = 0
initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
kernel_filename = 0x7fa9d0749ea0 ""
kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> "AWA\211ÿAVI\211öAUI\211ÕATL\215%"
boot_order = 0x0
boot_once = 0x0
ds = 0x7fa9cec56d38
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = 0x0
opts = 0x7fa9d0790e90
machine_opts = 0xfffffffe7fffffff
icount_opts = 0x0
olist = 0x7fa9cf03b140 <qemu_machine_opts>
optind = 12
optarg = 0x7fa9d0790f40 "none"
loadvm = 0x0
machine_class = 0x7fa9d077a160
cpu_model = 0x0
vga_model = 0x0
qtest_chrdev = 0x0
qtest_log = 0x0
pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
incoming = 0x0
show_vnc_port = 0
defconfig = true
userconfig = false
log_mask = 0x0
log_file = 0x0
mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
trace_events = 0x0
trace_file = 0x0
maxram_size = 134217728
ram_slots = 0
vmstate_dump_file = 0x0
main_loop_err = 0x0
err = 0x0
__func__ = "main"
[-- Attachment #4: child.txt --]
[-- Type: text/plain, Size: 2666 bytes --]
UID PID PPID C STIME TTY TIME CMD
qemu 17307 17305 0 13:24 ? 00:00:00 /opt/qemu-installed/bin/qemu-system-i386 -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize
(gdb) thread apply all bt full
Thread 1 (Thread 0x7fa9ce6f2bc0 (LWP 17307)):
#0 0x00007fa9cd304f7d in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007fa9cd300d32 in _L_lock_791 () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x00007fa9cd300c38 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x00007fa9cebc0ad1 in qemu_mutex_lock (mutex=0x7fa9cf524560 <rcu_sync_lock>) at util/qemu-thread-posix.c:73
err = 0
__func__ = "qemu_mutex_lock"
#4 0x00007fa9cebd491a in rcu_init_lock () at util/rcu.c:329
No locals.
#5 0x00007fa9c7da7512 in fork () from /lib64/libc.so.6
No symbol table info available.
#6 0x00007fa9ce915cef in os_daemonize () at os-posix.c:240
pid = 0
fds = {4, 5}
#7 0x00007fa9ce92a803 in main (argc=12, argv=0x7fff03f8efd8, envp=0x7fff03f8f040) at vl.c:4034
i = 0
snapshot = 0
linux_boot = 0
initrd_filename = 0x7fa9d0749eb0 "îkÅΩ\177"
kernel_filename = 0x7fa9d0749ea0 ""
kernel_cmdline = 0x7fa9cebd4e20 <__libc_csu_init> "AWA\211ÿAVI\211öAUI\211ÕATL\215%"
boot_order = 0x0
boot_once = 0x0
ds = 0x7fa9cec56d38
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = 0x0
opts = 0x7fa9d0790e90
machine_opts = 0xfffffffe7fffffff
icount_opts = 0x0
olist = 0x7fa9cf03b140 <qemu_machine_opts>
optind = 12
optarg = 0x7fa9d0790f40 "none"
loadvm = 0x0
machine_class = 0x7fa9d077a160
cpu_model = 0x0
vga_model = 0x0
qtest_chrdev = 0x0
qtest_log = 0x0
pid_file = 0x7fff03f8ff59 "/var/lib/libvirt/qemu/capabilities.pidfile"
incoming = 0x0
show_vnc_port = 0
defconfig = true
userconfig = false
log_mask = 0x0
log_file = 0x0
mem_trace = {malloc = 0x7fa9ce9276a2 <malloc_and_trace>, realloc = 0x7fa9ce9276d7 <realloc_and_trace>, free = 0x7fa9ce92771b <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
trace_events = 0x0
trace_file = 0x0
maxram_size = 134217728
ram_slots = 0
vmstate_dump_file = 0x0
main_loop_err = 0x0
err = 0x0
__func__ = "main"