qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] Segfaults in chardev due to races
@ 2018-12-21 22:31 Max Reitz
  2018-12-22  9:17 ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Max Reitz @ 2018-12-21 22:31 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: Paolo Bonzini, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 8556 bytes --]

Hi,

While fixing iotest 147 (which currently cannot be run in parallel),
I've noticed that after it's fixed (and does run in parallel), I see
some segfaults in chardev/char-socket.c.

They usually look like this:

#0  0x0000556f3b5765d0 in object_get_class (obj=obj@entry=0x0) at
qom/object.c:822
#1  0x0000556f3b6162a5 in qio_channel_writev_full (ioc=0x0,
iov=0x7fffc8bc9aa0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
#2  0x0000556f3b6056b4 in io_channel_send_full (ioc=0x0,
buf=buf@entry=0x7fd3a4006be0, len=len@entry=138, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3  0x0000556f3b60a50b in tcp_chr_write
    (chr=0x556f3cfa8600, buf=0x7fd3a4006be0 "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
    at chardev/char-socket.c:135
#4  0x0000556f3b602440 in qemu_chr_write_buffer
    (s=s@entry=0x556f3cfa8600, buf=buf@entry=0x7fd3a4006be0
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, offset=offset@entry=0x7fffc8bc9b90,
write_all=false) at chardev/char.c:112
#5  0x0000556f3b6026ef in qemu_chr_write
    (s=0x556f3cfa8600, buf=buf@entry=0x7fd3a4006be0 "{\"timestamp\":
{\"seconds\": 1545429461, \"microseconds\": 104325}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
write_all=write_all@entry=false) at chardev/char.c:147
#6  0x0000556f3b6046af in qemu_chr_fe_write
    (be=be@entry=0x556f3cece140, buf=buf@entry=0x7fd3a4006be0
"{\"timestamp\": {\"seconds\": 1545429461, \"microseconds\": 104325},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7  0x0000556f3b2513d8 in monitor_flush_locked
(mon=mon@entry=0x556f3cece140) at monitor.c:406
#8  0x0000556f3b2515a9 in monitor_flush_locked (mon=0x556f3cece140) at
monitor.c:449
#9  0x0000556f3b2515a9 in monitor_puts (mon=mon@entry=0x556f3cece140,
str=0x556f3de59919 "") at monitor.c:449
#10 0x0000556f3b2515ef in qmp_send_response (mon=0x556f3cece140,
rsp=<optimized out>) at monitor.c:498
#11 0x0000556f3b2517ea in monitor_qapi_event_emit
(event=event@entry=QAPI_EVENT_SHUTDOWN,
qdict=qdict@entry=0x556f3e4613e0) at monitor.c:526
#12 0x0000556f3b251b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x556f3e4613e0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x0000556f3b251b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x0000556f3b251b22 in monitor_qapi_event_queue
(event=event@entry=QAPI_EVENT_SHUTDOWN,
qdict=qdict@entry=0x556f3e4613e0) at monitor.c:594
#15 0x0000556f3b6563f1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x0000556f3b37ac07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x0000556f3b37ac07 in main_loop () at vl.c:1909
#18 0x0000556f3b207735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661


Investigating, I saw that sometimes tcp_chr_write() is invoked in
parallel with tcp_chr_free_connection() -- from two different
AioContexts (and two different pthreads).  s->ioc is set to NULL before
s->connected is set to 0, and because these run concurrently,
tcp_chr_write() sees the s->connected as 1 and then fetches the
already-NULL s->ioc which causes the segfault.

The issue is more easily reproducible by inserting the following into
tcp_chr_free_connection() above "s->connected = 0;":

     for (volatile int i = 0; i < 1000000; i++) {
         __asm__ __volatile__ ("" ::: "memory");

     }


Then, run this command line:

$ x86_64-softmmu/qemu-system-x86_64 -qmp tcp:localhost:4242,server

And this script:

#!/usr/bin/ruby
require 'socket'
c = TCPSocket.new('localhost', 4242)
c.readline
c.puts("{'execute':'qmp_capabilities'}")
c.readline
c.puts("{'execute':'quit'}")
c.close

(The above backtrace was generated this way because for some reason,
after everything, I couldn't see the segfaults in 147 any more.  Huh.
But anyway, the loop I added does not yield a coroutine or anything like
that, it just enlarges the race window, so I think it still is a real
issue.)

Now I thought it would suffice to just pull "s->disconnected = 0;" up to
the start of tcp_chr_free_connection().  But first of all I wasn't very
fond of that solution, because tcp_chr_write() says with an XXX comment
that seeing !s->disconnected is actually bad.

More importantly, though, once I did that, another segfault appeared
(again from running iotest 147 in parallel):

(gdb) bt
#0  0x00005629e4c7b29f in object_dynamic_cast_assert
    (obj=0x5629e6847690, typename=typename@entry=0x5629e4f3933b
"qio-channel-socket", file=file@entry=0x5629e4f423e8
"io/channel-socket.c", line=line@entry=512,
func=func@entry=0x5629e4f42b00 <__func__.20946>
"qio_channel_socket_writev") at qom/object.c:686
#1  0x00005629e4d1ded2 in qio_channel_socket_writev (ioc=<optimized
out>, iov=0x7ffe36e71c40, niov=1, fds=0x0, nfds=0, errp=0x0) at
io/channel-socket.c:512
#2  0x00005629e4d0a6b4 in io_channel_send_full (ioc=0x5629e6847690,
buf=buf@entry=0x7f7ae4006a30, len=len@entry=138, fds=0x0, nfds=0) at
chardev/char-io.c:123
#3  0x00005629e4d0f4cb in tcp_chr_write
    (chr=0x5629e67dced0, buf=0x7f7ae4006a30 "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138)
    at chardev/char-socket.c:135
#4  0x00005629e4d07440 in qemu_chr_write_buffer
    (s=s@entry=0x5629e67dced0, buf=buf@entry=0x7f7ae4006a30
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=138, offset=offset@entry=0x7ffe36e71d30,
write_all=false) at chardev/char.c:112
#5  0x00005629e4d076ef in qemu_chr_write
    (s=0x5629e67dced0, buf=buf@entry=0x7f7ae4006a30 "{\"timestamp\":
{\"seconds\": 1545428667, \"microseconds\": 895334}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>,
write_all=write_all@entry=false) at chardev/char.c:147
#6  0x00005629e4d096af in qemu_chr_fe_write
    (be=be@entry=0x5629e681da90, buf=buf@entry=0x7f7ae4006a30
"{\"timestamp\": {\"seconds\": 1545428667, \"microseconds\": 895334},
\"event\": \"SHUTDOWN\", \"data\": {\"guest\": false, \"reason\":
\"host-qmp-quit\"}}\r\n", len=<optimized out>) at chardev/char-fe.c:42
#7  0x00005629e49563d8 in monitor_flush_locked
(mon=mon@entry=0x5629e681da90) at monitor.c:406
#8  0x00005629e49565a9 in monitor_flush_locked (mon=0x5629e681da90) at
monitor.c:449
#9  0x00005629e49565a9 in monitor_puts (mon=mon@entry=0x5629e681da90,
str=0x5629e768c7f9 "") at monitor.c:449
#10 0x00005629e49565ef in qmp_send_response (mon=0x5629e681da90,
rsp=<optimized out>) at monitor.c:498
#11 0x00005629e49567ea in monitor_qapi_event_emit
(event=event@entry=QAPI_EVENT_SHUTDOWN,
qdict=qdict@entry=0x5629e79825f0) at monitor.c:526
#12 0x00005629e4956b22 in monitor_qapi_event_queue_no_reenter
(qdict=0x5629e79825f0, event=QAPI_EVENT_SHUTDOWN) at monitor.c:551
#13 0x00005629e4956b22 in monitor_qapi_event_queue
(event=QAPI_EVENT_SHUTDOWN, qdict=0x0) at monitor.c:627
#14 0x00005629e4956b22 in monitor_qapi_event_queue
(event=event@entry=QAPI_EVENT_SHUTDOWN,
qdict=qdict@entry=0x5629e79825f0) at monitor.c:594
#15 0x00005629e4d5b3b1 in qapi_event_send_shutdown (guest=<optimized
out>, reason=<optimized out>) at qapi/qapi-events-run-state.c:49
#16 0x00005629e4a7fc07 in main_loop_should_exit () at
include/sysemu/sysemu.h:36
#17 0x00005629e4a7fc07 in main_loop () at vl.c:1909
#18 0x00005629e490c735 in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at vl.c:4661


At this point, I decided that I've ventured far enough.  (Especially
since it's Dec 21 and 11 pm and all that.)

I suppose the issue is that QMP events are sent by one thread, and
client disconnects are handled by a different one.  So if a QMP event is
sent while a client disconnects concurrently, races may occur; and the
only protection against concurrent access appears to be the
chr_write_lock, which I don't think is enough.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-02-11  6:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-21 22:31 [Qemu-devel] Segfaults in chardev due to races Max Reitz
2018-12-22  9:17 ` Paolo Bonzini
2019-01-23 15:33   ` Max Reitz
2019-02-06 18:38     ` Marc-André Lureau
2019-02-11  6:13       ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).