qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition
@ 2019-02-22 11:59 Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

This fixes a race condition in which the tcp_chr_read() ioc handler
can close a connection that is being written to from another thread.

This is essentially v1 rebased on top of the current master, after
Daniel and Marc-André's chardev series have been merged.

Note: vhost-user-test still fails if QTEST_VHOST_USER_FIXME is set.

Berto

RFC: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01510.html

v1: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01834.html
- Fixes memory leaks and adds a qemu_idle_add() function

v2:
- Rebased on top of the current master (fc3dbb90f2eb069801bfb4cfe9cbc)
- Patches 1 and 2: Remove the changes in char-pty.c, they're not
                   needed after the rebase.
- Patch 3: Fix conflicts after the rebase.

git backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[down] 'main-loop: Fix GSource leak in qio_task_thread_worker()'
002/3:[0027] [FC] 'main-loop: Add qemu_idle_add()'
003/3:[0004] [FC] 'char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()'

Alberto Garcia (3):
  main-loop: Fix GSource leak in qio_task_thread_worker()
  main-loop: Add qemu_idle_add()
  char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()

 chardev/char-socket.c    | 19 +++++++++++++++++--
 include/qemu/main-loop.h | 12 ++++++++++++
 io/task.c                |  9 +++------
 util/main-loop.c         |  9 +++++++++
 4 files changed, 41 insertions(+), 8 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:08   ` Daniel P. Berrangé
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

After g_source_attach() the GMainContext holds a reference to the
GSource, so the caller does not need to keep it.

qio_task_thread_worker() is not releasing its reference so the GSource
is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 io/task.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io/task.c b/io/task.c
index 64c4c7126a..1ae7b86488 100644
--- a/io/task.c
+++ b/io/task.c
@@ -136,6 +136,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
                           qio_task_thread_result, task, NULL);
     g_source_attach(task->thread->completion,
                     task->thread->context);
+    g_source_unref(task->thread->completion);
     trace_qio_task_thread_source_attach(task, task->thread->completion);
 
     qemu_cond_signal(&task->thread_cond);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:10   ` Daniel P. Berrangé
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
  2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

This works like g_idle_add() but allows specifying a different
GMainContext. It also returns the GSource directly instead of its ID
number for convenience.

qio_task_thread_worker() is modified to make use of this new function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 include/qemu/main-loop.h | 12 ++++++++++++
 io/task.c                | 10 +++-------
 util/main-loop.c         |  9 +++++++++
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78ea73..f966c015d0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,18 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/**
+ * qemu_idle_add: Add an idle function to a GMainContext
+ *
+ * This function is similar to GLib's g_idle_add() but it allows
+ * specifying a GMainContext instead of using the global one.
+ *
+ * The returned GSource is owned by the GMainContext and is deleted
+ * automatically after @func returns false. It can also be deleted by
+ * removing it from the GMainContext using g_source_destroy().
+ */
+GSource *qemu_idle_add(GSourceFunc func, gpointer data, GMainContext *ctx);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
diff --git a/io/task.c b/io/task.c
index 1ae7b86488..16754e22ba 100644
--- a/io/task.c
+++ b/io/task.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "io/task.h"
 #include "qapi/error.h"
 #include "qemu/thread.h"
@@ -130,13 +131,8 @@ static gpointer qio_task_thread_worker(gpointer opaque)
     trace_qio_task_thread_exit(task);
 
     qemu_mutex_lock(&task->thread_lock);
-
-    task->thread->completion = g_idle_source_new();
-    g_source_set_callback(task->thread->completion,
-                          qio_task_thread_result, task, NULL);
-    g_source_attach(task->thread->completion,
-                    task->thread->context);
-    g_source_unref(task->thread->completion);
+    task->thread->completion = qemu_idle_add(qio_task_thread_result, task,
+                                             task->thread->context);
     trace_qio_task_thread_source_attach(task, task->thread->completion);
 
     qemu_cond_signal(&task->thread_cond);
diff --git a/util/main-loop.c b/util/main-loop.c
index d4a521caeb..e23eda68cc 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -171,6 +171,15 @@ int qemu_init_main_loop(Error **errp)
     return 0;
 }
 
+GSource *qemu_idle_add(GSourceFunc func, gpointer data, GMainContext *ctx)
+{
+    GSource *idle = g_idle_source_new();
+    g_source_set_callback(idle, func, data, NULL);
+    g_source_attach(idle, ctx);
+    g_source_unref(idle);
+    return idle;
+}
+
 static int max_priority;
 
 #ifndef _WIN32
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:16   ` Daniel P. Berrangé
  2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 chardev/char-socket.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 4fcdd8aedd..67c2eeac6d 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -172,7 +172,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
+                qemu_mutex_unlock(&chr->chr_write_lock);
                 tcp_chr_disconnect(chr);
+                qemu_mutex_lock(&chr->chr_write_lock);
                 return len;
             } /* else let the read handler finish it properly */
         }
@@ -464,6 +466,13 @@ static void update_disconnected_filename(SocketChardev *s)
     }
 }
 
+static gboolean tcp_chr_be_event_closed(gpointer opaque)
+{
+    Chardev *chr = opaque;
+    qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+    return FALSE;
+}
+
 /* NB may be called even if tcp_chr_connect has not been
  * reached, due to TLS or telnet initialization failure,
  * so can *not* assume s->state == TCP_CHARDEV_STATE_CONNECTED
@@ -471,7 +480,10 @@ static void update_disconnected_filename(SocketChardev *s)
 static void tcp_chr_disconnect(Chardev *chr)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
-    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
+    bool emit_close;
+
+    qemu_mutex_lock(&chr->chr_write_lock);
+    emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
 
     tcp_chr_free_connection(chr);
 
@@ -481,11 +493,12 @@ static void tcp_chr_disconnect(Chardev *chr)
     }
     update_disconnected_filename(s);
     if (emit_close) {
-        qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+        qemu_idle_add(tcp_chr_be_event_closed, chr, chr->gcontext);
     }
     if (s->reconnect_time) {
         qemu_chr_socket_restart_timer(chr);
     }
+    qemu_mutex_unlock(&chr->chr_write_lock);
 }
 
 static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
@@ -1128,8 +1141,10 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
     Chardev *chr = CHARDEV(opaque);
     SocketChardev *s = SOCKET_CHARDEV(opaque);
 
+    qemu_mutex_lock(&chr->chr_write_lock);
     g_source_unref(s->reconnect_timer);
     s->reconnect_timer = NULL;
+    qemu_mutex_unlock(&chr->chr_write_lock);
 
     if (chr->be_open) {
         return false;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
@ 2019-02-22 12:08   ` Daniel P. Berrangé
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:08 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

In $SUBJECT  s/main-loop/io/

On Fri, Feb 22, 2019 at 01:59:10PM +0200, Alberto Garcia wrote:
> After g_source_attach() the GMainContext holds a reference to the
> GSource, so the caller does not need to keep it.
> 
> qio_task_thread_worker() is not releasing its reference so the GSource
> is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  io/task.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/io/task.c b/io/task.c
> index 64c4c7126a..1ae7b86488 100644
> --- a/io/task.c
> +++ b/io/task.c
> @@ -136,6 +136,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
>                            qio_task_thread_result, task, NULL);
>      g_source_attach(task->thread->completion,
>                      task->thread->context);
> +    g_source_unref(task->thread->completion);
>      trace_qio_task_thread_source_attach(task, task->thread->completion);
>  
>      qemu_cond_signal(&task->thread_cond);

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
@ 2019-02-22 12:10   ` Daniel P. Berrangé
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:10 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri, Feb 22, 2019 at 01:59:11PM +0200, Alberto Garcia wrote:
> This works like g_idle_add() but allows specifying a different
> GMainContext. It also returns the GSource directly instead of its ID
> number for convenience.
> 
> qio_task_thread_worker() is modified to make use of this new function.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  include/qemu/main-loop.h | 12 ++++++++++++
>  io/task.c                | 10 +++-------
>  util/main-loop.c         |  9 +++++++++
>  3 files changed, 24 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
@ 2019-02-22 12:16   ` Daniel P. Berrangé
  2019-02-22 12:32     ` Alberto Garcia
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:16 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri, Feb 22, 2019 at 01:59:12PM +0200, Alberto Garcia wrote:
> There's a race condition in which the tcp_chr_read() ioc handler can
> close a connection that is being written to from another thread.
> 
> Running iotest 136 in a loop triggers this problem and crashes QEMU.
> 
>  (gdb) bt
>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665
> 
> This patch adds a lock to protect tcp_chr_disconnect() and
> socket_reconnect_timeout()

Can you think of any way to test this in the unit tests ?  I can understand
if its too difficult but just curious if there's any viable option ?

> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 4fcdd8aedd..67c2eeac6d 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -172,7 +172,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
> +                qemu_mutex_unlock(&chr->chr_write_lock);
>                  tcp_chr_disconnect(chr);
> +                qemu_mutex_lock(&chr->chr_write_lock);

The idea of unlock & relocking here, just so tcp_chr_disconnect can
also acquire locks, doesn't make me too happy. This is really an
anti-pattern to me.

I think we should rename the existing method to be
tcp_chr_disconnect_locked() and document that it must only
be called with the write lock held, and call that from
here.

Then introduce a new wrapper for all the other callers to
carry on using which does the locking

 static void tcp_chr_disconnect(Chardev *chr)
 {
    qemu_mutex_lock(&chr->chr_write_lock);
    tcp_chr_disconnect_locked(chr);
    qemu_mutex_unlock(&chr->chr_write_lock);
 }
  
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 12:16   ` Daniel P. Berrangé
@ 2019-02-22 12:32     ` Alberto Garcia
  0 siblings, 0 replies; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 12:32 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri 22 Feb 2019 01:16:57 PM CET, Daniel P. Berrangé wrote:
> On Fri, Feb 22, 2019 at 01:59:12PM +0200, Alberto Garcia wrote:
>> There's a race condition in which the tcp_chr_read() ioc handler can
>> close a connection that is being written to from another thread.
>> 
>> Running iotest 136 in a loop triggers this problem and crashes QEMU.
>> 
>>  (gdb) bt
>>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665
>> 
>> This patch adds a lock to protect tcp_chr_disconnect() and
>> socket_reconnect_timeout()
>
> Can you think of any way to test this in the unit tests ?  I can
> understand if its too difficult but just curious if there's any viable
> option ?

I haven't thought about it. If anyone has a suggestion of how to
reproduce it we can add a new test later.

>>          if (ret < 0 && errno != EAGAIN) {
>>              if (tcp_chr_read_poll(chr) <= 0) {
>> +                qemu_mutex_unlock(&chr->chr_write_lock);
>>                  tcp_chr_disconnect(chr);
>> +                qemu_mutex_lock(&chr->chr_write_lock);
>
> The idea of unlock & relocking here, just so tcp_chr_disconnect can
> also acquire locks, doesn't make me too happy. This is really an
> anti-pattern to me.

I'm not too happy about it either. I'll send a new version with
tcp_chr_disconnect_locked() as you suggest.

Berto

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
                   ` (2 preceding siblings ...)
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
@ 2019-02-22 12:36 ` Paolo Bonzini
  3 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2019-02-22 12:36 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Daniel P . Berrangé, Marc-André Lureau

On 22/02/19 12:59, Alberto Garcia wrote:
> This fixes a race condition in which the tcp_chr_read() ioc handler
> can close a connection that is being written to from another thread.
> 
> This is essentially v1 rebased on top of the current master, after
> Daniel and Marc-André's chardev series have been merged.
> 
> Note: vhost-user-test still fails if QTEST_VHOST_USER_FIXME is set.

Thanks.  I'll look into it next (this week most of the time I had set
aside was eaten debugging other chardev issues... it's a rathole :)).

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-02-22 12:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
2019-02-22 12:08   ` Daniel P. Berrangé
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
2019-02-22 12:10   ` Daniel P. Berrangé
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
2019-02-22 12:16   ` Daniel P. Berrangé
2019-02-22 12:32     ` Alberto Garcia
2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).