* [PATCH 0/4] chardev: Fix issues found by vhost-user-test
@ 2025-05-15 22:20 Fabiano Rosas
2025-05-15 22:20 ` [PATCH 1/4] chardev: Fix QIOChannel refcount Fabiano Rosas
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-15 22:20 UTC (permalink / raw)
To: qemu-devel; +Cc: marcandre.lureau, berrange, pbonzini
Running vhost-user-test with ASAN on a loaded machine reveals several
intermittent issues. These show up every time I test the qtest tree so
I'm trying to get rid of them.
1- UAF of IOWatchPoll.
This one is self explanatory, ASAN caught it.
2- Reference counting issues in glib. It seems it's possible to unref
a source while adding a callback to it, and glib asserts. This
shows up on all architectures, only on the ASAN build after
hundreds of iterations.
3- Extra yank_unregister_function call leads to abort(). This shows up
on all architectures, but it's quite hidden due to vhost-user-test
using a dedicated server thread which dies and causes timeouts in
the test.
Manifests as assert(s->fds_num) failing. Only on the ASAN build,
after tens of iterations (quite common).
Thanks
Fabiano Rosas (4):
chardev: Fix QIOChannel refcount
chardev: Don't attempt to unregister yank function more than once
chardev: Consolidate yank registration
chardev: Introduce a lock for hup_source
chardev/char-io.c | 5 +++++
chardev/char-socket.c | 38 ++++++++++++--------------------------
chardev/char.c | 2 ++
include/chardev/char.h | 1 +
4 files changed, 20 insertions(+), 26 deletions(-)
--
2.35.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] chardev: Fix QIOChannel refcount
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
@ 2025-05-15 22:20 ` Fabiano Rosas
2025-05-19 10:49 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once Fabiano Rosas
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-15 22:20 UTC (permalink / raw)
To: qemu-devel; +Cc: marcandre.lureau, berrange, pbonzini
The IOWatchPoll holds a reference to the iochannel while the "child"
source (iwp->src) is removed from the context and freed. Freeing the
source leads to the iochannel being also freed at
qio_channel_fd_source_finalize().
Later, io_watch_poll_prepare() tries to create another source with the
same iochannel and hits an use after free:
==8241==ERROR: AddressSanitizer: heap-use-after-free on address 0x514000000040
READ of size 8 at 0x514000000040 thread T2
#0 0x561c2d272fcd in object_get_class ../qom/object.c:1043:17
#1 0x561c2d338f84 in QIO_CHANNEL_GET_CLASS include/io/channel.h:29:1
#2 0x561c2d33b26f in qio_channel_create_watch ../io/channel.c:388:30
#3 0x561c2d2f0993 in io_watch_poll_prepare ../chardev/char-io.c:65:20
...
0x514000000040 is located 0 bytes inside of 392-byte region [0x514000000040,0x5140000001c8)
freed by thread T2 here:
#0 0x561c2d2319a5 in free
#1 0x7fb2c0926638 in g_free
#2 0x561c2d276507 in object_finalize ../qom/object.c:734:9
#3 0x561c2d271d0d in object_unref ../qom/object.c:1231:9
#4 0x561c2d32ef1d in qio_channel_fd_source_finalize ../io/channel-watch.c:95:5
#5 0x7fb2c091d124 in g_source_unref_internal ../glib/gmain.c:2298
#6 0x561c2d2f0b6c in io_watch_poll_prepare ../chardev/char-io.c:71:9
...
previously allocated by thread T3 (connect) here:
#0 0x561c2d231c69 in malloc
#1 0x7fb2c0926518 in g_malloc
#2 0x561c2d27246e in object_new_with_type ../qom/object.c:767:15
#3 0x561c2d272530 in object_new ../qom/object.c:789:12
#4 0x561c2d320193 in qio_channel_socket_new ../io/channel-socket.c:64:31
#5 0x561c2d308013 in tcp_chr_connect_client_async ../chardev/char-socket.c:1181:12
#6 0x561c2d3002e7 in qmp_chardev_open_socket_client ../chardev/char-socket.c:1281:9
...
Fix the issue by incrementing the iochannel reference count when the
IOWatchPoll takes a reference and decrementing when it is finalized.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
chardev/char-io.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/chardev/char-io.c b/chardev/char-io.c
index 3be17b51ca..d9b11f335f 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -88,6 +88,9 @@ static gboolean io_watch_poll_dispatch(GSource *source, GSourceFunc callback,
static void io_watch_poll_finalize(GSource *source)
{
IOWatchPoll *iwp = io_watch_poll_from_source(source);
+
+ object_unref(OBJECT(iwp->ioc));
+
if (iwp->src) {
g_source_destroy(iwp->src);
g_source_unref(iwp->src);
@@ -117,6 +120,8 @@ GSource *io_add_watch_poll(Chardev *chr,
iwp->fd_can_read = fd_can_read;
iwp->opaque = user_data;
iwp->ioc = ioc;
+ object_ref(OBJECT(iwp->ioc));
+
iwp->fd_read = (GSourceFunc) fd_read;
iwp->src = NULL;
iwp->context = context;
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
2025-05-15 22:20 ` [PATCH 1/4] chardev: Fix QIOChannel refcount Fabiano Rosas
@ 2025-05-15 22:20 ` Fabiano Rosas
2025-05-19 10:52 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 3/4] chardev: Consolidate yank registration Fabiano Rosas
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-15 22:20 UTC (permalink / raw)
To: qemu-devel; +Cc: marcandre.lureau, berrange, pbonzini
tcp_chr_free_connection() can be called multiple times in succession,
in which case the yank function will get as argument a NULL s->sioc
that has been cleared by the previous tcp_chr_free_connection() call.
This leads to an abort() at yank_unregister_function().
#0 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 __GI_abort () at abort.c:79
#2 qtest_check_status (s=0x513000005600) at ../tests/qtest/libqtest.c:209
#3 qtest_wait_qemu (s=0x513000005600) at ../tests/qtest/libqtest.c:273
#4 qtest_kill_qemu (s=0x513000005600) at ../tests/qtest/libqtest.c:285
#5 kill_qemu_hook_func (s=0x513000005600) at ../tests/qtest/libqtest.c:294
#6 g_hook_list_invoke (hook_list=0x55ea9cc750c0 <abrt_hooks>, may_recurse=0) at ../glib/ghook.c:534
#7 sigabrt_handler (signo=6) at ../tests/qtest/libqtest.c:299
#8 <signal handler called>
#9 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#10 __GI_abort () at abort.c:79
#11 yank_unregister_function (instance=0x7fb26f2ea9a0,
func=0x55ea9bcc0a10 <char_socket_yank_iochannel>, opaque=0x0) at
../util/yank.c:151
#12 tcp_chr_free_connection (chr=0x51300000ffc0) at ../chardev/char-socket.c:385
#13 tcp_chr_disconnect_locked (chr=0x51300000ffc0) at ../chardev/char-socket.c:477
#14 tcp_chr_disconnect (chr=0x51300000ffc0) at ../chardev/char-socket.c:495
#15 tcp_chr_hup (channel=0x514000000040, cond=G_IO_HUP, opaque=0x51300000ffc0) at ../chardev/char-socket.c:536
#16 qio_channel_fd_source_dispatch (source=0x50c0000b5fc0, callback=0x55ea9bcd6770 <tcp_chr_hup>,
user_data=0x51300000ffc0) at ../io/channel-watch.c:84
#17 g_main_dispatch (context=0x50f000000040) at ../glib/gmain.c:3381
#18 g_main_context_dispatch (context=context@entry=0x50f000000040) at ../glib/gmain.c:4099
#19 g_main_context_iterate (context=0x50f000000040, block=block@entry=1, dispatch=dispatch@entry=1,
self=<optimized out>) at ../glib/gmain.c:4175
#20 g_main_loop_run (loop=0x502000055690) at ../glib/gmain.c:4373
Commit ebae6477dc ("chardev: check if the chardev is registered for
yanking") seems to have encountered a similar issue, but checking
s->registered_yank is not a complete solution because that flag
pertains to the yank instance, not to each individual function.
Skip the yank_unregister_function() in case s->sioc is already NULL,
which indicates the last yank function was already removed.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
Can tcp_chr_free_connection() race with itself? I'm assuming no.
Could we just make yank_unregister_instance() remove all yank
functions at once? Those asserts/abort in the yank code are a bit
masochistic.
---
chardev/char-socket.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index e8dd2931dc..8ae225d953 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -378,7 +378,8 @@ static void tcp_chr_free_connection(Chardev *chr)
tcp_set_msgfds(chr, NULL, 0);
remove_fd_in_watch(chr);
- if (s->registered_yank &&
+
+ if (s->registered_yank && s->sioc &&
(s->state == TCP_CHARDEV_STATE_CONNECTING
|| s->state == TCP_CHARDEV_STATE_CONNECTED)) {
yank_unregister_function(CHARDEV_YANK_INSTANCE(chr->label),
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/4] chardev: Consolidate yank registration
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
2025-05-15 22:20 ` [PATCH 1/4] chardev: Fix QIOChannel refcount Fabiano Rosas
2025-05-15 22:20 ` [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once Fabiano Rosas
@ 2025-05-15 22:20 ` Fabiano Rosas
2025-05-19 10:53 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 4/4] chardev: Introduce a lock for hup_source Fabiano Rosas
2025-05-24 17:50 ` [PATCH 0/4] chardev: Fix issues found by vhost-user-test Marc-André Lureau
4 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-15 22:20 UTC (permalink / raw)
To: qemu-devel; +Cc: marcandre.lureau, berrange, pbonzini
There's currently five places where the yank function is being
registered and they all come right before tcp_chr_new_client(). Fold
them into it.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
chardev/char-socket.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 8ae225d953..d16608f1ed 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -904,6 +904,12 @@ static int tcp_chr_new_client(Chardev *chr, QIOChannelSocket *sioc)
s->sioc = sioc;
object_ref(OBJECT(sioc));
+ if (s->registered_yank) {
+ yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
+ char_socket_yank_iochannel,
+ QIO_CHANNEL(sioc));
+ }
+
qio_channel_set_blocking(s->ioc, false, NULL);
if (s->do_nodelay) {
@@ -944,11 +950,6 @@ static int tcp_chr_add_client(Chardev *chr, int fd)
}
tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
tcp_chr_set_client_ioc_name(chr, sioc);
- if (s->registered_yank) {
- yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
- char_socket_yank_iochannel,
- QIO_CHANNEL(sioc));
- }
ret = tcp_chr_new_client(chr, sioc);
object_unref(OBJECT(sioc));
return ret;
@@ -963,11 +964,6 @@ static void tcp_chr_accept(QIONetListener *listener,
tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
tcp_chr_set_client_ioc_name(chr, cioc);
- if (s->registered_yank) {
- yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
- char_socket_yank_iochannel,
- QIO_CHANNEL(cioc));
- }
tcp_chr_new_client(chr, cioc);
}
@@ -983,11 +979,6 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
object_unref(OBJECT(sioc));
return -1;
}
- if (s->registered_yank) {
- yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
- char_socket_yank_iochannel,
- QIO_CHANNEL(sioc));
- }
tcp_chr_new_client(chr, sioc);
object_unref(OBJECT(sioc));
return 0;
@@ -1003,11 +994,6 @@ static void tcp_chr_accept_server_sync(Chardev *chr)
tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
sioc = qio_net_listener_wait_client(s->listener);
tcp_chr_set_client_ioc_name(chr, sioc);
- if (s->registered_yank) {
- yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
- char_socket_yank_iochannel,
- QIO_CHANNEL(sioc));
- }
tcp_chr_new_client(chr, sioc);
object_unref(OBJECT(sioc));
}
@@ -1181,11 +1167,6 @@ static void tcp_chr_connect_client_async(Chardev *chr)
tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
sioc = qio_channel_socket_new();
tcp_chr_set_client_ioc_name(chr, sioc);
- if (s->registered_yank) {
- yank_register_function(CHARDEV_YANK_INSTANCE(chr->label),
- char_socket_yank_iochannel,
- QIO_CHANNEL(sioc));
- }
/*
* Normally code would use the qio_channel_socket_connect_async
* method which uses a QIOTask + qio_task_set_error internally
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/4] chardev: Introduce a lock for hup_source
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
` (2 preceding siblings ...)
2025-05-15 22:20 ` [PATCH 3/4] chardev: Consolidate yank registration Fabiano Rosas
@ 2025-05-15 22:20 ` Fabiano Rosas
2025-05-19 11:00 ` Daniel P. Berrangé
2025-05-24 17:50 ` [PATCH 0/4] chardev: Fix issues found by vhost-user-test Marc-André Lureau
4 siblings, 1 reply; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-15 22:20 UTC (permalink / raw)
To: qemu-devel; +Cc: marcandre.lureau, berrange, pbonzini
It's possible for the hup_source to have its reference decremented by
remove_hup_source() while it's still being added to the context,
leading to asserts in glib:
g_source_set_callback_indirect: assertion 'g_atomic_int_get
(&source->ref_count) > 0'
g_source_attach: assertion 'g_atomic_int_get (&source->ref_count) > 0'
failed
Add a lock to serialize removal and creation.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
chardev/char-socket.c | 4 ++++
chardev/char.c | 2 ++
include/chardev/char.h | 1 +
3 files changed, 7 insertions(+)
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index d16608f1ed..88db9acd0d 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -374,7 +374,9 @@ static void tcp_chr_free_connection(Chardev *chr)
s->read_msgfds_num = 0;
}
+ qemu_mutex_lock(&chr->hup_source_lock);
remove_hup_source(s);
+ qemu_mutex_unlock(&chr->hup_source_lock);
tcp_set_msgfds(chr, NULL, 0);
remove_fd_in_watch(chr);
@@ -613,6 +615,7 @@ static void update_ioc_handlers(SocketChardev *s)
tcp_chr_read, chr,
chr->gcontext);
+ qemu_mutex_lock(&chr->hup_source_lock);
remove_hup_source(s);
s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
/*
@@ -634,6 +637,7 @@ static void update_ioc_handlers(SocketChardev *s)
g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
chr, NULL);
g_source_attach(s->hup_source, chr->gcontext);
+ qemu_mutex_unlock(&chr->hup_source_lock);
}
static void tcp_chr_connect(void *opaque)
diff --git a/chardev/char.c b/chardev/char.c
index bbebd246c3..d03f698b38 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -279,6 +279,7 @@ static void char_init(Object *obj)
chr->handover_yank_instance = false;
chr->logfd = -1;
qemu_mutex_init(&chr->chr_write_lock);
+ qemu_mutex_init(&chr->hup_source_lock);
/*
* Assume if chr_update_read_handler is implemented it will
@@ -316,6 +317,7 @@ static void char_finalize(Object *obj)
close(chr->logfd);
}
qemu_mutex_destroy(&chr->chr_write_lock);
+ qemu_mutex_destroy(&chr->hup_source_lock);
}
static const TypeInfo char_type_info = {
diff --git a/include/chardev/char.h b/include/chardev/char.h
index 429852f8d9..064184153d 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -60,6 +60,7 @@ struct Chardev {
Object parent_obj;
QemuMutex chr_write_lock;
+ QemuMutex hup_source_lock;
CharBackend *be;
char *label;
char *filename;
--
2.35.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/4] chardev: Fix QIOChannel refcount
2025-05-15 22:20 ` [PATCH 1/4] chardev: Fix QIOChannel refcount Fabiano Rosas
@ 2025-05-19 10:49 ` Daniel P. Berrangé
0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-05-19 10:49 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, marcandre.lureau, pbonzini
On Thu, May 15, 2025 at 07:20:11PM -0300, Fabiano Rosas wrote:
> The IOWatchPoll holds a reference to the iochannel while the "child"
> source (iwp->src) is removed from the context and freed. Freeing the
> source leads to the iochannel being also freed at
> qio_channel_fd_source_finalize().
>
> Later, io_watch_poll_prepare() tries to create another source with the
> same iochannel and hits an use after free:
>
> ==8241==ERROR: AddressSanitizer: heap-use-after-free on address 0x514000000040
> READ of size 8 at 0x514000000040 thread T2
> #0 0x561c2d272fcd in object_get_class ../qom/object.c:1043:17
> #1 0x561c2d338f84 in QIO_CHANNEL_GET_CLASS include/io/channel.h:29:1
> #2 0x561c2d33b26f in qio_channel_create_watch ../io/channel.c:388:30
> #3 0x561c2d2f0993 in io_watch_poll_prepare ../chardev/char-io.c:65:20
> ...
>
> 0x514000000040 is located 0 bytes inside of 392-byte region [0x514000000040,0x5140000001c8)
> freed by thread T2 here:
> #0 0x561c2d2319a5 in free
> #1 0x7fb2c0926638 in g_free
> #2 0x561c2d276507 in object_finalize ../qom/object.c:734:9
> #3 0x561c2d271d0d in object_unref ../qom/object.c:1231:9
> #4 0x561c2d32ef1d in qio_channel_fd_source_finalize ../io/channel-watch.c:95:5
> #5 0x7fb2c091d124 in g_source_unref_internal ../glib/gmain.c:2298
> #6 0x561c2d2f0b6c in io_watch_poll_prepare ../chardev/char-io.c:71:9
> ...
>
> previously allocated by thread T3 (connect) here:
> #0 0x561c2d231c69 in malloc
> #1 0x7fb2c0926518 in g_malloc
> #2 0x561c2d27246e in object_new_with_type ../qom/object.c:767:15
> #3 0x561c2d272530 in object_new ../qom/object.c:789:12
> #4 0x561c2d320193 in qio_channel_socket_new ../io/channel-socket.c:64:31
> #5 0x561c2d308013 in tcp_chr_connect_client_async ../chardev/char-socket.c:1181:12
> #6 0x561c2d3002e7 in qmp_chardev_open_socket_client ../chardev/char-socket.c:1281:9
> ...
>
> Fix the issue by incrementing the iochannel reference count when the
> IOWatchPoll takes a reference and decrementing when it is finalized.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> chardev/char-io.c | 5 +++++
> 1 file changed, 5 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once
2025-05-15 22:20 ` [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once Fabiano Rosas
@ 2025-05-19 10:52 ` Daniel P. Berrangé
0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-05-19 10:52 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, marcandre.lureau, pbonzini
On Thu, May 15, 2025 at 07:20:12PM -0300, Fabiano Rosas wrote:
> tcp_chr_free_connection() can be called multiple times in succession,
> in which case the yank function will get as argument a NULL s->sioc
> that has been cleared by the previous tcp_chr_free_connection() call.
>
> This leads to an abort() at yank_unregister_function().
>
> #0 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1 __GI_abort () at abort.c:79
> #2 qtest_check_status (s=0x513000005600) at ../tests/qtest/libqtest.c:209
> #3 qtest_wait_qemu (s=0x513000005600) at ../tests/qtest/libqtest.c:273
> #4 qtest_kill_qemu (s=0x513000005600) at ../tests/qtest/libqtest.c:285
> #5 kill_qemu_hook_func (s=0x513000005600) at ../tests/qtest/libqtest.c:294
> #6 g_hook_list_invoke (hook_list=0x55ea9cc750c0 <abrt_hooks>, may_recurse=0) at ../glib/ghook.c:534
> #7 sigabrt_handler (signo=6) at ../tests/qtest/libqtest.c:299
> #8 <signal handler called>
> #9 __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #10 __GI_abort () at abort.c:79
> #11 yank_unregister_function (instance=0x7fb26f2ea9a0,
> func=0x55ea9bcc0a10 <char_socket_yank_iochannel>, opaque=0x0) at
> ../util/yank.c:151
> #12 tcp_chr_free_connection (chr=0x51300000ffc0) at ../chardev/char-socket.c:385
> #13 tcp_chr_disconnect_locked (chr=0x51300000ffc0) at ../chardev/char-socket.c:477
> #14 tcp_chr_disconnect (chr=0x51300000ffc0) at ../chardev/char-socket.c:495
> #15 tcp_chr_hup (channel=0x514000000040, cond=G_IO_HUP, opaque=0x51300000ffc0) at ../chardev/char-socket.c:536
> #16 qio_channel_fd_source_dispatch (source=0x50c0000b5fc0, callback=0x55ea9bcd6770 <tcp_chr_hup>,
> user_data=0x51300000ffc0) at ../io/channel-watch.c:84
> #17 g_main_dispatch (context=0x50f000000040) at ../glib/gmain.c:3381
> #18 g_main_context_dispatch (context=context@entry=0x50f000000040) at ../glib/gmain.c:4099
> #19 g_main_context_iterate (context=0x50f000000040, block=block@entry=1, dispatch=dispatch@entry=1,
> self=<optimized out>) at ../glib/gmain.c:4175
> #20 g_main_loop_run (loop=0x502000055690) at ../glib/gmain.c:4373
>
> Commit ebae6477dc ("chardev: check if the chardev is registered for
> yanking") seems to have encountered a similar issue, but checking
> s->registered_yank is not a complete solution because that flag
> pertains to the yank instance, not to each individual function.
>
> Skip the yank_unregister_function() in case s->sioc is already NULL,
> which indicates the last yank function was already removed.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> Can tcp_chr_free_connection() race with itself? I'm assuming no.
>
> Could we just make yank_unregister_instance() remove all yank
> functions at once? Those asserts/abort in the yank code are a bit
> masochistic.
> ---
> chardev/char-socket.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/4] chardev: Consolidate yank registration
2025-05-15 22:20 ` [PATCH 3/4] chardev: Consolidate yank registration Fabiano Rosas
@ 2025-05-19 10:53 ` Daniel P. Berrangé
0 siblings, 0 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-05-19 10:53 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, marcandre.lureau, pbonzini
On Thu, May 15, 2025 at 07:20:13PM -0300, Fabiano Rosas wrote:
> There's currently five places where the yank function is being
> registered and they all come right before tcp_chr_new_client(). Fold
> them into it.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> chardev/char-socket.c | 31 ++++++-------------------------
> 1 file changed, 6 insertions(+), 25 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2025-05-15 22:20 ` [PATCH 4/4] chardev: Introduce a lock for hup_source Fabiano Rosas
@ 2025-05-19 11:00 ` Daniel P. Berrangé
2025-05-19 14:21 ` Fabiano Rosas
2026-02-28 15:16 ` Peter Maydell
0 siblings, 2 replies; 16+ messages in thread
From: Daniel P. Berrangé @ 2025-05-19 11:00 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, marcandre.lureau, pbonzini
On Thu, May 15, 2025 at 07:20:14PM -0300, Fabiano Rosas wrote:
> It's possible for the hup_source to have its reference decremented by
> remove_hup_source() while it's still being added to the context,
> leading to asserts in glib:
IIUC this must mean that
tcp_chr_free_connection
is being called concurrently with
update_ioc_handlers
I'm wondering if that is really intended, or a sign of a deeper
bug that we'll just paper over if we add the mutex proposed here.
Are you able to provide stack traces showing the 2 concurrent
operations that are triggering this problem ?
>
> g_source_set_callback_indirect: assertion 'g_atomic_int_get
> (&source->ref_count) > 0'
>
> g_source_attach: assertion 'g_atomic_int_get (&source->ref_count) > 0'
> failed
>
> Add a lock to serialize removal and creation.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
> chardev/char-socket.c | 4 ++++
> chardev/char.c | 2 ++
> include/chardev/char.h | 1 +
> 3 files changed, 7 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index d16608f1ed..88db9acd0d 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -374,7 +374,9 @@ static void tcp_chr_free_connection(Chardev *chr)
> s->read_msgfds_num = 0;
> }
>
> + qemu_mutex_lock(&chr->hup_source_lock);
> remove_hup_source(s);
> + qemu_mutex_unlock(&chr->hup_source_lock);
>
> tcp_set_msgfds(chr, NULL, 0);
> remove_fd_in_watch(chr);
> @@ -613,6 +615,7 @@ static void update_ioc_handlers(SocketChardev *s)
> tcp_chr_read, chr,
> chr->gcontext);
>
> + qemu_mutex_lock(&chr->hup_source_lock);
> remove_hup_source(s);
> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
> /*
> @@ -634,6 +637,7 @@ static void update_ioc_handlers(SocketChardev *s)
> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
> chr, NULL);
> g_source_attach(s->hup_source, chr->gcontext);
> + qemu_mutex_unlock(&chr->hup_source_lock);
> }
>
> static void tcp_chr_connect(void *opaque)
> diff --git a/chardev/char.c b/chardev/char.c
> index bbebd246c3..d03f698b38 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -279,6 +279,7 @@ static void char_init(Object *obj)
> chr->handover_yank_instance = false;
> chr->logfd = -1;
> qemu_mutex_init(&chr->chr_write_lock);
> + qemu_mutex_init(&chr->hup_source_lock);
>
> /*
> * Assume if chr_update_read_handler is implemented it will
> @@ -316,6 +317,7 @@ static void char_finalize(Object *obj)
> close(chr->logfd);
> }
> qemu_mutex_destroy(&chr->chr_write_lock);
> + qemu_mutex_destroy(&chr->hup_source_lock);
> }
>
> static const TypeInfo char_type_info = {
> diff --git a/include/chardev/char.h b/include/chardev/char.h
> index 429852f8d9..064184153d 100644
> --- a/include/chardev/char.h
> +++ b/include/chardev/char.h
> @@ -60,6 +60,7 @@ struct Chardev {
> Object parent_obj;
>
> QemuMutex chr_write_lock;
> + QemuMutex hup_source_lock;
> CharBackend *be;
> char *label;
> char *filename;
> --
> 2.35.3
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2025-05-19 11:00 ` Daniel P. Berrangé
@ 2025-05-19 14:21 ` Fabiano Rosas
2026-02-28 15:16 ` Peter Maydell
1 sibling, 0 replies; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-19 14:21 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, marcandre.lureau, pbonzini
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Thu, May 15, 2025 at 07:20:14PM -0300, Fabiano Rosas wrote:
>> It's possible for the hup_source to have its reference decremented by
>> remove_hup_source() while it's still being added to the context,
>> leading to asserts in glib:
>
> IIUC this must mean that
>
> tcp_chr_free_connection
>
> is being called concurrently with
>
> update_ioc_handlers
>
> I'm wondering if that is really intended, or a sign of a deeper
> bug that we'll just paper over if we add the mutex proposed here.
>
Yeah... I can't tell, I'm new to this code. But I agree that this smells
of a bug somewhere else.
> Are you able to provide stack traces showing the 2 concurrent
> operations that are triggering this problem ?
>
I wasn't able to, it triggers in the glib subprocess which is a pain to
debug. I'll give it another try now that there's fixes for the other
bugs.
>>
>> g_source_set_callback_indirect: assertion 'g_atomic_int_get
>> (&source->ref_count) > 0'
>>
>> g_source_attach: assertion 'g_atomic_int_get (&source->ref_count) > 0'
>> failed
>>
>> Add a lock to serialize removal and creation.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> chardev/char-socket.c | 4 ++++
>> chardev/char.c | 2 ++
>> include/chardev/char.h | 1 +
>> 3 files changed, 7 insertions(+)
>>
>> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
>> index d16608f1ed..88db9acd0d 100644
>> --- a/chardev/char-socket.c
>> +++ b/chardev/char-socket.c
>> @@ -374,7 +374,9 @@ static void tcp_chr_free_connection(Chardev *chr)
>> s->read_msgfds_num = 0;
>> }
>>
>> + qemu_mutex_lock(&chr->hup_source_lock);
>> remove_hup_source(s);
>> + qemu_mutex_unlock(&chr->hup_source_lock);
>>
>> tcp_set_msgfds(chr, NULL, 0);
>> remove_fd_in_watch(chr);
>> @@ -613,6 +615,7 @@ static void update_ioc_handlers(SocketChardev *s)
>> tcp_chr_read, chr,
>> chr->gcontext);
>>
>> + qemu_mutex_lock(&chr->hup_source_lock);
>> remove_hup_source(s);
>> s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
>> /*
>> @@ -634,6 +637,7 @@ static void update_ioc_handlers(SocketChardev *s)
>> g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
>> chr, NULL);
>> g_source_attach(s->hup_source, chr->gcontext);
>> + qemu_mutex_unlock(&chr->hup_source_lock);
>> }
>>
>> static void tcp_chr_connect(void *opaque)
>> diff --git a/chardev/char.c b/chardev/char.c
>> index bbebd246c3..d03f698b38 100644
>> --- a/chardev/char.c
>> +++ b/chardev/char.c
>> @@ -279,6 +279,7 @@ static void char_init(Object *obj)
>> chr->handover_yank_instance = false;
>> chr->logfd = -1;
>> qemu_mutex_init(&chr->chr_write_lock);
>> + qemu_mutex_init(&chr->hup_source_lock);
>>
>> /*
>> * Assume if chr_update_read_handler is implemented it will
>> @@ -316,6 +317,7 @@ static void char_finalize(Object *obj)
>> close(chr->logfd);
>> }
>> qemu_mutex_destroy(&chr->chr_write_lock);
>> + qemu_mutex_destroy(&chr->hup_source_lock);
>> }
>>
>> static const TypeInfo char_type_info = {
>> diff --git a/include/chardev/char.h b/include/chardev/char.h
>> index 429852f8d9..064184153d 100644
>> --- a/include/chardev/char.h
>> +++ b/include/chardev/char.h
>> @@ -60,6 +60,7 @@ struct Chardev {
>> Object parent_obj;
>>
>> QemuMutex chr_write_lock;
>> + QemuMutex hup_source_lock;
>> CharBackend *be;
>> char *label;
>> char *filename;
>> --
>> 2.35.3
>>
>
> With regards,
> Daniel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] chardev: Fix issues found by vhost-user-test
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
` (3 preceding siblings ...)
2025-05-15 22:20 ` [PATCH 4/4] chardev: Introduce a lock for hup_source Fabiano Rosas
@ 2025-05-24 17:50 ` Marc-André Lureau
4 siblings, 0 replies; 16+ messages in thread
From: Marc-André Lureau @ 2025-05-24 17:50 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, berrange, pbonzini
Hi
On Fri, May 16, 2025 at 12:21 AM Fabiano Rosas <farosas@suse.de> wrote:
>
> Running vhost-user-test with ASAN on a loaded machine reveals several
> intermittent issues. These show up every time I test the qtest tree so
> I'm trying to get rid of them.
>
> 1- UAF of IOWatchPoll.
> This one is self explanatory, ASAN caught it.
>
> 2- Reference counting issues in glib. It seems it's possible to unref
> a source while adding a callback to it, and glib asserts. This
> shows up on all architectures, only on the ASAN build after
> hundreds of iterations.
>
> 3- Extra yank_unregister_function call leads to abort(). This shows up
> on all architectures, but it's quite hidden due to vhost-user-test
> using a dedicated server thread which dies and causes timeouts in
> the test.
>
> Manifests as assert(s->fds_num) failing. Only on the ASAN build,
> after tens of iterations (quite common).
>
> Thanks
>
> Fabiano Rosas (4):
> chardev: Fix QIOChannel refcount
> chardev: Don't attempt to unregister yank function more than once
> chardev: Consolidate yank registration
> chardev: Introduce a lock for hup_source
Daniel, would you take the first 3 (or all) patches in your next I/O PR ?
>
> chardev/char-io.c | 5 +++++
> chardev/char-socket.c | 38 ++++++++++++--------------------------
> chardev/char.c | 2 ++
> include/chardev/char.h | 1 +
> 4 files changed, 20 insertions(+), 26 deletions(-)
>
> --
> 2.35.3
>
>
--
Marc-André Lureau
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2025-05-19 11:00 ` Daniel P. Berrangé
2025-05-19 14:21 ` Fabiano Rosas
@ 2026-02-28 15:16 ` Peter Maydell
2026-03-04 10:34 ` Daniel P. Berrangé
1 sibling, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2026-02-28 15:16 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Fabiano Rosas, qemu-devel, marcandre.lureau, pbonzini
On Mon, 19 May 2025 at 12:00, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, May 15, 2025 at 07:20:14PM -0300, Fabiano Rosas wrote:
> > It's possible for the hup_source to have its reference decremented by
> > remove_hup_source() while it's still being added to the context,
> > leading to asserts in glib:
>
> IIUC this must mean that
>
> tcp_chr_free_connection
>
> is being called concurrently with
>
> update_ioc_handlers
>
> I'm wondering if that is really intended, or a sign of a deeper
> bug that we'll just paper over if we add the mutex proposed here.
>
> Are you able to provide stack traces showing the 2 concurrent
> operations that are triggering this problem ?
(Pulling up this thread from last year since I ran into something
similar running the sanitizers, and Fabiano pointed me at this series
he'd sent that fixes most/all of the problems.)
I had a look at this, and although I couldn't get a specific
backtrace, I got enough information from asan leak backtraces
and looking at the code that I think I understand where the race
is here:
vhost-user-test.c:test_server_create_chr() sets up a chardev with:
qemu_chr_new()
qemu_chr_fe_init()
qemu_chr_fe_set_handlers()
For the "connect-fail" and "flags-mismatch" tests, it includes in
the options to the chardev "reconnect-ms=1000".
qemu_chr_new() ends up in tcp_chr_open(), which calls
qmp_chardev_open_socket_client().
If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls
tcp_chr_connect_client_async() instead of doing the connect synchronously.
That kicks off a qio_task_new() to do the connecting, and immediately
returns, so we return successfully from qemu_chr_new(). So
test_server_create_chr() will continue execution into qemu_chr_fe_init()
and qemu_chr_fe_set_handlers().
qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which
calls the chr_update_read_handler method for the ChardevClass, which here
is tcp_chr_update_read_handler(). That function calls update_ioc_handlers().
Meanwhile, the thread that got kicked off to do the connect is
running in parallel. If the connect succeeds then qemu_chr_socket_conneted()
will call tcp_chr_connect(), which will also call
update_ioc_handlers(). (If the connect fails then presumably we end
up in the tcp_char_disconnect or otherwise destroying the chardev,
though I haven't figured out exactly what happens in this case.)
None of this appears to be taking any kind of lock while it modifies
the SocketChardev fields, so we could have two threads in
update_ioc_handlers() simultaneously.
How is this intended to work ? Is the test case code incorrect to assume
it can work with the chardev it got back from qemu_chr_new(), or should
the chardev be handling this case?
thanks
-- PMM
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2026-02-28 15:16 ` Peter Maydell
@ 2026-03-04 10:34 ` Daniel P. Berrangé
2026-03-06 13:57 ` Peter Maydell
0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrangé @ 2026-03-04 10:34 UTC (permalink / raw)
To: Peter Maydell; +Cc: Fabiano Rosas, qemu-devel, marcandre.lureau, pbonzini
On Sat, Feb 28, 2026 at 03:16:11PM +0000, Peter Maydell wrote:
> On Mon, 19 May 2025 at 12:00, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Thu, May 15, 2025 at 07:20:14PM -0300, Fabiano Rosas wrote:
> > > It's possible for the hup_source to have its reference decremented by
> > > remove_hup_source() while it's still being added to the context,
> > > leading to asserts in glib:
> >
> > IIUC this must mean that
> >
> > tcp_chr_free_connection
> >
> > is being called concurrently with
> >
> > update_ioc_handlers
> >
> > I'm wondering if that is really intended, or a sign of a deeper
> > bug that we'll just paper over if we add the mutex proposed here.
> >
> > Are you able to provide stack traces showing the 2 concurrent
> > operations that are triggering this problem ?
>
> (Pulling up this thread from last year since I ran into something
> similar running the sanitizers, and Fabiano pointed me at this series
> he'd sent that fixes most/all of the problems.)
>
> I had a look at this, and although I couldn't get a specific
> backtrace, I got enough information from asan leak backtraces
> and looking at the code that I think I understand where the race
> is here:
>
> vhost-user-test.c:test_server_create_chr() sets up a chardev with:
> qemu_chr_new()
> qemu_chr_fe_init()
> qemu_chr_fe_set_handlers()
>
> For the "connect-fail" and "flags-mismatch" tests, it includes in
> the options to the chardev "reconnect-ms=1000".
>
> qemu_chr_new() ends up in tcp_chr_open(), which calls
> qmp_chardev_open_socket_client().
>
> If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls
> tcp_chr_connect_client_async() instead of doing the connect synchronously.
> That kicks off a qio_task_new() to do the connecting, and immediately
> returns, so we return successfully from qemu_chr_new(). So
> test_server_create_chr() will continue execution into qemu_chr_fe_init()
> and qemu_chr_fe_set_handlers().
>
> qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which
> calls the chr_update_read_handler method for the ChardevClass, which here
> is tcp_chr_update_read_handler(). That function calls update_ioc_handlers().
>
> Meanwhile, the thread that got kicked off to do the connect is
> running in parallel. If the connect succeeds then qemu_chr_socket_conneted()
> will call tcp_chr_connect(), which will also call
> update_ioc_handlers(). (If the connect fails then presumably we end
> up in the tcp_char_disconnect or otherwise destroying the chardev,
> though I haven't figured out exactly what happens in this case.)
>
> None of this appears to be taking any kind of lock while it modifies
> the SocketChardev fields, so we could have two threads in
> update_ioc_handlers() simultaneously.
It is subtle, as the QIOTask object deals with 2 separate callbacks.
The callback that is provided to qio_task_new() is what is invoked when
the task is marked as completed. This is guaranteed to always be
invoked in main event loop context.
The callback that is provided to qio_task_run_in_thread() is what runs
in the throwaway background thread. When this callback returns, the
QIOTask creates a glib idle source in the main event loop to run the
completion callback.
So wrt char-socket.c, tcp_chr_connect_client_task will run in the
background thread, but it only calls QIOChannel APIs so is safe.
The qemu_chr_socket_connected method will run in main event loop
context, so it does not need locking wrt other chardev I/O
callbacks.
> How is this intended to work ? Is the test case code incorrect to assume
> it can work with the chardev it got back from qemu_chr_new(), or should
> the chardev be handling this case?
The test is really rather had to understand, but I feel like something
in the test ought to be looking at the CHR_EVENT_OPENED event to
identify when the connection is actually established ?
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2026-03-04 10:34 ` Daniel P. Berrangé
@ 2026-03-06 13:57 ` Peter Maydell
2026-03-06 14:49 ` Daniel P. Berrangé
0 siblings, 1 reply; 16+ messages in thread
From: Peter Maydell @ 2026-03-06 13:57 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Fabiano Rosas, qemu-devel, marcandre.lureau, pbonzini
On Wed, 4 Mar 2026 at 10:34, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Sat, Feb 28, 2026 at 03:16:11PM +0000, Peter Maydell wrote:
> > I had a look at this, and although I couldn't get a specific
> > backtrace, I got enough information from asan leak backtraces
> > and looking at the code that I think I understand where the race
> > is here:
> >
> > vhost-user-test.c:test_server_create_chr() sets up a chardev with:
> > qemu_chr_new()
> > qemu_chr_fe_init()
> > qemu_chr_fe_set_handlers()
> >
> > For the "connect-fail" and "flags-mismatch" tests, it includes in
> > the options to the chardev "reconnect-ms=1000".
> >
> > qemu_chr_new() ends up in tcp_chr_open(), which calls
> > qmp_chardev_open_socket_client().
> >
> > If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls
> > tcp_chr_connect_client_async() instead of doing the connect synchronously.
> > That kicks off a qio_task_new() to do the connecting, and immediately
> > returns, so we return successfully from qemu_chr_new(). So
> > test_server_create_chr() will continue execution into qemu_chr_fe_init()
> > and qemu_chr_fe_set_handlers().
> >
> > qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which
> > calls the chr_update_read_handler method for the ChardevClass, which here
> > is tcp_chr_update_read_handler(). That function calls update_ioc_handlers().
> >
> > Meanwhile, the thread that got kicked off to do the connect is
> > running in parallel. If the connect succeeds then qemu_chr_socket_conneted()
> > will call tcp_chr_connect(), which will also call
> > update_ioc_handlers(). (If the connect fails then presumably we end
> > up in the tcp_char_disconnect or otherwise destroying the chardev,
> > though I haven't figured out exactly what happens in this case.)
> >
> > None of this appears to be taking any kind of lock while it modifies
> > the SocketChardev fields, so we could have two threads in
> > update_ioc_handlers() simultaneously.
>
> It is subtle, as the QIOTask object deals with 2 separate callbacks.
>
> The callback that is provided to qio_task_new() is what is invoked when
> the task is marked as completed. This is guaranteed to always be
> invoked in main event loop context.
>
> The callback that is provided to qio_task_run_in_thread() is what runs
> in the throwaway background thread. When this callback returns, the
> QIOTask creates a glib idle source in the main event loop to run the
> completion callback.
>
>
> So wrt char-socket.c, tcp_chr_connect_client_task will run in the
> background thread, but it only calls QIOChannel APIs so is safe.
>
> The qemu_chr_socket_connected method will run in main event loop
> context, so it does not need locking wrt other chardev I/O
> callbacks.
This is starting to sound like the problem is the test code. The
test code is calling qemu_chr_new() and qemu_chr_fe_set_handlers()
from a random thread it created by directly calling g_thread_new().
So these calls from this thread do not have any kind of lock and
can run in parallel with the qemu_chr_socket_connected method from
the main event loop.
I guess we need to take a lock here, but which one?
> > How is this intended to work ? Is the test case code incorrect to assume
> > it can work with the chardev it got back from qemu_chr_new(), or should
> > the chardev be handling this case?
>
> The test is really rather had to understand, but I feel like something
> in the test ought to be looking at the CHR_EVENT_OPENED event to
> identify when the connection is actually established ?
The way to get a CHR_EVENT_OPENED would be to set the frontend
handlers with qemu_chr_fe_set_handlers(). It's the call to set
the frontend handlers that is already going wrong. (But the test
doesn't really need to care about the EVENT_OPENED: it can
just wait til it gets the "you have some data to read" callback.)
thanks
-- PMM
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2026-03-06 13:57 ` Peter Maydell
@ 2026-03-06 14:49 ` Daniel P. Berrangé
2026-03-06 14:55 ` Peter Maydell
0 siblings, 1 reply; 16+ messages in thread
From: Daniel P. Berrangé @ 2026-03-06 14:49 UTC (permalink / raw)
To: Peter Maydell; +Cc: Fabiano Rosas, qemu-devel, marcandre.lureau, pbonzini
On Fri, Mar 06, 2026 at 01:57:45PM +0000, Peter Maydell wrote:
> On Wed, 4 Mar 2026 at 10:34, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Sat, Feb 28, 2026 at 03:16:11PM +0000, Peter Maydell wrote:
> > > I had a look at this, and although I couldn't get a specific
> > > backtrace, I got enough information from asan leak backtraces
> > > and looking at the code that I think I understand where the race
> > > is here:
> > >
> > > vhost-user-test.c:test_server_create_chr() sets up a chardev with:
> > > qemu_chr_new()
> > > qemu_chr_fe_init()
> > > qemu_chr_fe_set_handlers()
> > >
> > > For the "connect-fail" and "flags-mismatch" tests, it includes in
> > > the options to the chardev "reconnect-ms=1000".
> > >
> > > qemu_chr_new() ends up in tcp_chr_open(), which calls
> > > qmp_chardev_open_socket_client().
> > >
> > > If reconnect_ms is not 0, qmp_chardev_open_socket_client() calls
> > > tcp_chr_connect_client_async() instead of doing the connect synchronously.
> > > That kicks off a qio_task_new() to do the connecting, and immediately
> > > returns, so we return successfully from qemu_chr_new(). So
> > > test_server_create_chr() will continue execution into qemu_chr_fe_init()
> > > and qemu_chr_fe_set_handlers().
> > >
> > > qemu_chr_fe_set_handlers() calls qemu_chr_be_update_read_handlers(), which
> > > calls the chr_update_read_handler method for the ChardevClass, which here
> > > is tcp_chr_update_read_handler(). That function calls update_ioc_handlers().
> > >
> > > Meanwhile, the thread that got kicked off to do the connect is
> > > running in parallel. If the connect succeeds then qemu_chr_socket_conneted()
> > > will call tcp_chr_connect(), which will also call
> > > update_ioc_handlers(). (If the connect fails then presumably we end
> > > up in the tcp_char_disconnect or otherwise destroying the chardev,
> > > though I haven't figured out exactly what happens in this case.)
> > >
> > > None of this appears to be taking any kind of lock while it modifies
> > > the SocketChardev fields, so we could have two threads in
> > > update_ioc_handlers() simultaneously.
> >
> > It is subtle, as the QIOTask object deals with 2 separate callbacks.
> >
> > The callback that is provided to qio_task_new() is what is invoked when
> > the task is marked as completed. This is guaranteed to always be
> > invoked in main event loop context.
> >
> > The callback that is provided to qio_task_run_in_thread() is what runs
> > in the throwaway background thread. When this callback returns, the
> > QIOTask creates a glib idle source in the main event loop to run the
> > completion callback.
> >
> >
> > So wrt char-socket.c, tcp_chr_connect_client_task will run in the
> > background thread, but it only calls QIOChannel APIs so is safe.
> >
> > The qemu_chr_socket_connected method will run in main event loop
> > context, so it does not need locking wrt other chardev I/O
> > callbacks.
>
> This is starting to sound like the problem is the test code. The
> test code is calling qemu_chr_new() and qemu_chr_fe_set_handlers()
> from a random thread it created by directly calling g_thread_new().
> So these calls from this thread do not have any kind of lock and
> can run in parallel with the qemu_chr_socket_connected method from
> the main event loop.
>
> I guess we need to take a lock here, but which one?
>
> > > How is this intended to work ? Is the test case code incorrect to assume
> > > it can work with the chardev it got back from qemu_chr_new(), or should
> > > the chardev be handling this case?
> >
> > The test is really rather had to understand, but I feel like something
> > in the test ought to be looking at the CHR_EVENT_OPENED event to
> > identify when the connection is actually established ?
>
> The way to get a CHR_EVENT_OPENED would be to set the frontend
> handlers with qemu_chr_fe_set_handlers(). It's the call to set
> the frontend handlers that is already going wrong. (But the test
> doesn't really need to care about the EVENT_OPENED: it can
> just wait til it gets the "you have some data to read" callback.)
I'm rather feeling that the test program would be better off not
using chardevs at all, and instead using QIOChannelSocket directly.
The chardevs are horribly complex when trying to deal with
connection oriented semantics, as they are really not a connection
based API; the event stuff is a hacky afterthought that barely
gets us by.
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/4] chardev: Introduce a lock for hup_source
2026-03-06 14:49 ` Daniel P. Berrangé
@ 2026-03-06 14:55 ` Peter Maydell
0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2026-03-06 14:55 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Fabiano Rosas, qemu-devel, marcandre.lureau, pbonzini
On Fri, 6 Mar 2026 at 14:49, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Mar 06, 2026 at 01:57:45PM +0000, Peter Maydell wrote:
> > On Wed, 4 Mar 2026 at 10:34, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > It is subtle, as the QIOTask object deals with 2 separate callbacks.
> > >
> > > The callback that is provided to qio_task_new() is what is invoked when
> > > the task is marked as completed. This is guaranteed to always be
> > > invoked in main event loop context.
> > >
> > > The callback that is provided to qio_task_run_in_thread() is what runs
> > > in the throwaway background thread. When this callback returns, the
> > > QIOTask creates a glib idle source in the main event loop to run the
> > > completion callback.
> > >
> > >
> > > So wrt char-socket.c, tcp_chr_connect_client_task will run in the
> > > background thread, but it only calls QIOChannel APIs so is safe.
> > >
> > > The qemu_chr_socket_connected method will run in main event loop
> > > context, so it does not need locking wrt other chardev I/O
> > > callbacks.
> >
> > This is starting to sound like the problem is the test code. The
> > test code is calling qemu_chr_new() and qemu_chr_fe_set_handlers()
> > from a random thread it created by directly calling g_thread_new().
> > So these calls from this thread do not have any kind of lock and
> > can run in parallel with the qemu_chr_socket_connected method from
> > the main event loop.
> >
> > I guess we need to take a lock here, but which one?
> >
> > > > How is this intended to work ? Is the test case code incorrect to assume
> > > > it can work with the chardev it got back from qemu_chr_new(), or should
> > > > the chardev be handling this case?
> > >
> > > The test is really rather had to understand, but I feel like something
> > > in the test ought to be looking at the CHR_EVENT_OPENED event to
> > > identify when the connection is actually established ?
> >
> > The way to get a CHR_EVENT_OPENED would be to set the frontend
> > handlers with qemu_chr_fe_set_handlers(). It's the call to set
> > the frontend handlers that is already going wrong. (But the test
> > doesn't really need to care about the EVENT_OPENED: it can
> > just wait til it gets the "you have some data to read" callback.)
>
> I'm rather feeling that the test program would be better off not
> using chardevs at all, and instead using QIOChannelSocket directly.
> The chardevs are horribly complex when trying to deal with
> connection oriented semantics, as they are really not a connection
> based API; the event stuff is a hacky afterthought that barely
> gets us by.
Perhaps. But that sounds like a lot more work than "identify what
lock one is supposed to be holding to validly call the qemu_chr*
functions and make sure we hold it in test_server_connect()" :-)
-- PMM
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-03-06 14:56 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-15 22:20 [PATCH 0/4] chardev: Fix issues found by vhost-user-test Fabiano Rosas
2025-05-15 22:20 ` [PATCH 1/4] chardev: Fix QIOChannel refcount Fabiano Rosas
2025-05-19 10:49 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 2/4] chardev: Don't attempt to unregister yank function more than once Fabiano Rosas
2025-05-19 10:52 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 3/4] chardev: Consolidate yank registration Fabiano Rosas
2025-05-19 10:53 ` Daniel P. Berrangé
2025-05-15 22:20 ` [PATCH 4/4] chardev: Introduce a lock for hup_source Fabiano Rosas
2025-05-19 11:00 ` Daniel P. Berrangé
2025-05-19 14:21 ` Fabiano Rosas
2026-02-28 15:16 ` Peter Maydell
2026-03-04 10:34 ` Daniel P. Berrangé
2026-03-06 13:57 ` Peter Maydell
2026-03-06 14:49 ` Daniel P. Berrangé
2026-03-06 14:55 ` Peter Maydell
2025-05-24 17:50 ` [PATCH 0/4] chardev: Fix issues found by vhost-user-test Marc-André Lureau
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.