* [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD
@ 2025-11-03 20:10 Eric Blake
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
` (7 more replies)
0 siblings, 8 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
https://gitlab.com/qemu-project/qemu/-/issues/3169 was an excellent
bug report of a deadlock scenario when qemu opens both the server and
client end of an NBD intermediary in a qcow2 backing chain. It took
me quite a bit of time to understand why the deadlock was even
happening, and then some mental gymnastics to work out how best to
break the deadlock. Ideally, we'd get rid of all nested event loops
and have the main loop do EVERYTHING event-driven. But that's a much
bigger change across the entire QEMU code base, so I went with the
next best thing: keep the AioContext nested loop, but ensure that
accepting NBD clients is properly part of the AioContext rather than
gated only on main loop progress.
I have not researched how long the deadlock has been present, so I
don't know whether connecting QEMU as a client to a self-served NBD
server ever worked before the earlier refactorings that started using
AioContext more heavily, or whether it has always been broken. But
either way, I think this series should be part of the 10.2 release;
we're awfully close to soft freeze, but I think this counts as a bug
fix worth having even if it doesn't make it in before -rc1.
Eric Blake (8):
qio: Add trace points to net_listener
qio: Minor optimization when callback function is unchanged
qio: Remember context of qio_net_listener_set_client_func_full
qio: Factor out helpers qio_net_listener_[un]watch
qio: Let listening sockets remember their owning QIONetListener
qio: Hoist ref of listener outside loop
qio: Use AioContext for default-context QIONetListener
iotests: Add coverage of recent NBD qio deadlock fix
include/io/channel-socket.h | 1 +
include/io/net-listener.h | 1 +
io/channel-socket.c | 1 +
io/net-listener.c | 136 +++++++++++-------
io/trace-events | 5 +
tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++
.../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 ++++++++
tests/qemu-iotests/tests/vvfat.out | 0
8 files changed, 236 insertions(+), 48 deletions(-)
create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
--
2.51.1
* [PATCH 1/8] qio: Add trace points to net_listener
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 10:43 ` Daniel P. Berrangé
2025-11-04 11:08 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 2/8] qio: Minor optimization when callback function is unchanged Eric Blake
` (6 subsequent siblings)
7 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
Upcoming patches will adjust how net_listener watches for new client
connections; adding trace points now makes it easier to verify that
the changes work as intended. For example, adding
--trace='qio_net_listener*' to the qemu-storage-daemon command line
before --nbd-server will track when the server first starts listening
for clients.
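
As an illustration (pointer values arbitrary; the exact line prefix
depends on the trace backend in use), the new trace points produce
output along these lines:

    qio_net_listener_watch_enabled Net listener=0x55d0... watch enabled func=0x55d0... by set_client_func
    qio_net_listener_callback Net listener=0x55d0... callback forwarding to func=0x55d0...
    qio_net_listener_watch_disabled Net listener=0x55d0... watch disabled by disconnect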
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 17 +++++++++++++++++
io/trace-events | 5 +++++
2 files changed, 22 insertions(+)
diff --git a/io/net-listener.c b/io/net-listener.c
index 47405965a66..0adbc409cf2 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -23,6 +23,7 @@
#include "io/dns-resolver.h"
#include "qapi/error.h"
#include "qemu/module.h"
+#include "trace.h"
QIONetListener *qio_net_listener_new(void)
{
@@ -50,6 +51,7 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
return TRUE;
}
+ trace_qio_net_listener_callback(listener, listener->io_func);
if (listener->io_func) {
listener->io_func(listener, sioc, listener->io_data);
}
@@ -124,6 +126,8 @@ void qio_net_listener_add(QIONetListener *listener,
listener->connected = true;
if (listener->io_func != NULL) {
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func,
+ "add");
object_ref(OBJECT(listener));
listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
@@ -143,6 +147,9 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
{
size_t i;
+ if (listener->io_func) {
+ trace_qio_net_listener_watch_disabled(listener, "set_client_func");
+ }
if (listener->io_notify) {
listener->io_notify(listener->io_data);
}
@@ -159,6 +166,8 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
}
if (listener->io_func != NULL) {
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func,
+ "set_client_func");
for (i = 0; i < listener->nsioc; i++) {
object_ref(OBJECT(listener));
listener->io_source[i] = qio_channel_add_watch_source(
@@ -218,6 +227,9 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
};
size_t i;
+ if (listener->io_func) {
+ trace_qio_net_listener_watch_disabled(listener, "wait_client");
+ }
for (i = 0; i < listener->nsioc; i++) {
if (listener->io_source[i]) {
g_source_destroy(listener->io_source[i]);
@@ -248,6 +260,8 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
g_main_context_unref(ctxt);
if (listener->io_func != NULL) {
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func,
+ "wait_client");
for (i = 0; i < listener->nsioc; i++) {
object_ref(OBJECT(listener));
listener->io_source[i] = qio_channel_add_watch_source(
@@ -268,6 +282,9 @@ void qio_net_listener_disconnect(QIONetListener *listener)
return;
}
+ if (listener->io_func) {
+ trace_qio_net_listener_watch_disabled(listener, "disconnect");
+ }
for (i = 0; i < listener->nsioc; i++) {
if (listener->io_source[i]) {
g_source_destroy(listener->io_source[i]);
diff --git a/io/trace-events b/io/trace-events
index dc3a63ba1fb..8cc4cae3a5d 100644
--- a/io/trace-events
+++ b/io/trace-events
@@ -72,3 +72,8 @@ qio_channel_command_new_pid(void *ioc, int writefd, int readfd, int pid) "Comman
qio_channel_command_new_spawn(void *ioc, const char *binary, int flags) "Command new spawn ioc=%p binary=%s flags=%d"
qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"
+
+# net-listener.c
+qio_net_listener_watch_enabled(void *listener, void *func, const char *extra) "Net listener=%p watch enabled func=%p by %s"
+qio_net_listener_watch_disabled(void *listener, const char *extra) "Net listener=%p watch disabled by %s"
+qio_net_listener_callback(void *listener, void *func) "Net listener=%p callback forwarding to func=%p"
--
2.51.1
* [PATCH 2/8] qio: Minor optimization when callback function is unchanged
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 10:44 ` Daniel P. Berrangé
2025-11-04 11:13 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full Eric Blake
` (5 subsequent siblings)
7 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
In qemu-nbd and other NBD server setups where parallel clients are
supported, it is common that the caller will re-register the same
callback function as long as it has not reached its limit on
simultaneous clients. In that case, there is no need to tear down and
reinstall GSource watches in the GMainContext.
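
For context, here is a minimal sketch of that caller pattern (the
names nb_clients, max_clients, and server_accept are hypothetical,
not qemu-nbd's literal code):

    static size_t nb_clients;
    static size_t max_clients = 10;

    static void server_accept(QIONetListener *listener,
                              QIOChannelSocket *cioc, gpointer opaque)
    {
        nb_clients++;
        /* ... hand cioc off to a per-client handler ... */
        if (nb_clients >= max_clients) {
            /* at capacity: stop watching for more clients */
            qio_net_listener_set_client_func(listener, NULL, NULL, NULL);
        } else {
            /* same func/data as already registered: with this patch,
             * this is now a cheap no-op rather than a full GSource
             * teardown and re-add */
            qio_net_listener_set_client_func(listener, server_accept,
                                             opaque, NULL);
        }
    }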
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/io/net-listener.c b/io/net-listener.c
index 0adbc409cf2..e89286ea63c 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -147,6 +147,10 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
{
size_t i;
+ if (listener->io_func == func && listener->io_data == data) {
+ return;
+ }
+
if (listener->io_func) {
trace_qio_net_listener_watch_disabled(listener, "set_client_func");
}
--
2.51.1
* [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
2025-11-03 20:10 ` [PATCH 2/8] qio: Minor optimization when callback function is unchanged Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 10:50 ` Daniel P. Berrangé
2025-11-04 11:25 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch Eric Blake
` (4 subsequent siblings)
7 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
io/net-listener.c has two modes of use: asynchronous (the user calls
qio_net_listener_set_client_func to wake up the callback via the
global GMainContext, or qio_net_listener_set_client_func_full to wake
up the callback via the caller's own alternative GMainContext), and
synchronous (the user calls qio_net_listener_wait_client which creates
its own GMainContext and waits for the first client connection before
returning, with no need for a user's callback). But commit 938c8b79
("qio: store gsources for net listeners", v2.12.0) has a latent logic
flaw: when qio_net_listener_wait_client finishes on its temporary
context, it reverts all of the siocs back to the global GMainContext
rather than the potentially non-NULL context they might have been
originally registered with. Similarly, if the user creates a
net-listener, adds initial addresses, registers an async callback with
a non-default context (which ties to all siocs for the initial
addresses), then adds more addresses with qio_net_listener_add, the
siocs for later addresses are blindly placed in the global context,
rather than sharing the context of the earlier ones.
In practice, I don't think this has caused issues. As pointed out by
the original commit, all async callers prior to that commit were
already okay with the NULL default context; and the typical usage
pattern is to first add ALL the addresses the listener will pay
attention to before ever setting the async callback. Likewise, if a
file uses only qio_net_listener_set_client_func instead of
qio_net_listener_set_client_func_full, then it is never using a custom
context, so later assignments of async callbacks will still be to the
same global context as earlier ones. Meanwhile, any callers that want
to do the sync operation to grab the first client are unlikely to
register an async callback, altogether bypassing the question of
whether later assignments of a GSource are being tied to a different
context over time.
I do note that chardev/char-socket.c is the only file that calls both
qio_net_listener_wait_client (sync for a single client in
tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full
(several places, all with chr->gcontext, but sometimes with a NULL
callback function during teardown). But as far as I can tell, the two
uses are mutually exclusive, based on the is_waitconnect parameter to
qmp_chardev_open_socket_server.
That said, it is more robust to remember when a callback function is
tied to a non-default context, and have both the sync wait and any
late address additions honor that same context. That way, the code
will be robust even if a later user performs a sync wait for a
specific client in the middle of servicing a longer-lived
QIONetListener that has an async callback for all other clients.
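
To recap the two modes in code (a sketch; the callback signature is
inferred from how listener->io_func is invoked in net-listener.c):

    static void my_accept(QIONetListener *listener,
                          QIOChannelSocket *sioc, gpointer data)
    {
        /* ... service the new client connection ... */
    }

    /* async mode: dispatch my_accept from the caller's GMainContext */
    qio_net_listener_set_client_func_full(listener, my_accept, my_data,
                                          NULL, my_context);

    /* sync mode: a temporary GMainContext waits for the first client;
     * with this patch, the watches are afterwards restored to the
     * remembered my_context instead of unconditionally to the global
     * GMainContext */
    QIOChannelSocket *first = qio_net_listener_wait_client(listener);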
Signed-off-by: Eric Blake <eblake@redhat.com>
---
include/io/net-listener.h | 1 +
io/net-listener.c | 5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/io/net-listener.h b/include/io/net-listener.h
index ab9f291ed62..42fbfab5467 100644
--- a/include/io/net-listener.h
+++ b/include/io/net-listener.h
@@ -50,6 +50,7 @@ struct QIONetListener {
QIOChannelSocket **sioc;
GSource **io_source;
size_t nsioc;
+ GMainContext *context;
bool connected;
diff --git a/io/net-listener.c b/io/net-listener.c
index e89286ea63c..15df673fb6e 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -132,7 +132,7 @@ void qio_net_listener_add(QIONetListener *listener,
listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, NULL);
+ listener, (GDestroyNotify)object_unref, listener->context);
}
listener->nsioc++;
@@ -160,6 +160,7 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
listener->io_func = func;
listener->io_data = data;
listener->io_notify = notify;
+ listener->context = context;
for (i = 0; i < listener->nsioc; i++) {
if (listener->io_source[i]) {
@@ -271,7 +272,7 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
listener->io_source[i] = qio_channel_add_watch_source(
QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, NULL);
+ listener, (GDestroyNotify)object_unref, listener->context);
}
}
--
2.51.1
* [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
` (2 preceding siblings ...)
2025-11-03 20:10 ` [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 11:03 ` Daniel P. Berrangé
2025-11-04 12:37 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener Eric Blake
` (3 subsequent siblings)
7 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
The code had three similar repetitions of a loop over one or all of
the nsioc sockets to set up a GSource, and likewise for teardown.
Since an upcoming patch wants to tweak whether a GSource or an
AioContext is used, it's better to consolidate that into helper
functions so there are fewer places to edit later.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 109 +++++++++++++++++++---------------------------
1 file changed, 45 insertions(+), 64 deletions(-)
diff --git a/io/net-listener.c b/io/net-listener.c
index 15df673fb6e..e1378b9a612 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -106,6 +106,45 @@ int qio_net_listener_open_sync(QIONetListener *listener,
}
}
+/*
+ * i == 0 to set watch on entire array, non-zero to only set watch on
+ * recent additions when earlier entries are already watched.
+ */
+static void
+qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
+{
+ if (!listener->io_func) {
+ return;
+ }
+
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
+ for ( ; i < listener->nsioc; i++) {
+ object_ref(OBJECT(listener));
+ listener->io_source[i] = qio_channel_add_watch_source(
+ QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
+ qio_net_listener_channel_func,
+ listener, (GDestroyNotify)object_unref, listener->context);
+ }
+}
+
+static void
+qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
+{
+ size_t i;
+
+ if (!listener->io_func) {
+ return;
+ }
+
+ trace_qio_net_listener_watch_disabled(listener, caller);
+ for (i = 0; i < listener->nsioc; i++) {
+ if (listener->io_source[i]) {
+ g_source_destroy(listener->io_source[i]);
+ g_source_unref(listener->io_source[i]);
+ listener->io_source[i] = NULL;
+ }
+ }
+}
void qio_net_listener_add(QIONetListener *listener,
QIOChannelSocket *sioc)
@@ -125,17 +164,7 @@ void qio_net_listener_add(QIONetListener *listener,
object_ref(OBJECT(sioc));
listener->connected = true;
- if (listener->io_func != NULL) {
- trace_qio_net_listener_watch_enabled(listener, listener->io_func,
- "add");
- object_ref(OBJECT(listener));
- listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
- QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
- qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, listener->context);
- }
-
- listener->nsioc++;
+ qio_net_listener_watch(listener, listener->nsioc++, "add");
}
@@ -145,15 +174,11 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
GDestroyNotify notify,
GMainContext *context)
{
- size_t i;
-
if (listener->io_func == func && listener->io_data == data) {
return;
}
- if (listener->io_func) {
- trace_qio_net_listener_watch_disabled(listener, "set_client_func");
- }
+ qio_net_listener_unwatch(listener, "set_client_func");
if (listener->io_notify) {
listener->io_notify(listener->io_data);
}
@@ -162,25 +187,7 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
listener->io_notify = notify;
listener->context = context;
- for (i = 0; i < listener->nsioc; i++) {
- if (listener->io_source[i]) {
- g_source_destroy(listener->io_source[i]);
- g_source_unref(listener->io_source[i]);
- listener->io_source[i] = NULL;
- }
- }
-
- if (listener->io_func != NULL) {
- trace_qio_net_listener_watch_enabled(listener, listener->io_func,
- "set_client_func");
- for (i = 0; i < listener->nsioc; i++) {
- object_ref(OBJECT(listener));
- listener->io_source[i] = qio_channel_add_watch_source(
- QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
- qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, context);
- }
- }
+ qio_net_listener_watch(listener, 0, "set_client_func");
}
void qio_net_listener_set_client_func(QIONetListener *listener,
@@ -232,16 +239,7 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
};
size_t i;
- if (listener->io_func) {
- trace_qio_net_listener_watch_disabled(listener, "wait_client");
- }
- for (i = 0; i < listener->nsioc; i++) {
- if (listener->io_source[i]) {
- g_source_destroy(listener->io_source[i]);
- g_source_unref(listener->io_source[i]);
- listener->io_source[i] = NULL;
- }
- }
+ qio_net_listener_unwatch(listener, "wait_client");
sources = g_new0(GSource *, listener->nsioc);
for (i = 0; i < listener->nsioc; i++) {
@@ -264,17 +262,7 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
g_main_loop_unref(loop);
g_main_context_unref(ctxt);
- if (listener->io_func != NULL) {
- trace_qio_net_listener_watch_enabled(listener, listener->io_func,
- "wait_client");
- for (i = 0; i < listener->nsioc; i++) {
- object_ref(OBJECT(listener));
- listener->io_source[i] = qio_channel_add_watch_source(
- QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
- qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, listener->context);
- }
- }
+ qio_net_listener_watch(listener, 0, "wait_client");
return data.sioc;
}
@@ -287,15 +275,8 @@ void qio_net_listener_disconnect(QIONetListener *listener)
return;
}
- if (listener->io_func) {
- trace_qio_net_listener_watch_disabled(listener, "disconnect");
- }
+ qio_net_listener_unwatch(listener, "disconnect");
for (i = 0; i < listener->nsioc; i++) {
- if (listener->io_source[i]) {
- g_source_destroy(listener->io_source[i]);
- g_source_unref(listener->io_source[i]);
- listener->io_source[i] = NULL;
- }
qio_channel_close(QIO_CHANNEL(listener->sioc[i]), NULL);
}
listener->connected = false;
--
2.51.1
* [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
` (3 preceding siblings ...)
2025-11-03 20:10 ` [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-05 20:06 ` Eric Blake
2025-11-03 20:10 ` [PATCH 6/8] qio: Hoist ref of listener outside loop Eric Blake
` (2 subsequent siblings)
7 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
Make it easier to get from the sioc listening to a single address on
behalf of a NetListener back to its owning object, which will be
beneficial in an upcoming patch that teaches NetListener how to
interact with AioContext.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
include/io/channel-socket.h | 1 +
io/channel-socket.c | 1 +
io/net-listener.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index a88cf8b3a9f..eee686f3b4d 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -49,6 +49,7 @@ struct QIOChannelSocket {
socklen_t remoteAddrLen;
ssize_t zero_copy_queued;
ssize_t zero_copy_sent;
+ struct QIONetListener *listener;
};
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 712b793eaf2..59e929f97c3 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -65,6 +65,7 @@ qio_channel_socket_new(void)
sioc->fd = -1;
sioc->zero_copy_queued = 0;
sioc->zero_copy_sent = 0;
+ sioc->listener = NULL;
ioc = QIO_CHANNEL(sioc);
qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
diff --git a/io/net-listener.c b/io/net-listener.c
index e1378b9a612..afdacdd5ff4 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -152,6 +152,7 @@ void qio_net_listener_add(QIONetListener *listener,
if (listener->name) {
qio_channel_set_name(QIO_CHANNEL(sioc), listener->name);
}
+ sioc->listener = listener;
listener->sioc = g_renew(QIOChannelSocket *, listener->sioc,
listener->nsioc + 1);
--
2.51.1
* [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
` (4 preceding siblings ...)
2025-11-03 20:10 ` [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 11:13 ` Daniel P. Berrangé
2025-11-03 20:10 ` [PATCH 7/8] qio: Use AioContext for default-context QIONetListener Eric Blake
2025-11-03 20:10 ` [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix Eric Blake
7 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
The point of QIONetListener is to allow a server to listen to more
than one socket address at a time, and respond to clients in a
first-come-first-serve order across any of those addresses. While
some servers (like NBD) really do want to serve multiple simultaneous
clients, many other servers only care about the first client to
connect, and will immediately deregister the callback, possibly by
dropping their reference to the QIONetListener. The existing code
ensures that all other pending callbacks remain safe once the first
callback drops the listener, by adding an extra reference to the
listener for each GSource created, where those references pair to the
eventual teardown of each GSource after a given callbacks has been
serviced or aborted. But it is equally acceptable to hoist the
reference to the listener outside the loop - as long as there is a
callback function registered, it is sufficient to have a single
reference live for the entire array of sioc, rather than one reference
per sioc in the array.
Hoisting the reference like this will make it easier for an upcoming
patch to still ensure the listener cannot be prematurely garbage
collected during the user's callback, even when the callback no longer
uses a per-sioc GSource.
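
Schematically, based on the diff below, the pairing moves from
per-sioc to per-watch:

    /* before: one reference per sioc, dropped by each GSource's
     * destroy notify */
    for (i = 0; i < listener->nsioc; i++) {
        object_ref(OBJECT(listener));
        listener->io_source[i] = qio_channel_add_watch_source(
            QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
            qio_net_listener_channel_func,
            listener, (GDestroyNotify)object_unref, listener->context);
    }

    /* after: one reference covering the whole watch, dropped once in
     * qio_net_listener_unwatch() */
    object_ref(OBJECT(listener));
    for (i = 0; i < listener->nsioc; i++) {
        listener->io_source[i] = qio_channel_add_watch_source(
            QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
            qio_net_listener_channel_func,
            listener, NULL, listener->context);
    }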
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/io/net-listener.c b/io/net-listener.c
index afdacdd5ff4..ce29bf3c993 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -118,12 +118,14 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
}
trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
- for ( ; i < listener->nsioc; i++) {
+ if (i == 0) {
object_ref(OBJECT(listener));
+ }
+ for ( ; i < listener->nsioc; i++) {
listener->io_source[i] = qio_channel_add_watch_source(
QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
qio_net_listener_channel_func,
- listener, (GDestroyNotify)object_unref, listener->context);
+ listener, NULL, listener->context);
}
}
@@ -144,6 +146,7 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
listener->io_source[i] = NULL;
}
}
+ object_unref(OBJECT(listener));
}
void qio_net_listener_add(QIONetListener *listener,
--
2.51.1
* [PATCH 7/8] qio: Use AioContext for default-context QIONetListener
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
` (5 preceding siblings ...)
2025-11-03 20:10 ` [PATCH 6/8] qio: Hoist ref of listener outside loop Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 11:37 ` Daniel P. Berrangé
2025-11-04 15:14 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix Eric Blake
7 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
The user "John Doe" reported a deadlock when attempting to use
qemu-storage-daemon to serve both a base file over NBD, and a qcow2
file with that NBD export as its backing file, from the same process,
even though it worked just fine when there were two q-s-d processes.
The bulk of the NBD server code properly uses coroutines to make
progress in an event-driven manner, but the code for spawning a new
coroutine at the point when listen(2) detects a new client was
hard-coded to use the global GMainContext; in other words, the
callback that triggers nbd_client_new to let the server start the
negotiation sequence with the client requires the main loop to be
making progress. However, the code for bdrv_open of a qcow2 image
with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
ensure that the entire qcow2 backing chain is either fully loaded or
rejected, without any side effects from the main loop causing unwanted
changes to the disk being loaded (in short, an AioContext represents
the set of actions that are known to be safe while handling block
layer I/O, while excluding any other pending actions in the global
main loop with potentially larger risk of unwanted side effects).
This creates a classic case of deadlock: the server can't progress to
the point of accept(2)ing the client to write to the NBD socket
because the main loop is being starved until the AIO_WAIT_WHILE
completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
it is blocked on the client coroutine stuck in a read() of the
expected magic number from the server side of the socket.
Fortunately, the way that AioContext is set up, any callback that is
registered to the global AioContext will also be serviced by the main
loop. So the fix for the deadlock is to alter QIONetListener so that
if it is not being used in an explicit alternative GMainContext, then
it should perform its polling via the global AioContext (which
indirectly still progresses in the default GMainContext) rather than
directly in the default GMainContext. This does not change behavior
for any prior use that did not starve the main loop, but has the
additional benefit that in the bdrv_open case of a nested AioContext
loop, the server's listen/accept handler is no longer starved because
it is now part of the same AioContext loop. From there, since NBD
already uses coroutines for both server and client code, the nested
AioContext loop finishes quickly and opening the qcow2 backing chain
no longer deadlocks.
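
In code terms, the default-context case switches from a GSource
attached to the global GMainContext to an fd handler on the global
AioContext (a sketch matching the calls in the diff below):

    /* old: the listening fd is only serviced when the global
     * GMainContext makes progress */
    listener->io_source[i] = qio_channel_add_watch_source(
        QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
        qio_net_listener_channel_func,
        listener, NULL, NULL);

    /* new: the listening fd is serviced by the main thread's
     * AioContext, which both the main loop and any nested
     * AIO_WAIT_WHILE loop poll */
    aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
                       qio_net_listener_aio_func, NULL, NULL, NULL,
                       listener->sioc[i]);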
The next patch will add a unit test (kept separate to make it easier
to rearrange the series to demonstrate the deadlock without this
patch).
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 53 ++++++++++++++++++++++++++++++++++++++---------
io/trace-events | 4 ++--
2 files changed, 45 insertions(+), 12 deletions(-)
diff --git a/io/net-listener.c b/io/net-listener.c
index ce29bf3c993..9f4e3c0be0c 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -23,6 +23,7 @@
#include "io/dns-resolver.h"
#include "qapi/error.h"
#include "qemu/module.h"
+#include "qemu/main-loop.h"
#include "trace.h"
QIONetListener *qio_net_listener_new(void)
@@ -62,6 +63,15 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
}
+static void qio_net_listener_aio_func(void *opaque)
+{
+ QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
+
+ qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
+ sioc->listener);
+}
+
+
int qio_net_listener_open_sync(QIONetListener *listener,
SocketAddress *addr,
int num,
@@ -117,15 +127,33 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
return;
}
- trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func,
+ listener->context, caller);
if (i == 0) {
object_ref(OBJECT(listener));
}
for ( ; i < listener->nsioc; i++) {
- listener->io_source[i] = qio_channel_add_watch_source(
- QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
- qio_net_listener_channel_func,
- listener, NULL, listener->context);
+ if (listener->context) {
+ /*
+ * The user passed a GMainContext with the async callback;
+ * they plan on running their own g_main_loop.
+ */
+ listener->io_source[i] = qio_channel_add_watch_source(
+ QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
+ qio_net_listener_channel_func,
+ listener, NULL, listener->context);
+ } else {
+ /*
+ * The user is fine with the default context. But by doing
+ * it in the main thread's AioContext rather than
+ * specifically in a GMainContext, we can remain
+ * responsive even if another AioContext depends on
+ * connecting to this server.
+ */
+ aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
+ qio_net_listener_aio_func, NULL, NULL, NULL,
+ listener->sioc[i]);
+ }
}
}
@@ -138,12 +166,17 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
return;
}
- trace_qio_net_listener_watch_disabled(listener, caller);
+ trace_qio_net_listener_watch_disabled(listener, listener->context, caller);
for (i = 0; i < listener->nsioc; i++) {
- if (listener->io_source[i]) {
- g_source_destroy(listener->io_source[i]);
- g_source_unref(listener->io_source[i]);
- listener->io_source[i] = NULL;
+ if (listener->context) {
+ if (listener->io_source[i]) {
+ g_source_destroy(listener->io_source[i]);
+ g_source_unref(listener->io_source[i]);
+ listener->io_source[i] = NULL;
+ }
+ } else {
+ aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
+ NULL, NULL, NULL, NULL, NULL);
}
}
object_unref(OBJECT(listener));
diff --git a/io/trace-events b/io/trace-events
index 8cc4cae3a5d..1b01b2d51e6 100644
--- a/io/trace-events
+++ b/io/trace-events
@@ -74,6 +74,6 @@ qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"
# net-listener.c
-qio_net_listener_watch_enabled(void *listener, void *func, const char *extra) "Net listener=%p watch enabled func=%p by %s"
-qio_net_listener_watch_disabled(void *listener, const char *extra) "Net listener=%p watch disabled by %s"
+qio_net_listener_watch_enabled(void *listener, void *func, void *ctx, const char *extra) "Net listener=%p watch enabled func=%p ctx=%p by %s"
+qio_net_listener_watch_disabled(void *listener, void *ctx, const char *extra) "Net listener=%p watch disabled ctx=%p by %s"
qio_net_listener_callback(void *listener, void *func) "Net listener=%p callback forwarding to func=%p"
--
2.51.1
* [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
` (6 preceding siblings ...)
2025-11-03 20:10 ` [PATCH 7/8] qio: Use AioContext for default-context QIONetListener Eric Blake
@ 2025-11-03 20:10 ` Eric Blake
2025-11-04 11:38 ` Vladimir Sementsov-Ogievskiy
7 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-03 20:10 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, berrange, kwolf, Vladimir Sementsov-Ogievskiy,
Hanna Reitz
Test that all images in a qcow2 chain using an NBD backing file can be
served by the same process. Prior to the recent QIONetListener fixes,
this test would demonstrate deadlock.
The test borrows heavily from the original formula by "John Doe" in
the gitlab bug, but uses a Unix socket rather than TCP to avoid port
contention, and uses a full-blown QEMU rather than qemu-storage-daemon
since both programs were impacted.
[While preparing this patch by making the new test executable, I
noticed vvfat.out does not need execute permissions]
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
---
tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++++++++++
.../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 +++++++++++++
tests/qemu-iotests/tests/vvfat.out | 0
3 files changed, 140 insertions(+)
create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
new file mode 100755
index 00000000000..b89f74d4552
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+# group: rw quick
+#
+# Test of opening both server and client NBD in a qcow2 backing chain
+#
+# Copyright (C) Red Hat, Inc.
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+# creator
+owner=eblake@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_qemu
+ _cleanup_test_img
+ rm -f "$SOCK_DIR/nbd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+. ./common.nbd
+
+_supported_fmt qcow2 # Hardcoded to qcow2 command line and QMP below
+_supported_proto file
+
+size=100M
+
+echo
+echo "=== Preparing base image ==="
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+
+echo
+echo "=== Starting QEMU and exposing base image ==="
+
+_launch_qemu -machine q35
+h1=$QEMU_HANDLE
+_send_qemu_cmd $QEMU_HANDLE '{"execute": "qmp_capabilities"}' 'return'
+_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
+ "arguments": {"node-name":"base", "driver":"qcow2",
+ "file":{"driver":"file", "filename":"'"$TEST_IMG.base"'"}}}' 'return'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
+ "arguments": {"addr":{"type":"unix",
+ "data":{"path":"'"$SOCK_DIR/nbd"'"}}}}' 'return'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
+ "arguments": {"device":"base","name":"base"}}' 'return'
+
+echo
+echo "=== Creating wrapper image ==="
+
+_make_test_img -F raw -b "nbd+unix:///base?socket=$SOCK_DIR/nbd" $size
+
+echo
+echo "=== Adding wrapper image ==="
+
+_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
+ "arguments": {"node-name":"wrap", "driver":"qcow2",
+ "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' 'return'
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
+ "arguments": {"device":"wrap","name":"wrap"}}' 'return'
+
+echo
+echo "=== Checking NBD server ==="
+
+$QEMU_NBD --list -k $SOCK_DIR/nbd
+
+echo
+echo "=== Cleaning up ==="
+
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' ''
+
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out b/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
new file mode 100644
index 00000000000..5f1d31ae2e0
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
@@ -0,0 +1,56 @@
+QA output created by nbd-in-qcow2-chain
+
+=== Preparing base image ===
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=104857600
+
+=== Starting QEMU and exposing base image ===
+{"execute": "qmp_capabilities"}
+{"return": {}}
+{"execute": "blockdev-add",
+ "arguments": {"node-name":"base", "driver":"IMGFMT",
+ "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT.base"}}}
+{"return": {}}
+{"execute":"nbd-server-start",
+ "arguments": {"addr":{"type":"unix",
+ "data":{"path":"SOCK_DIR/nbd"}}}}
+{"return": {}}
+{"execute":"nbd-server-add",
+ "arguments": {"device":"base","name":"base"}}
+{"return": {}}
+
+=== Creating wrapper image ===
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=104857600 backing_file=nbd+unix:///base?socket=SOCK_DIR/nbd backing_fmt=raw
+
+=== Adding wrapper image ===
+{"execute": "blockdev-add",
+ "arguments": {"node-name":"wrap", "driver":"IMGFMT",
+ "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT"}}}
+{"return": {}}
+{"execute":"nbd-server-add",
+ "arguments": {"device":"wrap","name":"wrap"}}
+{"return": {}}
+
+=== Checking NBD server ===
+exports available: 2
+ export: 'base'
+ size: 104857600
+ flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
+ min block: 1
+ opt block: 4096
+ max block: 33554432
+ transaction size: 64-bit
+ available meta contexts: 1
+ base:allocation
+ export: 'wrap'
+ size: 104857600
+ flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
+ min block: 1
+ opt block: 4096
+ max block: 33554432
+ transaction size: 64-bit
+ available meta contexts: 1
+ base:allocation
+
+=== Cleaning up ===
+{"execute":"quit"}
+*** done
diff --git a/tests/qemu-iotests/tests/vvfat.out b/tests/qemu-iotests/tests/vvfat.out
old mode 100755
new mode 100644
--
2.51.1
* Re: [PATCH 1/8] qio: Add trace points to net_listener
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
@ 2025-11-04 10:43 ` Daniel P. Berrangé
2025-11-04 11:08 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 10:43 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:52PM -0600, Eric Blake wrote:
> Upcoming patches will adjust how net_listener watches for new client
> connections; adding trace points now makes it easier to debug that the
> changes work as intended. For example, adding
> --trace='qio_net_listener*' to the qemu-storage-daemon command line
> before --nbd-server will track when the server first starts listening
> for clients.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 17 +++++++++++++++++
> io/trace-events | 5 +++++
> 2 files changed, 22 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/8] qio: Minor optimization when callback function is unchanged
2025-11-03 20:10 ` [PATCH 2/8] qio: Minor optimization when callback function is unchanged Eric Blake
@ 2025-11-04 10:44 ` Daniel P. Berrangé
2025-11-04 11:13 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 10:44 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:53PM -0600, Eric Blake wrote:
> In qemu-nbd and other NBD server setups where parallel clients are
> supported, it is common that the caller will re-register the same
> callback function as long as it has not reached its limit on
> simultaneous clients. In that case, there is no need to tear down and
> reinstall GSource watches in the GMainContext.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 4 ++++
> 1 file changed, 4 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full
2025-11-03 20:10 ` [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full Eric Blake
@ 2025-11-04 10:50 ` Daniel P. Berrangé
2025-11-04 11:25 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 10:50 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:54PM -0600, Eric Blake wrote:
> io/net-listener.c has two modes of use: asynchronous (the user calls
> qio_net_listener_set_client_func to wake up the callback via the
> global GMainContext, or qio_net_listener_set_client_func_full to wake
> up the callback via the caller's own alternative GMainContext), and
> synchronous (the user calls qio_net_listener_wait_client which creates
> its own GMainContext and waits for the first client connection before
> returning, with no need for a user's callback). But commit 938c8b79
> ("qio: store gsources for net listeners", v2.12.0) has a latent logic
> flaw: when qio_net_listener_wait_client finishes on its temporary
> context, it reverts all of the siocs back to the global GMainContext
> rather than the potentially non-NULL context they might have been
> originally registered with. Similarly, if the user creates a
> net-listener, adds initial addresses, registers an async callback with
> a non-default context (which ties to all siocs for the initial
> addresses), then adds more addresses with qio_net_listener_add, the
> siocs for later addresses are blindly placed in the global context,
> rather than sharing the context of the earlier ones.
>
> In practice, I don't think this has caused issues. As pointed out by
> the original commit, all async callers prior to that commit were
> already okay with the NULL default context; and the typical usage
> pattern is to first add ALL the addresses the listener will pay
> attention to before ever setting the async callback. Likewise, if a
> file uses only qio_net_listener_set_client_func instead of
> qio_net_listener_set_client_func_full, then it is never using a custom
> context, so later assignments of async callbacks will still be to the
> same global context as earlier ones. Meanwhile, any callers that want
> to do the sync operation to grab the first client are unlikely to
> register an async callback; altogether bypassing the question of
> whether later assignments of a GSource are being tied to a different
> context over time.
>
> I do note that chardev/char-socket.c is the only file that calls both
> qio_net_listener_wait_client (sync for a single client in
> tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full
> (several places, all with chr->gcontext, but sometimes with a NULL
> callback function during teardown). But as far as I can tell, the two
> uses are mutually exclusive, based on the is_waitconnect parameter to
> qmp_chardev_open_socket_server.
>
> That said, it is more robust to remember when a callback function is
> tied to a non-default context, and have both the sync wait and any
> late address additions honor that same context. That way, the code
> will be robust even if a later user performs a sync wait for a
> specific client in the middle of servicing a longer-lived
> QIONetListener that has an async callback for all other clients.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/io/net-listener.h | 1 +
> io/net-listener.c | 5 +++--
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/io/net-listener.h b/include/io/net-listener.h
> index ab9f291ed62..42fbfab5467 100644
> --- a/include/io/net-listener.h
> +++ b/include/io/net-listener.h
> @@ -50,6 +50,7 @@ struct QIONetListener {
> QIOChannelSocket **sioc;
> GSource **io_source;
> size_t nsioc;
> + GMainContext *context;
>
> bool connected;
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index e89286ea63c..15df673fb6e 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -132,7 +132,7 @@ void qio_net_listener_add(QIONetListener *listener,
> listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
> QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
> qio_net_listener_channel_func,
> - listener, (GDestroyNotify)object_unref, NULL);
> + listener, (GDestroyNotify)object_unref, listener->context);
> }
>
> listener->nsioc++;
> @@ -160,6 +160,7 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> listener->io_func = func;
> listener->io_data = data;
> listener->io_notify = notify;
> + listener->context = context;
The previous patch added a short circuit condition:
if (listener->io_func == func && listener->io_data == data) {
return;
}
I feel like we should have "&& listener->context == context" in
the short circuit check too
>
> for (i = 0; i < listener->nsioc; i++) {
> if (listener->io_source[i]) {
> @@ -271,7 +272,7 @@ QIOChannelSocket *qio_net_listener_wait_client(QIONetListener *listener)
> listener->io_source[i] = qio_channel_add_watch_source(
> QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> qio_net_listener_channel_func,
> - listener, (GDestroyNotify)object_unref, NULL);
> + listener, (GDestroyNotify)object_unref, listener->context);
> }
> }
If we extend the above short circuit condition then...
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-03 20:10 ` [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch Eric Blake
@ 2025-11-04 11:03 ` Daniel P. Berrangé
2025-11-04 13:15 ` Kevin Wolf
2025-11-04 12:37 ` Kevin Wolf
1 sibling, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 11:03 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:55PM -0600, Eric Blake wrote:
> The code had three similar repetitions of an iteration over one or all
> of nsiocs to set up a GSource, and likewise for teardown. Since an
> upcoming patch wants to tweak whether GSource or AioContext is used,
> its better to consolidate that into one helper function for fewer
> places to edit later.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 109 +++++++++++++++++++---------------------------
> 1 file changed, 45 insertions(+), 64 deletions(-)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index 15df673fb6e..e1378b9a612 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -106,6 +106,45 @@ int qio_net_listener_open_sync(QIONetListener *listener,
> }
> }
> void qio_net_listener_add(QIONetListener *listener,
> QIOChannelSocket *sioc)
> @@ -125,17 +164,7 @@ void qio_net_listener_add(QIONetListener *listener,
> object_ref(OBJECT(sioc));
> listener->connected = true;
>
> - if (listener->io_func != NULL) {
> - trace_qio_net_listener_watch_enabled(listener, listener->io_func,
> - "add");
> - object_ref(OBJECT(listener));
> - listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
> - QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
> - qio_net_listener_channel_func,
> - listener, (GDestroyNotify)object_unref, listener->context);
> - }
> -
> - listener->nsioc++;
> + qio_net_listener_watch(listener, listener->nsioc++, "add");
Nit-picking, I'd have a slight preference to keep the 'nsioc' increment
on the following line from the qio_net_listener_watch call, as I don't
like side effects in passing the function arguments.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 1/8] qio: Add trace points to net_listener
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
2025-11-04 10:43 ` Daniel P. Berrangé
@ 2025-11-04 11:08 ` Kevin Wolf
2025-11-05 17:18 ` Eric Blake
1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 11:08 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> Upcoming patches will adjust how net_listener watches for new client
> connections; adding trace points now makes it easier to debug that the
> changes work as intended. For example, adding
> --trace='qio_net_listener*' to the qemu-storage-daemon command line
> before --nbd-server will track when the server first starts listening
> for clients.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 17 +++++++++++++++++
> io/trace-events | 5 +++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index 47405965a66..0adbc409cf2 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -23,6 +23,7 @@
> #include "io/dns-resolver.h"
> #include "qapi/error.h"
> #include "qemu/module.h"
> +#include "trace.h"
>
> QIONetListener *qio_net_listener_new(void)
> {
> @@ -50,6 +51,7 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> return TRUE;
> }
>
> + trace_qio_net_listener_callback(listener, listener->io_func);
> if (listener->io_func) {
> listener->io_func(listener, sioc, listener->io_data);
> }
Not necessarily a problem, but I wonder why you decided to have the
trace point unconditional here...
> @@ -143,6 +147,9 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> {
> size_t i;
>
> + if (listener->io_func) {
> + trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> + }
> if (listener->io_notify) {
> listener->io_notify(listener->io_data);
> }
...while everywhere else you only call it with a non-NULL
listener->io_func.
Kevin
* Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-03 20:10 ` [PATCH 6/8] qio: Hoist ref of listener outside loop Eric Blake
@ 2025-11-04 11:13 ` Daniel P. Berrangé
2025-11-05 21:57 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 11:13 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:57PM -0600, Eric Blake wrote:
> The point of QIONetListener is to allow a server to listen to more
> than one socket address at a time, and respond to clients in a
> first-come-first-serve order across any of those addresses. While
> some servers (like NBD) really do want to serve multiple simultaneous
> clients, many other servers only care about the first client to
> connect, and will immediately deregister the callback, possibly by
> dropping their reference to the QIONetListener. The existing code
> ensures that all other pending callbacks remain safe once the first
> callback drops the listener, by adding an extra reference to the
> listener for each GSource created, where those references pair to the
> eventual teardown of each GSource after a given callbacks has been
> serviced or aborted. But it is equally acceptable to hoist the
> reference to the listener outside the loop - as long as there is a
> callback function registered, it is sufficient to have a single
> reference live for the entire array of sioc, rather than one reference
> per sioc in the array.
>
> Hoisting the reference like this will make it easier for an upcoming
> patch to still ensure the listener cannot be prematurely garbage
> collected during the user's callback, even when the callback no longer
> uses a per-sioc GSource.
It isn't quite this simple. Glib reference counts the callback
func / data, holding a reference when dispatching the callback.
IOW, even if the GSource is unrefed, the callback 'notify'
function won't be called if the main loop is in the process
of dispatching.
With this change, the reference on 'listener' can now be
released even if the callback is currently dispatching.
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index afdacdd5ff4..ce29bf3c993 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -118,12 +118,14 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
> }
>
> trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
> - for ( ; i < listener->nsioc; i++) {
> + if (i == 0) {
> object_ref(OBJECT(listener));
> + }
> + for ( ; i < listener->nsioc; i++) {
> listener->io_source[i] = qio_channel_add_watch_source(
> QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> qio_net_listener_channel_func,
> - listener, (GDestroyNotify)object_unref, listener->context);
> + listener, NULL, listener->context);
> }
> }
>
> @@ -144,6 +146,7 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
> listener->io_source[i] = NULL;
> }
> }
> + object_unref(OBJECT(listener));
> }
>
> void qio_net_listener_add(QIONetListener *listener,
> --
> 2.51.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/8] qio: Minor optimization when callback function is unchanged
2025-11-03 20:10 ` [PATCH 2/8] qio: Minor optimization when callback function is unchanged Eric Blake
2025-11-04 10:44 ` Daniel P. Berrangé
@ 2025-11-04 11:13 ` Kevin Wolf
2025-11-05 17:23 ` Eric Blake
1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 11:13 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> In qemu-nbd and other NBD server setups where parallel clients are
> supported, it is common that the caller will re-register the same
> callback function as long as it has not reached its limit on
> simultaneous clients. In that case, there is no need to tear down and
> reinstall GSource watches in the GMainContext.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index 0adbc409cf2..e89286ea63c 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -147,6 +147,10 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> {
> size_t i;
>
> + if (listener->io_func == func && listener->io_data == data) {
> + return;
> + }
Wouldn't a change of listener->io_notify also be significant?
Kevin
* Re: [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full
2025-11-03 20:10 ` [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full Eric Blake
2025-11-04 10:50 ` Daniel P. Berrangé
@ 2025-11-04 11:25 ` Kevin Wolf
2025-11-05 19:18 ` Eric Blake
1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 11:25 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> io/net-listener.c has two modes of use: asynchronous (the user calls
> qio_net_listener_set_client_func to wake up the callback via the
> global GMainContext, or qio_net_listener_set_client_func_full to wake
> up the callback via the caller's own alternative GMainContext), and
> synchronous (the user calls qio_net_listener_wait_client which creates
> its own GMainContext and waits for the first client connection before
> returning, with no need for a user's callback). But commit 938c8b79
> ("qio: store gsources for net listeners", v2.12.0) has a latent logic
> flaw: when qio_net_listener_wait_client finishes on its temporary
> context, it reverts all of the siocs back to the global GMainContext
> rather than the potentially non-NULL context they might have been
> originally registered with. Similarly, if the user creates a
> net-listener, adds initial addresses, registers an async callback with
> a non-default context (which ties to all siocs for the initial
> addresses), then adds more addresses with qio_net_listener_add, the
> siocs for later addresses are blindly placed in the global context,
> rather than sharing the context of the earlier ones.
>
> In practice, I don't think this has caused issues. As pointed out by
> the original commit, all async callers prior to that commit were
> already okay with the NULL default context; and the typical usage
> pattern is to first add ALL the addresses the listener will pay
> attention to before ever setting the async callback. Likewise, if a
> file uses only qio_net_listener_set_client_func instead of
> qio_net_listener_set_client_func_full, then it is never using a custom
> context, so later assignments of async callbacks will still be to the
> same global context as earlier ones. Meanwhile, any callers that want
> to do the sync operation to grab the first client are unlikely to
> register an async callback; altogether bypassing the question of
> whether later assignments of a GSource are being tied to a different
> context over time.
>
> I do note that chardev/char-socket.c is the only file that calls both
> qio_net_listener_wait_client (sync for a single client in
> tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full
> (several places, all with chr->gcontext, but sometimes with a NULL
> callback function during teardown). But as far as I can tell, the two
> uses are mutually exclusive, based on the is_waitconnect parameter to
> qmp_chardev_open_socket_server.
>
> That said, it is more robust to remember when a callback function is
> tied to a non-default context, and have both the sync wait and any
> late address additions honor that same context. That way, the code
> will be robust even if a later user performs a sync wait for a
> specific client in the middle of servicing a longer-lived
> QIONetListener that has an async callback for all other clients.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> @@ -160,6 +160,7 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> listener->io_func = func;
> listener->io_data = data;
> listener->io_notify = notify;
> + listener->context = context;
Now that you show me this, I think patch 2 actually also needs to check
that context is unchanged. We don't remember the old value before this
patch, so maybe the order of patch 2 and 3 should be swapped.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 7/8] qio: Use AioContext for default-context QIONetListener
2025-11-03 20:10 ` [PATCH 7/8] qio: Use AioContext for default-context QIONetListener Eric Blake
@ 2025-11-04 11:37 ` Daniel P. Berrangé
2025-11-05 22:06 ` Eric Blake
2025-11-04 15:14 ` Kevin Wolf
1 sibling, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 11:37 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Mon, Nov 03, 2025 at 02:10:58PM -0600, Eric Blake wrote:
> The user "John Doe" reported a deadlock when attempting to use
> qemu-storage-daemon to serve both a base file over NBD, and a qcow2
> file with that NBD export as its backing file, from the same process,
> even though it worked just fine when there were two q-s-d processes.
> The bulk of the NBD server code properly uses coroutines to make
> progress in an event-driven manner, but the code for spawning a new
> coroutine at the point when listen(2) detects a new client was
> hard-coded to use the global GMainContext; in other words, the
> callback that triggers nbd_client_new to let the server start the
> negotiation sequence with the client requires the main loop to be
> making progress. However, the code for bdrv_open of a qcow2 image
> with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
> ensure that the entire qcow2 backing chain is either fully loaded or
> rejected, without any side effects from the main loop causing unwanted
> changes to the disk being loaded (in short, an AioContext represents
> the set of actions that are known to be safe while handling block
> layer I/O, while excluding any other pending actions in the global
> main loop with potentially larger risk of unwanted side effects).
>
> This creates a classic case of deadlock: the server can't progress to
> the point of accept(2)ing the client to write to the NBD socket
> because the main loop is being starved until the AIO_WAIT_WHILE
> completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
> it is blocked on the client coroutine stuck in a read() of the
> expected magic number from the server side of the socket.
>
> Fortunately, the way that AioContext is set up, any callback that is
> registered to the global AioContext will also be serviced by the main
> loop. So the fix for the deadlock is to alter QIONetListener so that
> if it is not being used in an explicit alternative GMainContext, then
> it should perform its polling via the global AioContext (which
> indirectly still progresses in the default GMainContext) rather than
> directly in the default GMainContext. This has no change in behavior
> to any prior use that did not starve the main loop, but has the
> additional benefit that in the bdrv_open case of a nested AioContext
> loop, the server's listen/accept handler is no longer starved because
> it is now part of the same AioContext loop. From there, since NBD
> already uses coroutines for both server and client code, the nested
> AioContext loop finishes quickly and opening the qcow2 backing chain
> no longer deadlocks.
>
> The next patch will add a unit test (kept separate to make it easier
> to rearrange the series to demonstrate the deadlock without this
> patch).
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 53 ++++++++++++++++++++++++++++++++++++++---------
> io/trace-events | 4 ++--
> 2 files changed, 45 insertions(+), 12 deletions(-)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index ce29bf3c993..9f4e3c0be0c 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -23,6 +23,7 @@
> #include "io/dns-resolver.h"
> #include "qapi/error.h"
> #include "qemu/module.h"
> +#include "qemu/main-loop.h"
> #include "trace.h"
>
> QIONetListener *qio_net_listener_new(void)
> @@ -62,6 +63,15 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> }
>
>
> +static void qio_net_listener_aio_func(void *opaque)
> +{
> + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
> +
> + qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
> + sioc->listener);
> +}
> +
> +
> int qio_net_listener_open_sync(QIONetListener *listener,
> SocketAddress *addr,
> int num,
> @@ -117,15 +127,33 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
> return;
> }
>
> - trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
> + trace_qio_net_listener_watch_enabled(listener, listener->io_func,
> + listener->context, caller);
> if (i == 0) {
> object_ref(OBJECT(listener));
> }
> for ( ; i < listener->nsioc; i++) {
> - listener->io_source[i] = qio_channel_add_watch_source(
> - QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> - qio_net_listener_channel_func,
> - listener, NULL, listener->context);
> + if (listener->context) {
> + /*
> + * The user passed a GMainContext with the async callback;
> + * they plan on running their own g_main_loop.
> + */
> + listener->io_source[i] = qio_channel_add_watch_source(
> + QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> + qio_net_listener_channel_func,
> + listener, NULL, listener->context);
> + } else {
> + /*
> + * The user is fine with the default context. But by doing
> + * it in the main thread's AioContext rather than
> + * specifically in a GMainContext, we can remain
> + * responsive even if another AioContext depends on
> + * connecting to this server.
> + */
> + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> + qio_net_listener_aio_func, NULL, NULL, NULL,
> + listener->sioc[i]);
> + }
I'm not really happy with the listener directly accessing the 'fd'
fields in the QIOChannelSocket, as compared to the GSource approach
where the underlying transport is not exposed to the caller.
If we want to use an AioContext instead of a GSource, then I think
we need to add a method to either QIOChannelSocket, or QIOChannel
base, as an alternative to the GSource watches, and then have the
listener conditionally use the AioContext APIs.
Also in the QIOChannel base, we have an io_set_aio_fd_handler() method
that we use elsewhere. Can we perhaps leverage that in some way?
eg, instead of taking the AioContext code path based off
"if (listener->context)", take that code path based on whether
the QIOChannel has had a qio_channel_set_aio_fd_handler() call
to register AIO handlers? Maybe that method doesn't quite fit,
but conceptually I would be more comfortable with an approach
that explicitly associates an AioContext with either the
channel or the listener object, rather than this heuristic
of "if (listener->context)".
> }
> }
>
> @@ -138,12 +166,17 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
> return;
> }
>
> - trace_qio_net_listener_watch_disabled(listener, caller);
> + trace_qio_net_listener_watch_disabled(listener, listener->context, caller);
> for (i = 0; i < listener->nsioc; i++) {
> - if (listener->io_source[i]) {
> - g_source_destroy(listener->io_source[i]);
> - g_source_unref(listener->io_source[i]);
> - listener->io_source[i] = NULL;
> + if (listener->context) {
> + if (listener->io_source[i]) {
> + g_source_destroy(listener->io_source[i]);
> + g_source_unref(listener->io_source[i]);
> + listener->io_source[i] = NULL;
> + }
> + } else {
> + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> + NULL, NULL, NULL, NULL, NULL);
> }
> }
> object_unref(OBJECT(listener));
> diff --git a/io/trace-events b/io/trace-events
> index 8cc4cae3a5d..1b01b2d51e6 100644
> --- a/io/trace-events
> +++ b/io/trace-events
> @@ -74,6 +74,6 @@ qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
> qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"
>
> # net-listener.c
> -qio_net_listener_watch_enabled(void *listener, void *func, const char *extra) "Net listener=%p watch enabled func=%p by %s"
> -qio_net_listener_watch_disabled(void *listener, const char *extra) "Net listener=%p watch disabled by %s"
> +qio_net_listener_watch_enabled(void *listener, void *func, void *ctx, const char *extra) "Net listener=%p watch enabled func=%p ctx=%p by %s"
> +qio_net_listener_watch_disabled(void *listener, void *ctx, const char *extra) "Net listener=%p watch disabled ctx=%p by %s"
> qio_net_listener_callback(void *listener, void *func) "Net listener=%p callback forwarding to func=%p"
> --
> 2.51.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix
2025-11-03 20:10 ` [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix Eric Blake
@ 2025-11-04 11:38 ` Vladimir Sementsov-Ogievskiy
2025-11-05 22:10 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-11-04 11:38 UTC (permalink / raw)
To: Eric Blake, qemu-devel; +Cc: qemu-block, berrange, kwolf, Hanna Reitz
On 03.11.25 23:10, Eric Blake wrote:
> Test that all images in a qcow2 chain using an NBD backing file can be
> served by the same process. Prior to the recent QIONetListener fixes,
> this test would demonstrate deadlock.
>
> The test borrows heavily from the original formula by "John Doe" in
> the gitlab bug, but uses a Unix socket rather than TCP to avoid port
> contention, and uses a full-blown QEMU rather than qemu-storage-daemon
> since both programs were impacted.
>
> [While preparing this patch by making the new test executable, I
> noticed vvfat.out does not need execute permissions]
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++++++++++
> .../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 +++++++++++++
> tests/qemu-iotests/tests/vvfat.out | 0
> 3 files changed, 140 insertions(+)
> create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
> create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
> mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
>
> diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
> new file mode 100755
> index 00000000000..b89f74d4552
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
> @@ -0,0 +1,84 @@
> +#!/usr/bin/env bash
> +# group: rw quick
> +#
> +# Test of opening both server and client NBD in a qcow2 backing chain
> +#
> +# Copyright (C) Red Hat, Inc.
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +# creator
> +owner=eblake@redhat.com
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> + _cleanup_qemu
> + _cleanup_test_img
> + rm -f "$SOCK_DIR/nbd"
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +cd ..
> +. ./common.rc
> +. ./common.filter
> +. ./common.qemu
> +. ./common.nbd
> +
> +_supported_fmt qcow2 # Hardcoded to qcow2 command line and QMP below
> +_supported_proto file
> +
> +size=100M
> +
> +echo
> +echo "=== Preparing base image ==="
> +
> +TEST_IMG="$TEST_IMG.base" _make_test_img $size
> +
> +echo
> +echo "=== Starting QEMU and exposing base image ==="
> +
> +_launch_qemu -machine q35
> +h1=$QEMU_HANDLE
> +_send_qemu_cmd $QEMU_HANDLE '{"execute": "qmp_capabilities"}' 'return'
> +_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
> + "arguments": {"node-name":"base", "driver":"qcow2",
> + "file":{"driver":"file", "filename":"'"$TEST_IMG.base"'"}}}' 'return'
> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> + "arguments": {"addr":{"type":"unix",
> + "data":{"path":"'"$SOCK_DIR/nbd"'"}}}}' 'return'
> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
> + "arguments": {"device":"base","name":"base"}}' 'return'
> +
> +echo
> +echo "=== Creating wrapper image ==="
> +
> +_make_test_img -F raw -b "nbd+unix:///base?socket=$SOCK_DIR/nbd" $size
> +
> +echo
> +echo "=== Adding wrapper image ==="
> +
> +_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
> + "arguments": {"node-name":"wrap", "driver":"qcow2",
> + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' 'return'
Hmm. Why don't you specify "backing": "base" here?
> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-add",
> + "arguments": {"device":"wrap","name":"wrap"}}' 'return'
> +
> +echo
> +echo "=== Checking NBD server ==="
> +
> +$QEMU_NBD --list -k $SOCK_DIR/nbd
> +
> +echo
> +echo "=== Cleaning up ==="
> +
> +_send_qemu_cmd $QEMU_HANDLE '{"execute":"quit"}' ''
> +
> +echo "*** done"
> +rm -f $seq.full
> +status=0
> diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out b/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
> new file mode 100644
> index 00000000000..5f1d31ae2e0
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
> @@ -0,0 +1,56 @@
> +QA output created by nbd-in-qcow2-chain
> +
> +=== Preparing base image ===
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=104857600
> +
> +=== Starting QEMU and exposing base image ===
> +{"execute": "qmp_capabilities"}
> +{"return": {}}
> +{"execute": "blockdev-add",
> + "arguments": {"node-name":"base", "driver":"IMGFMT",
> + "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT.base"}}}
> +{"return": {}}
> +{"execute":"nbd-server-start",
> + "arguments": {"addr":{"type":"unix",
> + "data":{"path":"SOCK_DIR/nbd"}}}}
> +{"return": {}}
> +{"execute":"nbd-server-add",
> + "arguments": {"device":"base","name":"base"}}
> +{"return": {}}
> +
> +=== Creating wrapper image ===
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=104857600 backing_file=nbd+unix:///base?socket=SOCK_DIR/nbd backing_fmt=raw
> +
> +=== Adding wrapper image ===
> +{"execute": "blockdev-add",
> + "arguments": {"node-name":"wrap", "driver":"IMGFMT",
> + "file":{"driver":"file", "filename":"TEST_DIR/t.IMGFMT"}}}
> +{"return": {}}
> +{"execute":"nbd-server-add",
> + "arguments": {"device":"wrap","name":"wrap"}}
> +{"return": {}}
> +
> +=== Checking NBD server ===
> +exports available: 2
> + export: 'base'
> + size: 104857600
> + flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
> + min block: 1
> + opt block: 4096
> + max block: 33554432
> + transaction size: 64-bit
> + available meta contexts: 1
> + base:allocation
> + export: 'wrap'
> + size: 104857600
> + flags: 0x158f ( readonly flush fua df multi cache block-status-payload )
> + min block: 1
> + opt block: 4096
> + max block: 33554432
> + transaction size: 64-bit
> + available meta contexts: 1
> + base:allocation
> +
> +=== Cleaning up ===
> +{"execute":"quit"}
> +*** done
> diff --git a/tests/qemu-iotests/tests/vvfat.out b/tests/qemu-iotests/tests/vvfat.out
> old mode 100755
> new mode 100644
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-03 20:10 ` [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch Eric Blake
2025-11-04 11:03 ` Daniel P. Berrangé
@ 2025-11-04 12:37 ` Kevin Wolf
2025-11-04 13:10 ` Daniel P. Berrangé
1 sibling, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 12:37 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> The code had three similar repetitions of an iteration over one or all
> of nsiocs to set up a GSource, and likewise for teardown. Since an
> upcoming patch wants to tweak whether GSource or AioContext is used,
> it's better to consolidate that into one helper function for fewer
> places to edit later.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 109 +++++++++++++++++++---------------------------
> 1 file changed, 45 insertions(+), 64 deletions(-)
> @@ -145,15 +174,11 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> GDestroyNotify notify,
> GMainContext *context)
> {
> - size_t i;
> -
> if (listener->io_func == func && listener->io_data == data) {
> return;
> }
>
> - if (listener->io_func) {
> - trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> - }
> + qio_net_listener_unwatch(listener, "set_client_func");
> if (listener->io_notify) {
> listener->io_notify(listener->io_data);
> }
This changes the order between the io_notify() call and the unwatch. Is
this intentional? If so, maybe mention it in the commit message and why
it's safe.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-04 12:37 ` Kevin Wolf
@ 2025-11-04 13:10 ` Daniel P. Berrangé
2025-11-05 19:32 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-04 13:10 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Eric Blake, qemu-devel, qemu-block
On Tue, Nov 04, 2025 at 01:37:34PM +0100, Kevin Wolf wrote:
> Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> > The code had three similar repetitions of an iteration over one or all
> > of nsiocs to set up a GSource, and likewise for teardown. Since an
> > upcoming patch wants to tweak whether GSource or AioContext is used,
> > it's better to consolidate that into one helper function for fewer
> > places to edit later.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > io/net-listener.c | 109 +++++++++++++++++++---------------------------
> > 1 file changed, 45 insertions(+), 64 deletions(-)
>
> > @@ -145,15 +174,11 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > GDestroyNotify notify,
> > GMainContext *context)
> > {
> > - size_t i;
> > -
> > if (listener->io_func == func && listener->io_data == data) {
> > return;
> > }
> >
> > - if (listener->io_func) {
> > - trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> > - }
> > + qio_net_listener_unwatch(listener, "set_client_func");
> > if (listener->io_notify) {
> > listener->io_notify(listener->io_data);
> > }
>
> This changes the order between the io_notify() call and the unwatch. Is
> this intentional? If so, maybe mention it in the commit message and why
> it's safe.
At least conceptually I think this ordering is better, and I don't think
there should be any functional consequences from the change.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-04 11:03 ` Daniel P. Berrangé
@ 2025-11-04 13:15 ` Kevin Wolf
2025-11-05 19:22 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 13:15 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: Eric Blake, qemu-devel, qemu-block
Am 04.11.2025 um 12:03 hat Daniel P. Berrangé geschrieben:
> On Mon, Nov 03, 2025 at 02:10:55PM -0600, Eric Blake wrote:
> > The code had three similar repetitions of an iteration over one or all
> > of nsiocs to set up a GSource, and likewise for teardown. Since an
> > upcoming patch wants to tweak whether GSource or AioContext is used,
> > it's better to consolidate that into one helper function for fewer
> > places to edit later.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > io/net-listener.c | 109 +++++++++++++++++++---------------------------
> > 1 file changed, 45 insertions(+), 64 deletions(-)
> >
> > diff --git a/io/net-listener.c b/io/net-listener.c
> > index 15df673fb6e..e1378b9a612 100644
> > --- a/io/net-listener.c
> > +++ b/io/net-listener.c
> > @@ -106,6 +106,45 @@ int qio_net_listener_open_sync(QIONetListener *listener,
> > }
> > }
>
> > void qio_net_listener_add(QIONetListener *listener,
> > QIOChannelSocket *sioc)
> > @@ -125,17 +164,7 @@ void qio_net_listener_add(QIONetListener *listener,
> > object_ref(OBJECT(sioc));
> > listener->connected = true;
> >
> > - if (listener->io_func != NULL) {
> > - trace_qio_net_listener_watch_enabled(listener, listener->io_func,
> > - "add");
> > - object_ref(OBJECT(listener));
> > - listener->io_source[listener->nsioc] = qio_channel_add_watch_source(
> > - QIO_CHANNEL(listener->sioc[listener->nsioc]), G_IO_IN,
> > - qio_net_listener_channel_func,
> > - listener, (GDestroyNotify)object_unref, listener->context);
> > - }
> > -
> > - listener->nsioc++;
> > + qio_net_listener_watch(listener, listener->nsioc++, "add");
>
> Nit-picking, I'd have a slight preference to keep the 'nsioc' increment
> on the following line from the qio_net_listener_watch call, as I don't
> like side effects in passing the function arguments.
It actually wouldn't work any more because qio_net_listener_watch()
iterates up to listener->nsioc. It needs the increased value in
listener->nsioc, and the previous one for i, so that we get exactly one
loop iteration.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 7/8] qio: Use AioContext for default-context QIONetListener
2025-11-03 20:10 ` [PATCH 7/8] qio: Use AioContext for default-context QIONetListener Eric Blake
2025-11-04 11:37 ` Daniel P. Berrangé
@ 2025-11-04 15:14 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2025-11-04 15:14 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> The user "John Doe" reported a deadlock when attempting to use
> qemu-storage-daemon to serve both a base file over NBD, and a qcow2
> file with that NBD export as its backing file, from the same process,
> even though it worked just fine when there were two q-s-d processes.
> The bulk of the NBD server code properly uses coroutines to make
> progress in an event-driven manner, but the code for spawning a new
> coroutine at the point when listen(2) detects a new client was
> hard-coded to use the global GMainContext; in other words, the
> callback that triggers nbd_client_new to let the server start the
> negotiation sequence with the client requires the main loop to be
> making progress. However, the code for bdrv_open of a qcow2 image
> with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
> ensure that the entire qcow2 backing chain is either fully loaded or
> rejected, without any side effects from the main loop causing unwanted
> changes to the disk being loaded (in short, an AioContext represents
> the set of actions that are known to be safe while handling block
> layer I/O, while excluding any other pending actions in the global
> main loop with potentially larger risk of unwanted side effects).
>
> This creates a classic case of deadlock: the server can't progress to
> the point of accept(2)ing the client to write to the NBD socket
> because the main loop is being starved until the AIO_WAIT_WHILE
> completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
> it is blocked on the client coroutine stuck in a read() of the
> expected magic number from the server side of the socket.
>
> Fortunately, the way that AioContext is set up, any callback that is
> registered to the global AioContext will also be serviced by the main
> loop. So the fix for the deadlock is to alter QIONetListener so that
> if it is not being used in an explicit alternative GMainContext, then
> it should perform its polling via the global AioContext (which
> indirectly still progresses in the default GMainContext) rather than
> directly in the default GMainContext. This has no change in behavior
> to any prior use that did not starve the main loop, but has the
> additional benefit that in the bdrv_open case of a nested AioContext
> loop, the server's listen/accept handler is no longer starved because
> it is now part of the same AioContext loop. From there, since NBD
> already uses coroutines for both server and client code, the nested
> AioContext loop finishes quickly and opening the qcow2 backing chain
> no longer deadlocks.
>
> The next patch will add a unit test (kept separate to make it easier
> to rearrange the series to demonstrate the deadlock without this
> patch).
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <eblake@redhat.com>
This assumes that you'll only ever want to listen on a socket in the
main thread. I understand this is what the NBD server does today even if
you specify an iothread, so despite the limitation the patch is useful.
We just might need to change the interfaces again later.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/8] qio: Add trace points to net_listener
2025-11-04 11:08 ` Kevin Wolf
@ 2025-11-05 17:18 ` Eric Blake
2025-11-06 12:20 ` Kevin Wolf
0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-05 17:18 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, berrange
On Tue, Nov 04, 2025 at 12:08:42PM +0100, Kevin Wolf wrote:
> Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> > Upcoming patches will adjust how net_listener watches for new client
> > connections; adding trace points now makes it easier to debug that the
> > changes work as intended. For example, adding
> > --trace='qio_net_listener*' to the qemu-storage-daemon command line
> > before --nbd-server will track when the server first starts listening
> > for clients.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > io/net-listener.c | 17 +++++++++++++++++
> > io/trace-events | 5 +++++
> > 2 files changed, 22 insertions(+)
> >
> > diff --git a/io/net-listener.c b/io/net-listener.c
> > index 47405965a66..0adbc409cf2 100644
> > --- a/io/net-listener.c
> > +++ b/io/net-listener.c
> > @@ -23,6 +23,7 @@
> > #include "io/dns-resolver.h"
> > #include "qapi/error.h"
> > #include "qemu/module.h"
> > +#include "trace.h"
> >
> > QIONetListener *qio_net_listener_new(void)
> > {
> > @@ -50,6 +51,7 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> > return TRUE;
> > }
> >
> > + trace_qio_net_listener_callback(listener, listener->io_func);
> > if (listener->io_func) {
> > listener->io_func(listener, sioc, listener->io_data);
> > }
>
> Not necessarily a problem, but I wonder why you decided to have the
> trace point unconditional here...
>
> > @@ -143,6 +147,9 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > {
> > size_t i;
> >
> > + if (listener->io_func) {
> > + trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> > + }
> > if (listener->io_notify) {
> > listener->io_notify(listener->io_data);
> > }
>
> ...while everywhere else you only call it with a non-NULL
> listener->io_func.
In the former, the trace is printing out the address of which io_func
(including NULL) is currently registered when the callback is reached.
When a single NetListener registers more than one socket address but
the user uninstalls their callback after the first client connects, it
is still possible that other pending connections have already raced in,
at which point qio_net_listener_channel_func can still fire on those
later connections even though the user no longer has an io_func
installed. In all the latter cases, I was merely
tracing when state transitioned between the user installing a handler
or removing one.
Unless you think it would be too noisy, I'm game for changing this in
v2 to have all of the traces be unconditional. It is potentially
noisier, but it would also aid in spotting how many times a caller
requests no change to the current io_func (or lack thereof).
Also worth thinking about: in this patch, I added new traces under the
names enabled/disabled, with the trace points being reachable from
multiple locations. Then in 4/8, when refactoring to consolidate
duplicate code, the trace points are reduced to a single usage in
functions named watch/unwatch. It may be worth rethinking naming to
have the tracepoint and function names share the same terminology.
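Something like this in io/trace-events is what I have in mind (a rough
sketch; the exact argument lists are subject to the later patches in
this series):

# net-listener.c
qio_net_listener_watch(void *listener, void *func, const char *extra) "Net listener=%p watch func=%p by %s"
qio_net_listener_unwatch(void *listener, const char *extra) "Net listener=%p unwatch by %s"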
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 2/8] qio: Minor optimization when callback function is unchanged
2025-11-04 11:13 ` Kevin Wolf
@ 2025-11-05 17:23 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 17:23 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, berrange
On Tue, Nov 04, 2025 at 12:13:52PM +0100, Kevin Wolf wrote:
> Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> > In qemu-nbd and other NBD server setups where parallel clients are
> > supported, it is common that the caller will re-register the same
> > callback function as long as it has not reached its limit on
> > simultaneous clients. In that case, there is no need to tear down and
> > reinstall GSource watches in the GMainContext.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > io/net-listener.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/io/net-listener.c b/io/net-listener.c
> > index 0adbc409cf2..e89286ea63c 100644
> > --- a/io/net-listener.c
> > +++ b/io/net-listener.c
> > @@ -147,6 +147,10 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > {
> > size_t i;
> >
> > + if (listener->io_func == func && listener->io_data == data) {
> > + return;
> > + }
>
> Wouldn't a change of listener->io_notify also be significant?
On paper, yes. In practice: io_notify is usually either a free-like
function (including object_unref in this bucket), or NULL. In the
former, you are passing in a malloc'd or reference-counted object,
where the cleanup action at notify time will never vary (the nature of
the object being passed in determines how to clean up when it is no
longer in use). In the latter, passing NULL means no cleanup is
necessary. I struggle to see a scenario where you would register a
callback opaque pointer with one cleanup function, and then change
your mind to re-register the same object but a different io_notify
cleanup function.
But I'm not opposed to adding the additional check for an unchanged
io_notify in v2.
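Concretely, v2 would extend the early-return guard along these lines
(a sketch; the final shape also depends on the patch reordering being
discussed in the 3/8 subthread):

    if (listener->io_func == func && listener->io_data == data &&
        listener->io_notify == notify) {
        return;
    }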
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full
2025-11-04 11:25 ` Kevin Wolf
@ 2025-11-05 19:18 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 19:18 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, berrange
On Tue, Nov 04, 2025 at 12:25:38PM +0100, Kevin Wolf wrote:
> Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> >
> > That said, it is more robust to remember when a callback function is
> > tied to a non-default context, and have both the sync wait and any
> > late address additions honor that same context. That way, the code
> > will be robust even if a later user performs a sync wait for a
> > specific client in the middle of servicing a longer-lived
> > QIONetListener that has an async callback for all other clients.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
>
> > @@ -160,6 +160,7 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > listener->io_func = func;
> > listener->io_data = data;
> > listener->io_notify = notify;
> > + listener->context = context;
>
> Now that you show me this, I think patch 2 actually also needs to check
> that context is unchanged. We don't remember the old value before this
> patch, so maybe the order of patch 2 and 3 should be swapped.
Makes sense. v2 will swap the patch order, and ensure the context is
unchanged when optimizing a re-arm of the already-existing callback.
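Roughly, the reordered patch 2 would then grow the guard to something
like this (a sketch):

    if (listener->io_func == func && listener->io_data == data &&
        listener->io_notify == notify && listener->context == context) {
        return;
    }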
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-04 13:15 ` Kevin Wolf
@ 2025-11-05 19:22 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 19:22 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Daniel P. Berrangé, qemu-devel, qemu-block
On Tue, Nov 04, 2025 at 02:15:16PM +0100, Kevin Wolf wrote:
> Am 04.11.2025 um 12:03 hat Daniel P. Berrangé geschrieben:
> > On Mon, Nov 03, 2025 at 02:10:55PM -0600, Eric Blake wrote:
> > > The code had three similar repetitions of an iteration over one or all
> > > of nsiocs to set up a GSource, and likewise for teardown. Since an
> > > upcoming patch wants to tweak whether GSource or AioContext is used,
> > > it's better to consolidate that into one helper function for fewer
> > > places to edit later.
> > >
> > > - listener->nsioc++;
> > > + qio_net_listener_watch(listener, listener->nsioc++, "add");
> >
> > Nit-picking, I'd have a slight preference to keep the 'nsioc' increment
> > on the following line from the qio_net_listener_watch call, as I don't
> > like side effects in passing the function arguments.
>
> It actually wouldn't work any more because qio_net_listener_watch()
> iterates up to listener->nsioc. It needs the increased value in
> listener->nsioc, and the previous one for i, so that we get exactly one
> loop iteration.
I'm happy to change it to this in v2, to avoid the side-effect in the
function call:
listener->nsioc++;
qio_net_listener_watch(listener, listener->nsioc - 1, "add");
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch
2025-11-04 13:10 ` Daniel P. Berrangé
@ 2025-11-05 19:32 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 19:32 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: Kevin Wolf, qemu-devel, qemu-block
On Tue, Nov 04, 2025 at 01:10:12PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 04, 2025 at 01:37:34PM +0100, Kevin Wolf wrote:
> > Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> > > The code had three similar repetitions of an iteration over one or all
> > > of nsiocs to set up a GSource, and likewise for teardown. Since an
> > > upcoming patch wants to tweak whether GSource or AioContext is used,
> > > it's better to consolidate that into one helper function for fewer
> > > places to edit later.
> > >
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > > io/net-listener.c | 109 +++++++++++++++++++---------------------------
> > > 1 file changed, 45 insertions(+), 64 deletions(-)
> >
> > > @@ -145,15 +174,11 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > > GDestroyNotify notify,
> > > GMainContext *context)
> > > {
> > > - size_t i;
> > > -
> > > if (listener->io_func == func && listener->io_data == data) {
> > > return;
> > > }
> > >
> > > - if (listener->io_func) {
> > > - trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> > > - }
> > > + qio_net_listener_unwatch(listener, "set_client_func");
> > > if (listener->io_notify) {
> > > listener->io_notify(listener->io_data);
> > > }
> >
> > This changes the order between the io_notify() call and the unwatch. Is
> > this intentional? If so, maybe mention it in the commit message and why
> > it's safe.
>
> At least conceptually I think this ordering is better, and I don't think
> there should be any functional consequences from the change.
>
It may have been inadvertent at the time I wrote the patch, but I
agree it makes sense to call it out as intentional in v2.
Before this patch series, io_notify is typically object_unref on an
object that has other references besides the one associated with the
NetListener. In that case, calling io_notify() reduces the refcount,
for example from 2 to 1, but has no other impact on whether the
GSource is still armed. More drastic is the case where io_notify()
reduces the refcount from 1 to 0 and thereby finalizes the data. If
the GSource is still armed and has even a narrow window of time where
it can fire before being unwatched, but the io_data has already been
freed because io_notify() dropped the refcount to 0 rather than merely
reducing it, then the callback risks a use-after-free of the io_data.
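To illustrate the pre-series hazard (a sketch; the refcount comments
describe a hypothetical caller whose io_data is reference counted):

    /* old order in qio_net_listener_set_client_func_full() */
    if (listener->io_notify) {
        listener->io_notify(listener->io_data); /* may drop last ref */
    }
    /* window: an armed GSource can still fire with freed io_data */
    for (i = 0; i < listener->nsioc; i++) {
        if (listener->io_source[i]) {
            g_source_destroy(listener->io_source[i]); /* too late */
        }
    }

Unwatching before calling io_notify() closes that window.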
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener
2025-11-03 20:10 ` [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener Eric Blake
@ 2025-11-05 20:06 ` Eric Blake
2025-11-06 18:35 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-05 20:06 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
On Mon, Nov 03, 2025 at 02:10:56PM -0600, Eric Blake wrote:
> Make it easier to get from the sioc listening to a single address on
> behalf of a NetListener back to its owning object, which will be
> beneficial in an upcoming patch that teaches NetListener how to
> interact with AioContext.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/io/channel-socket.h | 1 +
> io/channel-socket.c | 1 +
> io/net-listener.c | 1 +
> 3 files changed, 3 insertions(+)
>
> diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> index a88cf8b3a9f..eee686f3b4d 100644
> --- a/include/io/channel-socket.h
> +++ b/include/io/channel-socket.h
> @@ -49,6 +49,7 @@ struct QIOChannelSocket {
> socklen_t remoteAddrLen;
> ssize_t zero_copy_queued;
> ssize_t zero_copy_sent;
> + struct QIONetListener *listener;
Commenting on my own patch:
After re-reading docs/devel/style.rst, I can see that this particular
forward declaration of QIONetListener is not consistent with the
guidelines. I have to have a forward reference, since the style guide
also forbids circular inclusion (net-listener.h already includes
channel-socket.h, so channel-socket.h cannot include net-listener.h);
but it may be better for me to move the forward reference into
include/qemu/typedefs.h rather than inlining it the way I did here.
(It is a red herring that struct QIOChannelSocket{} already contains
two other uses of 'struct' in its declaration body - both of those are
for 'struct sockaddr_storage' which is the POSIX type always spelled
with struct, with no typical QEMU CamelCase wrapper)
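Concretely, the move I have in mind would look something like this
(a sketch only):

    /* include/qemu/typedefs.h */
    typedef struct QIONetListener QIONetListener;

    /* include/io/channel-socket.h, no 'struct' keyword needed */
    QIONetListener *listener;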
> +++ b/io/channel-socket.c
> @@ -65,6 +65,7 @@ qio_channel_socket_new(void)
> sioc->fd = -1;
> sioc->zero_copy_queued = 0;
> sioc->zero_copy_sent = 0;
> + sioc->listener = NULL;
Also, I added an explicit zero initialization to the new member to
match existing explicit initializers. But checking qom/object.c, I
see that object_new() first uses g_malloc() instead of g_new0(), but
then calls object_initialize_with_type() does a forced memset(,0,) -
so all object constructors that do explicit field initialization to
zero are doing redundant work.
Dropping the sioc->listener = NULL assignment from this patch thus
makes sense from the less work perspective, but now that I've pointed
it out, dropping the sioc->zero_copy_* = 0 lines makes sense too. But
cleanups like that should probably be a separate patch, and maybe
touch as many clients of object_new() as possible (perhaps via
Coccinelle?), rather than shoehorning in a deletion of just those two
lines into a v2 of this particular patch.
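For illustration only, a sketch of the slimmed-down constructor,
leaning on object_new()'s zero-fill guarantee (the elision stands for
the unrelated feature setup):

    QIOChannelSocket *
    qio_channel_socket_new(void)
    {
        QIOChannelSocket *sioc;

        sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
        sioc->fd = -1; /* the only member whose default is not all-zero */
        ...
        return sioc;
    }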
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-04 11:13 ` Daniel P. Berrangé
@ 2025-11-05 21:57 ` Eric Blake
2025-11-11 14:43 ` Daniel P. Berrangé
0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-05 21:57 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, qemu-block, kwolf
On Tue, Nov 04, 2025 at 11:13:48AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 03, 2025 at 02:10:57PM -0600, Eric Blake wrote:
> > The point of QIONetListener is to allow a server to listen to more
> > than one socket address at a time, and respond to clients in a
> > first-come-first-serve order across any of those addresses. While
> > some servers (like NBD) really do want to serve multiple simultaneous
> > clients, many other servers only care about the first client to
> > connect, and will immediately deregister the callback, possibly by
> > dropping their reference to the QIONetListener. The existing code
> > ensures that all other pending callbacks remain safe once the first
> > callback drops the listener, by adding an extra reference to the
> > listener for each GSource created, where those references pair to the
> > eventual teardown of each GSource after a given callbacks has been
> > serviced or aborted. But it is equally acceptable to hoist the
> > reference to the listener outside the loop - as long as there is a
> > callback function registered, it is sufficient to have a single
> > reference live for the entire array of sioc, rather than one reference
> > per sioc in the array.
> >
> > Hoisting the reference like this will make it easier for an upcoming
> > patch to still ensure the listener cannot be prematurely garbage
> > collected during the user's callback, even when the callback no longer
> > uses a per-sioc GSource.
>
> It isn't quite this simple. Glib reference counts the callback
> func / data, holding a reference when dispatching the callback.
>
> IOW, even if the GSource is unrefed, the callback 'notify'
> function won't be called if the main loop is in the process
> of dispatching.
I'm not sure I follow your argument. Glib holds a reference on the
GSource object, not on the opaque data that is handed to the GSource.
It is possible to use g_source_set_callback_indirect() where GSource
can learn how to use the same reference counting on data as external
code, by the use of function pointers for ref and unref, but QIO uses
merely g_source_set_callback().
https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gmain.c#L1844
shows that glib then wraps that opaque pointer into an internal
GSourceCallback object which itself is reference counted, so that the
notify function is not called until the GSource is finalized, but that
is reference counting on the container, not on the opaque object
itself (which in this patch is the QIONetListener).
>
> With this change, the reference on 'listener' can now be
> released even if the callback is currently dispatching.
So if I'm understanding your concern, you're worried that the unwatch
code can finish looping through the g_source_destroy and then reach
the point where it unrefs listener, but that a late-breaking client
connection can trigger a callback that can still be executing in
another thread/coroutine after the listener is unref'ed but before the
GSource has been finalized? If so, would squashing this in fix the
problem you are seeing?
diff --git i/io/net-listener.c w/io/net-listener.c
index 9f4e3c0be0c..1fcbbeb7a76 100644
--- i/io/net-listener.c
+++ w/io/net-listener.c
@@ -67,8 +67,10 @@ static void qio_net_listener_aio_func(void *opaque)
{
QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
+ object_ref(OBJECT(sioc->listener));
qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
sioc->listener);
+ object_unref(OBJECT(sioc->listener));
}
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 7/8] qio: Use AioContext for default-context QIONetListener
2025-11-04 11:37 ` Daniel P. Berrangé
@ 2025-11-05 22:06 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 22:06 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, qemu-block, kwolf
On Tue, Nov 04, 2025 at 11:37:21AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 03, 2025 at 02:10:58PM -0600, Eric Blake wrote:
> > for ( ; i < listener->nsioc; i++) {
> > - listener->io_source[i] = qio_channel_add_watch_source(
> > - QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> > - qio_net_listener_channel_func,
> > - listener, NULL, listener->context);
> > + if (listener->context) {
> > + /*
> > + * The user passed a GMainContext with the async callback;
> > + * they plan on running their own g_main_loop.
> > + */
> > + listener->io_source[i] = qio_channel_add_watch_source(
> > + QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> > + qio_net_listener_channel_func,
> > + listener, NULL, listener->context);
> > + } else {
> > + /*
> > + * The user is fine with the default context. But by doing
> > + * it in the main thread's AioContext rather than
> > + * specifically in a GMainContext, we can remain
> > + * responsive even if another AioContext depends on
> > + * connecting to this server.
> > + */
> > + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> > + qio_net_listener_aio_func, NULL, NULL, NULL,
> > + listener->sioc[i]);
> > + }
>
> I'm not really happy with the listener directly accessing the 'fd'
> fields in the QIOSocketChannel, as compared to the GSource approach
> where the underlying transport is not exposed to the caller.
>
> If we want to use an AioContext instead of a GSource, then I think
> we need to add a method to either QIOChannelSocket, or QIOChannel
> base, as an alternative to the GSource watches, and then have the
> listener conditionally use the AioContext APIs.
>
>
> Also in QIOChannel base, we have a io_set_aio_fd_handler() method
> that we use elsewhere. Can we perhaps leverage that in some way.
I will explore that idea for v2.
>
> eg, instead of taking the AioContext code path based off
> "if (listener->context)", take the code path based on whether
> the QIOChannel has had a call qio_channel_set_aio_fd_handler
> to register AIO handlers ? Maybe that method doesn't quite fit,
> but conceptually I would be more comfortable with an approach
> that explicitly associates an AioContext with either the
> channel or the listener object, rather than this heuristic
> of "if (listener->context)".
I wonder if qio_channel_set_follow_coroutine_ctx() might be the
trigger point you are thinking of. NBD code already calls this, but
only AFTER the client has connected. Would having
ioc->follow_coroutine_ctx set be a better witness that the caller
specifically wants the channel behind the NetListener to run in an
AioContext, rather than blindly declaring that all NetListeners get
an AioContext unless they use qio_net_listener_set_client_func_full()?
If so, I would change the NBD code to call follow_coroutine_ctx()
sooner.
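If that flag is an acceptable witness, the watch helper might end up
looking roughly like this (a sketch; it still touches sioc->fd
directly, pending your suggestion of a proper QIOChannel method):

    QIOChannel *ioc = QIO_CHANNEL(listener->sioc[i]);

    if (ioc->follow_coroutine_ctx) {
        /* caller opted in to AioContext-based dispatch */
        aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
                           qio_net_listener_aio_func, NULL, NULL, NULL,
                           listener->sioc[i]);
    } else {
        listener->io_source[i] = qio_channel_add_watch_source(
            ioc, G_IO_IN, qio_net_listener_channel_func,
            listener, NULL, listener->context);
    }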
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix
2025-11-04 11:38 ` Vladimir Sementsov-Ogievskiy
@ 2025-11-05 22:10 ` Eric Blake
2025-11-06 8:20 ` Vladimir Sementsov-Ogievskiy
2025-11-06 12:26 ` Kevin Wolf
0 siblings, 2 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-05 22:10 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, qemu-block, berrange, kwolf, Hanna Reitz
On Tue, Nov 04, 2025 at 02:38:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 03.11.25 23:10, Eric Blake wrote:
> > Test that all images in a qcow2 chain using an NBD backing file can be
> > served by the same process. Prior to the recent QIONetListener fixes,
> > this test would demonstrate deadlock.
> >
> > The test borrows heavily from the original formula by "John Doe" in
> > the gitlab bug, but uses a Unix socket rather than TCP to avoid port
> > contention, and uses a full-blown QEMU rather than qemu-storage-daemon
> > since both programs were impacted.
> >
> > [While preparing this patch by making the new test executable, I
> > noticed vvfat.out does not need execute permissions]
> >
> > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++++++++++
> > .../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 +++++++++++++
> > tests/qemu-iotests/tests/vvfat.out | 0
> > 3 files changed, 140 insertions(+)
> > create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
> > create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
> > mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
Should I split out that file mode change to a separate cleanup patch?
> >
> > diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
> > new file mode 100755
> > index 00000000000..b89f74d4552
> > --- /dev/null
> > +++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
> > +echo
> > +echo "=== Creating wrapper image ==="
> > +
> > +_make_test_img -F raw -b "nbd+unix:///base?socket=$SOCK_DIR/nbd" $size
> > +
> > +echo
> > +echo "=== Adding wrapper image ==="
> > +
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
> > + "arguments": {"node-name":"wrap", "driver":"qcow2",
> > + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' 'return'
>
> Hmm. Why don't you specify "backing": "base" here?
Because the original bug report didn't either. However, I can see the
wisdom in enhancing the test to cover multiple scenarios: both a
backing chain learned only by what is in the qcow2 file, and an
explicit backing chain where the NBD client is spelled out in the QMP
code. I'll see if I can enhance that for v2.
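For the explicit variant, the blockdev-add would look roughly like
this (a sketch; it reuses the test's shell quoting, and the node name
is a placeholder):

    {"execute": "blockdev-add",
     "arguments": {"node-name":"wrap2", "driver":"qcow2",
       "file":{"driver":"file", "filename":"'"$TEST_IMG"'"},
       "backing":{"driver":"raw",
         "file":{"driver":"nbd",
           "server":{"type":"unix", "path":"'"$SOCK_DIR/nbd"'"},
           "export":"base"}}}}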
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix
2025-11-05 22:10 ` Eric Blake
@ 2025-11-06 8:20 ` Vladimir Sementsov-Ogievskiy
2025-11-06 12:26 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-11-06 8:20 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange, kwolf, Hanna Reitz
On 06.11.25 01:10, Eric Blake wrote:
> On Tue, Nov 04, 2025 at 02:38:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.11.25 23:10, Eric Blake wrote:
>>> Test that all images in a qcow2 chain using an NBD backing file can be
>>> served by the same process. Prior to the recent QIONetListener fixes,
>>> this test would demonstrate deadlock.
>>>
>>> The test borrows heavily from the original formula by "John Doe" in
>>> the gitlab bug, but uses a Unix socket rather than TCP to avoid port
>>> contention, and uses a full-blown QEMU rather than qemu-storage-daemon
>>> since both programs were impacted.
>>>
>>> [While preparing this patch by making the new test executable, I
>>> noticed vvfat.out does not need execute permissions]
>>>
>>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> ---
>>> tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++++++++++
>>> .../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 +++++++++++++
>>> tests/qemu-iotests/tests/vvfat.out | 0
>>> 3 files changed, 140 insertions(+)
>>> create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
>>> create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
>>> mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
>
> Should I split out that file mode change to a separate cleanup patch?
Probably. Personally, I don't care.
>
>>>
>>> diff --git a/tests/qemu-iotests/tests/nbd-in-qcow2-chain b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
>>> new file mode 100755
>>> index 00000000000..b89f74d4552
>>> --- /dev/null
>>> +++ b/tests/qemu-iotests/tests/nbd-in-qcow2-chain
>
>>> +echo
>>> +echo "=== Creating wrapper image ==="
>>> +
>>> +_make_test_img -F raw -b "nbd+unix:///base?socket=$SOCK_DIR/nbd" $size
>>> +
>>> +echo
>>> +echo "=== Adding wrapper image ==="
>>> +
>>> +_send_qemu_cmd $QEMU_HANDLE '{"execute": "blockdev-add",
>>> + "arguments": {"node-name":"wrap", "driver":"qcow2",
>>> + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' 'return'
>>
>> Hmm. Why don't you specify "backing": "base" here?
>
> Because the original bug report didn't either.
Ah, I missed the idea: we set up our qcow2 image so that its backing
is an NBD protocol URI. So "backing": "base" would be wrong.
> However, I can see the
> wisdom in enhancing the test to cover multiple scenarios: both a
> backing chain learned only by what is in the qcow2 file, and an
> explicit backing chain where the NBD client is spelled out in the QMP
> code. I'll see if I can enhance that for v2.
>
>
Not sure that it will add real value to the test; it should end up as
the same block graph either way. If you decide to keep it as is (or
with the vvfat fix dropped):
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/8] qio: Add trace points to net_listener
2025-11-05 17:18 ` Eric Blake
@ 2025-11-06 12:20 ` Kevin Wolf
0 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2025-11-06 12:20 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, berrange
Am 05.11.2025 um 18:18 hat Eric Blake geschrieben:
> On Tue, Nov 04, 2025 at 12:08:42PM +0100, Kevin Wolf wrote:
> > Am 03.11.2025 um 21:10 hat Eric Blake geschrieben:
> > > Upcoming patches will adjust how net_listener watches for new client
> > > connections; adding trace points now makes it easier to debug that the
> > > changes work as intended. For example, adding
> > > --trace='qio_net_listener*' to the qemu-storage-daemon command line
> > > before --nbd-server will track when the server first starts listening
> > > for clients.
> > >
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > > io/net-listener.c | 17 +++++++++++++++++
> > > io/trace-events | 5 +++++
> > > 2 files changed, 22 insertions(+)
> > >
> > > diff --git a/io/net-listener.c b/io/net-listener.c
> > > index 47405965a66..0adbc409cf2 100644
> > > --- a/io/net-listener.c
> > > +++ b/io/net-listener.c
> > > @@ -23,6 +23,7 @@
> > > #include "io/dns-resolver.h"
> > > #include "qapi/error.h"
> > > #include "qemu/module.h"
> > > +#include "trace.h"
> > >
> > > QIONetListener *qio_net_listener_new(void)
> > > {
> > > @@ -50,6 +51,7 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> > > return TRUE;
> > > }
> > >
> > > + trace_qio_net_listener_callback(listener, listener->io_func);
> > > if (listener->io_func) {
> > > listener->io_func(listener, sioc, listener->io_data);
> > > }
> >
> > Not necessarily a problem, but I wonder why you decided to have the
> > trace point unconditional here...
> >
> > > @@ -143,6 +147,9 @@ void qio_net_listener_set_client_func_full(QIONetListener *listener,
> > > {
> > > size_t i;
> > >
> > > + if (listener->io_func) {
> > > + trace_qio_net_listener_watch_disabled(listener, "set_client_func");
> > > + }
> > > if (listener->io_notify) {
> > > listener->io_notify(listener->io_data);
> > > }
> >
> > ...while everywhere else you only call it with a non-NULL
> > listener->io_func.
>
> In the former, the trace prints the address of whichever io_func
> (including NULL) is currently registered when the callback is reached.
> In the case where a single NetListener registers more than one socket
> address, but the user uninstalls their callback after the first client
> connects, other pending connections may already have raced, at which
> point qio_net_listener_channel_func can still fire on those later
> connections even though there is no longer an io_func from the user.
> In all the latter cases, I was merely tracing when state transitioned
> between the user installing a handler and removing one.
>
> Unless you think it would be too noisy, I'm game for changing this in
> v2 to have all of the traces be unconditional. It is potentially
> noisier, but would also aid in spotting how many times a client
> requests no change to the current io_func (or lack thereof).
I think being noisy in traces is okay. It's easy enough to filter the
output for non-NULL instances if that's what you need. In debugging I
usually want more data; having too much data is rarely a problem.
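As a sketch of the unconditional variant under discussion (same trace
point as in the hunk quoted above; whether v2 keeps these exact call
sites is of course up to you):

    /* before: only trace when a handler was actually installed */
    if (listener->io_func) {
        trace_qio_net_listener_watch_disabled(listener, "set_client_func");
    }

    /* after: always trace, so no-op transitions are visible too */
    trace_qio_net_listener_watch_disabled(listener, "set_client_func");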
> Also worth thinking about: in this patch, I added new traces under the
> names enabled/disabled, with the trace points being reachable from
> multiple locations. Then in 4/8, when refactoring to consolidate
> duplicate code, the trace points are reduced to a single usage in
> functions named watch/unwatch. It may be worth rethinking the naming so
> that the trace point and function names share the same terminology.
Yes, that's a good idea.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix
2025-11-05 22:10 ` Eric Blake
2025-11-06 8:20 ` Vladimir Sementsov-Ogievskiy
@ 2025-11-06 12:26 ` Kevin Wolf
1 sibling, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2025-11-06 12:26 UTC (permalink / raw)
To: Eric Blake
Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, berrange,
Hanna Reitz
On 05.11.2025 at 23:10, Eric Blake wrote:
> On Tue, Nov 04, 2025 at 02:38:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > On 03.11.25 23:10, Eric Blake wrote:
> > > Test that all images in a qcow2 chain using an NBD backing file can be
> > > served by the same process. Prior to the recent QIONetListener fixes,
> > > this test would demonstrate deadlock.
> > >
> > > The test borrows heavily from the original formula by "John Doe" in
> > > the gitlab bug, but uses a Unix socket rather than TCP to avoid port
> > > contention, and uses a full-blown QEMU rather than qemu-storage-daemon
> > > since both programs were impacted.
> > >
> > > [While preparing this patch by making the new test executable, I
> > > noticed vvfat.out does not need execute permissions]
> > >
> > > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > > tests/qemu-iotests/tests/nbd-in-qcow2-chain | 84 +++++++++++++++++++
> > > .../qemu-iotests/tests/nbd-in-qcow2-chain.out | 56 +++++++++++++
> > > tests/qemu-iotests/tests/vvfat.out | 0
> > > 3 files changed, 140 insertions(+)
> > > create mode 100755 tests/qemu-iotests/tests/nbd-in-qcow2-chain
> > > create mode 100644 tests/qemu-iotests/tests/nbd-in-qcow2-chain.out
> > > mode change 100755 => 100644 tests/qemu-iotests/tests/vvfat.out
>
> Should I split out that file mode change into a separate cleanup patch?
It's an unrelated change, so while the patch to change only the file
mode may look funny, it's probably better to split it.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener
2025-11-05 20:06 ` Eric Blake
@ 2025-11-06 18:35 ` Eric Blake
2025-11-07 8:50 ` Daniel P. Berrangé
0 siblings, 1 reply; 41+ messages in thread
From: Eric Blake @ 2025-11-06 18:35 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, berrange, kwolf
On Wed, Nov 05, 2025 at 02:06:06PM -0600, Eric Blake wrote:
> On Mon, Nov 03, 2025 at 02:10:56PM -0600, Eric Blake wrote:
> > Make it easier to get from the sioc listening to a single address on
> > behalf of a NetListener back to its owning object, which will be
> > beneficial in an upcoming patch that teaches NetListener how to
> > interact with AioContext.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > include/io/channel-socket.h | 1 +
> > io/channel-socket.c | 1 +
> > io/net-listener.c | 1 +
> > 3 files changed, 3 insertions(+)
> >
> > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > index a88cf8b3a9f..eee686f3b4d 100644
> > --- a/include/io/channel-socket.h
> > +++ b/include/io/channel-socket.h
> > @@ -49,6 +49,7 @@ struct QIOChannelSocket {
> > socklen_t remoteAddrLen;
> > ssize_t zero_copy_queued;
> > ssize_t zero_copy_sent;
> > + struct QIONetListener *listener;
>
> Commenting on my own patch:
>
> After re-reading docs/devel/style.rst, I can see that this particular
> forward declaration of QIONetListener is not consistent with the
> guidelines. I have to have a forward reference, since the style guide
> also forbids circular inclusion (net-listener.h already includes
> channel-socket.h, so channel-socket.h cannot include net-listener.h);
> but it may be better for me to move the forward reference into
> include/qemu/typedefs.h rather than inlining it how I did here.
Then again, include/qemu/typedefs.h states "For struct types used in
only a few headers, judicious use of the struct tag instead of the
typedef name is commonly preferable."
So, to keep it simpler, I'll just justify my choice in the commit message.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener
2025-11-06 18:35 ` Eric Blake
@ 2025-11-07 8:50 ` Daniel P. Berrangé
2025-11-07 13:47 ` Eric Blake
0 siblings, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-07 8:50 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Thu, Nov 06, 2025 at 12:35:05PM -0600, Eric Blake wrote:
> On Wed, Nov 05, 2025 at 02:06:06PM -0600, Eric Blake wrote:
> > On Mon, Nov 03, 2025 at 02:10:56PM -0600, Eric Blake wrote:
> > > Make it easier to get from the sioc listening to a single address on
> > > behalf of a NetListener back to its owning object, which will be
> > > beneficial in an upcoming patch that teaches NetListener how to
> > > interact with AioContext.
> > >
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > > include/io/channel-socket.h | 1 +
> > > io/channel-socket.c | 1 +
> > > io/net-listener.c | 1 +
> > > 3 files changed, 3 insertions(+)
> > >
> > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > > index a88cf8b3a9f..eee686f3b4d 100644
> > > --- a/include/io/channel-socket.h
> > > +++ b/include/io/channel-socket.h
> > > @@ -49,6 +49,7 @@ struct QIOChannelSocket {
> > > socklen_t remoteAddrLen;
> > > ssize_t zero_copy_queued;
> > > ssize_t zero_copy_sent;
> > > + struct QIONetListener *listener;
> >
> > Commenting on my own patch:
> >
> > After re-reading docs/devel/style.rst, I can see that this particular
> > forward declaration of QIONetListener is not consistent with the
> > guidelines. I have to have a forward reference, since the style guide
> > also forbids circular inclusion (net-listener.h already includes
> > channel-socket.h, so channel-socket.h cannot include net-listener.h);
> > but it may be better for me to move the forward reference into
> > include/qemu/typedefs.h rather than inlining it how I did here.
>
> Then again, include/qemu/typedefs.h states "For struct types used in
> only a few headers, judicious use of the struct tag instead of the
> typedef name is commonly preferable."
>
> So, to keep it simpler, I'll just justify my choice in the commit message.
I would really rather we avoided the bi-directional pointer linkage
entirely. AFAICT, this is only required because the later patch is
passing a QIOChannelSocket as the opaque data parameter for a callback.
It would be preferable if we instead passed a standalone data
struct containing the QIOChannelSocket + QIONetListener, and then
avoided this back-linkage.
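For concreteness, a minimal sketch of what I mean (the struct and its
name are hypothetical, and allocation/teardown of the wrapper is
elided):

    /* Hypothetical pairing struct, private to net-listener.c; avoids a
     * back-pointer inside QIOChannelSocket itself. */
    typedef struct QIONetListenerWatch {
        QIONetListener *listener;   /* the owning listener */
        QIOChannelSocket *sioc;     /* the listening socket being watched */
    } QIONetListenerWatch;

    static void qio_net_listener_aio_func(void *opaque)
    {
        QIONetListenerWatch *watch = opaque;

        qio_net_listener_channel_func(QIO_CHANNEL(watch->sioc), G_IO_IN,
                                      watch->listener);
    }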
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener
2025-11-07 8:50 ` Daniel P. Berrangé
@ 2025-11-07 13:47 ` Eric Blake
0 siblings, 0 replies; 41+ messages in thread
From: Eric Blake @ 2025-11-07 13:47 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, qemu-block, kwolf
On Fri, Nov 07, 2025 at 08:50:16AM +0000, Daniel P. Berrangé wrote:
> > > > +++ b/include/io/channel-socket.h
> > > > @@ -49,6 +49,7 @@ struct QIOChannelSocket {
> > > > socklen_t remoteAddrLen;
> > > > ssize_t zero_copy_queued;
> > > > ssize_t zero_copy_sent;
> > > > + struct QIONetListener *listener;
> > >
> > > Commenting on my own patch:
> > >
> > > After re-reading docs/devel/style.rst, I can see that this particular
> > > forward declaration of QIONetListener is not consistent with the
> > > guidelines. I have to have a forward reference, since the style guide
> > > also forbids circular inclusion (net-listener.h already includes
> > > channel-socket.h, so channel-socket.h cannot include net-listener.h);
> > > but it may be better for me to move the forward reference into
> > > include/qemu/typedefs.h rather than inlining it how I did here.
> >
> > Then again, include/qemu/typedefs.h states "For struct types used in
> > only a few headers, judicious use of the struct tag instead of the
> > typedef name is commonly preferable."
> >
> > So, to keep it simpler, I'll just justify my choice in the commit message.
>
> I would really rather we avoided the bi-directional pointer linkage
> entirely. AFAICT, this is only required because the later patch is
> passing a QIOChannelSocket as the opaque data parameter for a callback.
>
> It would be preferable if we instead passed a standalone data
> struct containing the QIOChannelSocket + QIONetListener, and then
> avoided this back-linkage.
Reasonable; a bit more verbose in net-listener.c, but less impact on
channel-socket.h, so I'll do that in v2.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-05 21:57 ` Eric Blake
@ 2025-11-11 14:43 ` Daniel P. Berrangé
2025-11-11 15:48 ` Kevin Wolf
0 siblings, 1 reply; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-11 14:43 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, kwolf
On Wed, Nov 05, 2025 at 03:57:29PM -0600, Eric Blake wrote:
> On Tue, Nov 04, 2025 at 11:13:48AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Nov 03, 2025 at 02:10:57PM -0600, Eric Blake wrote:
> > > The point of QIONetListener is to allow a server to listen to more
> > > than one socket address at a time, and respond to clients in a
> > > first-come-first-serve order across any of those addresses. While
> > > some servers (like NBD) really do want to serve multiple simultaneous
> > > clients, many other servers only care about the first client to
> > > connect, and will immediately deregister the callback, possibly by
> > > dropping their reference to the QIONetListener. The existing code
> > > ensures that all other pending callbacks remain safe once the first
> > > callback drops the listener, by adding an extra reference to the
> > > listener for each GSource created, where those references pair to the
> > > eventual teardown of each GSource after a given callback has been
> > > serviced or aborted. But it is equally acceptable to hoist the
> > > reference to the listener outside the loop - as long as there is a
> > > callback function registered, it is sufficient to have a single
> > > reference live for the entire array of sioc, rather than one reference
> > > per sioc in the array.
> > >
> > > Hoisting the reference like this will make it easier for an upcoming
> > > patch to still ensure the listener cannot be prematurely garbage
> > > collected during the user's callback, even when the callback no longer
> > > uses a per-sioc GSource.
> >
> > It isn't quite this simple. Glib reference counts the callback
> > func / data, holding a reference when dispatching the callback.
> >
> > IOW, even if the GSource is unrefed, the callback 'notify'
> > function won't be called if the main loop is in the process
> > of dispatching.
>
> I'm not sure I follow your argument. Glib holds a reference on the
> GSource object, not on the opaque data that is handed to the GSource.
> It is possible to use g_source_set_callback_indirect() where GSource
> can learn how to use the same reference counting on data as external
> code, by the use of function pointers for ref and unref, but QIO uses
> merely g_source_set_callback().
> https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gmain.c#L1844
> shows that glib then wraps that opaque pointer into an internal
> GSourceCallback object which itself is reference counted, so that the
> notify function is not called until the GSource is finalized, but that
> is reference counting on the container, not on the opaque object
> itself (which in this patch is the QIONetListener).
>
> >
> > With this change, the reference on 'listener' can now be
> > released even if the callback is currently dispatching.
>
> So if I'm understanding your concern, you're worried that the unwatch
> code can finish looping through the g_source_destroy and then reach
> the point where it unrefs listener, but that a late-breaking client
> connection can trigger a callback that can still be executing in
> another thread/coroutine after the listener is unref'ed but before the
> GSource has been finalized? If so, would squashing this in fix the
> problem you are seeing?
Consider the following scenario, where we have two threads, one
is calling QIONetListener APIs, and one is the event thread.
In the current code, when we unref() the GSource for the socket
watch, the destroy-notify does not get called, because the
event thread is in the middle of a dispatch callback for the
I/O event. When the dispatch callback returns control to the
event loop, the GSourceCallback is unrefed, and this triggers
the destroy-notify call, which unrefs the listener.
The flow looks like this:
Thread 1:
qio_net_listener_set_client_func(lstnr, f, ...);
=> foreach sock: socket
=> object_ref(lstnr)
=> sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);
Thread 2:
poll()
=> event POLLIN on socket
=> ref(GSourceCallback)
=> call dispatch(sock)
...do stuff..
Thread 1:
qio_net_listener_set_client_func(lstnr, NULL, ...);
=> foreach sock: socket
=> g_source_unref(sock_src)
unref(lstnr) (the final reference)
=> finalize(lstnr)
Thread 2:
=> return dispatch(sock)
=> unref(GSourceCallback)
=> destroy-notify
=> object_unref
> diff --git i/io/net-listener.c w/io/net-listener.c
> index 9f4e3c0be0c..1fcbbeb7a76 100644
> --- i/io/net-listener.c
> +++ w/io/net-listener.c
> @@ -67,8 +67,10 @@ static void qio_net_listener_aio_func(void *opaque)
> {
> QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
>
> + object_ref(OBJECT(sioc->listener));
> qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
> sioc->listener);
> + object_unref(OBJECT(sioc->listener));
> }
Now consider this patch, plus this extra hunk...
Thread 1:
qio_net_listener_set_client_func(lstnr, f, ...);
=> object_ref(listener)
=> foreach sock: socket
=> sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, NULL);
Thread 2:
poll()
=> event POLLIN on socket
=> call dispatch(sock)
=> ref(lstnr)
...do stuff..
Thread 1:
qio_net_listener_set_client_func(lstnr, NULL, ...);
=> foreach sock: socket
=> g_source_unref(sock_src)
=> object_unref(listener)
unref(lstnr) (still 1 reference left)
Thread 2:
=> unref(lstnr) (the final reference)
=> finalize(lstnr)
=> return dispatch(sock)
=> unref(GSourceCallback)
That appears to work OK; however, there's still a race window that is
not solved. Between the time thread 2 sees POLLIN and when it calls
the dispatch(sock) function, it is possible that thread 1 will drop
the last reference:
Thread 1:
qio_net_listener_set_client_func(lstnr, f, ...);
=> object_ref(listener)
=> foreach sock: socket
=> sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, NULL);
Thread 2:
poll()
=> event POLLIN on socket
Thread 1:
qio_net_listener_set_client_func(lstnr, NULL, ...);
=> foreach sock: socket
=> g_source_unref(sock_src)
=> object_unref(listener)
unref(lstnr) (still 1 reference left)
Thread 2:
=> call dispatch(sock)
=> ref(lstnr)
...do stuff..
=> unref(lstnr) (the final reference)
=> finalize(lstnr)
=> return dispatch(sock)
=> unref(GSourceCallback)
I don't see a way to solve this without synchronization with the event
loop for releasing the reference on the opaque data for the dispatcher
callback. That's what the current code does, but I'm seeing no way for
the AioContext event loop callbacks to have anything equivalent. This
feels like a gap in the AioContext design.
This is admittedly an incredibly hard-to-trigger race condition. It would
need a client to be calling a QMP command that tears down the NBD server,
at the exact same time as a new NBD client was incoming. Or the same kind
of scenario for other pieces of QEMU code using QIONetListener. This still
makes me worried though, as rare races have a habit of hitting QEMU
eventually.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-11 14:43 ` Daniel P. Berrangé
@ 2025-11-11 15:48 ` Kevin Wolf
2025-11-11 16:07 ` Daniel P. Berrangé
0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2025-11-11 15:48 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: Eric Blake, qemu-devel, qemu-block, stefanha
On 11.11.2025 at 15:43, Daniel P. Berrangé wrote:
> On Wed, Nov 05, 2025 at 03:57:29PM -0600, Eric Blake wrote:
> > On Tue, Nov 04, 2025 at 11:13:48AM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Nov 03, 2025 at 02:10:57PM -0600, Eric Blake wrote:
> > > > The point of QIONetListener is to allow a server to listen to more
> > > > than one socket address at a time, and respond to clients in a
> > > > first-come-first-serve order across any of those addresses. While
> > > > some servers (like NBD) really do want to serve multiple simultaneous
> > > > clients, many other servers only care about the first client to
> > > > connect, and will immediately deregister the callback, possibly by
> > > > dropping their reference to the QIONetListener. The existing code
> > > > ensures that all other pending callbacks remain safe once the first
> > > > callback drops the listener, by adding an extra reference to the
> > > > listener for each GSource created, where those references pair to the
> > > > eventual teardown of each GSource after a given callback has been
> > > > serviced or aborted. But it is equally acceptable to hoist the
> > > > reference to the listener outside the loop - as long as there is a
> > > > callback function registered, it is sufficient to have a single
> > > > reference live for the entire array of sioc, rather than one reference
> > > > per sioc in the array.
> > > >
> > > > Hoisting the reference like this will make it easier for an upcoming
> > > > patch to still ensure the listener cannot be prematurely garbage
> > > > collected during the user's callback, even when the callback no longer
> > > > uses a per-sioc GSource.
> > >
> > > It isn't quite this simple. Glib reference counts the callback
> > > func / data, holding a reference when dispatching the callback.
> > >
> > > IOW, even if the GSource is unrefed, the callback 'notify'
> > > function won't be called if the main loop is in the process
> > > of dispatching.
> >
> > I'm not sure I follow your argument. Glib holds a reference on the
> > GSource object, not on the opaque data that is handed to the GSource.
> > It is possible to use g_source_set_callback_indirect() where GSource
> > can learn how to use the same reference counting on data as external
> > code, by the use of function pointers for ref and unref, but QIO uses
> > merely g_source_set_callback().
> > https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gmain.c#L1844
> > shows that glib then wraps that opaque pointer into an internal
> > GSourceCallback object which itself is reference counted, so that the
> > notify function is not called until the GSource is finalized, but that
> > is reference counting on the container, not on the opaque object
> > itself (which in this patch is the QIONetListener).
> >
> > >
> > > With this change, the reference on 'listener' can now be
> > > released even if the callback is currently dispatching.
> >
> > So if I'm understanding your concern, you're worried that the unwatch
> > code can finish looping through the g_source_destroy and then reach
> > the point where it unrefs listener, but that a late-breaking client
> > connection can trigger a callback that can still be executing in
> > another thread/coroutine after the listener is unref'ed but before the
> > GSource has been finalized? If so, would squashing this in fix the
> > problem you are seeing?
>
> Consider the following scenario, where we have two threads, one
> is calling QIONetListener APIs, and one is the event thread.
>
> In the current code, when we unref() the GSource for the socket
> watch, the destroy-notify does not get called, because the
> event thread is in the middle of a dispatch callback for the
> I/O event. When the dispatch callback returns control to the
> event loop, the GSourceCallback is unrefed, and this triggers
> the destroy-notify call, which unrefs the listener.
>
> The flow looks like this:
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, f, ...);
> => foreach sock: socket
> => object_ref(lstnr)
> => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);
>
> Thread 2:
> poll()
> => event POLLIN on socket
> => ref(GSourceCallback)
> => call dispatch(sock)
> ...do stuff..
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, NULL, ...);
> => foreach sock: socket
> => g_source_unref(sock_src)
> unref(lstnr) (the final reference)
> => finalize(lstnr)
>
> Thread 2:
> => return dispatch(sock)
> => unref(GSourceCallback)
> => destroy-notify
> => object_unref
>
>
>
> > diff --git i/io/net-listener.c w/io/net-listener.c
> > index 9f4e3c0be0c..1fcbbeb7a76 100644
> > --- i/io/net-listener.c
> > +++ w/io/net-listener.c
> > @@ -67,8 +67,10 @@ static void qio_net_listener_aio_func(void *opaque)
> > {
> > QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
> >
> > + object_ref(OBJECT(sioc->listener));
> > qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
> > sioc->listener);
> > + object_unref(OBJECT(sioc->listener));
> > }
>
> Now consider this patch, plus this extra hunk...
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, f, ...);
> => object_ref(listener)
> => foreach sock: socket
> => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, NULL);
>
> Thread 2:
> poll()
> => event POLLIN on socket
> => call dispatch(sock)
> => ref(lstnr)
> ...do stuff..
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, NULL, ...);
> => foreach sock: socket
> => g_source_unref(sock_src)
> => object_unref(listener)
> unref(lstnr) (still 1 reference left)
>
> Thread 2:
> => unref(lstnr) (the final reference)
> => finalize(lstnr)
> => return dispatch(sock)
> => unref(GSourceCallback)
>
>
> That appears to work ok, however, there's still a race window that is
> not solved. Between the time thread 2 sees POLLIN, and when it calls
> the dispatch(sock) function, it is possible that thread 1 will drop
> the last reference:
>
>
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, f, ...);
> => object_ref(listener)
> => foreach sock: socket
> => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, NULL);
>
> Thread 2:
> poll()
> => event POLLIN on socket
>
> Thread 1:
> qio_net_listener_set_client_func(lstnr, NULL, ...);
> => foreach sock: socket
> => g_source_unref(sock_src)
> => object_unref(listener)
> unref(lstnr) (still 1 reference left)
Is your worry that there is still a reference to the listener held in
the opaque pointer of an fd handler, but it isn't reflected in the
refcount, so this unref already frees the listener while thread 2 will
still access it?
>
> Thread 2:
> => call dispatch(sock)
> => ref(lstnr)
> ...do stuff..
> => unref(lstnr) (the final reference)
> => finalize(lstnr)
> => return dispatch(sock)
> => unref(GSourceCallback)
>
>
> I don't see a way to solve this without synchronization with the event
> loop for releasing the reference on the opaque data for the dispatcher
> callback. That's what the current code does, but I'm seeing no way for
> the AioContext event loop callbacks to have anything equivalent. This
> feels like a gap in the AioContext design.
I think the way you would normally do this is to schedule a BH in
thread 2 to do the critical work. If you delete the fd handler and unref
the listener in thread 2, then there is no race.
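Roughly like this (a sketch only: aio_bh_schedule_oneshot() and
object_unref() are existing APIs, the function name is made up, and ctx
stands for the listener's AioContext):

    /* Runs in the listener's AioContext thread, strictly after any
     * fd-handler dispatch already in flight there has returned. */
    static void qio_net_listener_unref_bh(void *opaque)
    {
        QIONetListener *listener = opaque;

        object_unref(OBJECT(listener));
    }

    /* in the teardown path, instead of a direct object_unref(): */
    aio_bh_schedule_oneshot(ctx, qio_net_listener_unref_bh, listener);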
But maybe adding a callback for node deletion in AioHandler wouldn't
hurt: the opaque pointer pretty much always references something, and
doing an unref when deleting the AioHandler should be a pretty common
pattern.
> This is admittedly an incredibly hard-to-trigger race condition. It would
> need a client to be calling a QMP command that tears down the NBD server,
> at the exact same time as a new NBD client was incoming. Or the same kind
> of scenario for other pieces of QEMU code using QIONetListener. This still
> makes me worried though, as rare races have a habit of hitting QEMU
> eventually.
Aren't both QMP and incoming NBD connections always handled in the main
thread? I'm not sure I know of a case where we would actually get this
pattern with two different threads today. Of course, that doesn't mean
that we couldn't get it in the future.
Kevin
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 6/8] qio: Hoist ref of listener outside loop
2025-11-11 15:48 ` Kevin Wolf
@ 2025-11-11 16:07 ` Daniel P. Berrangé
0 siblings, 0 replies; 41+ messages in thread
From: Daniel P. Berrangé @ 2025-11-11 16:07 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Eric Blake, qemu-devel, qemu-block, stefanha
On Tue, Nov 11, 2025 at 04:48:04PM +0100, Kevin Wolf wrote:
> On 11.11.2025 at 15:43, Daniel P. Berrangé wrote:
> > On Wed, Nov 05, 2025 at 03:57:29PM -0600, Eric Blake wrote:
> > > On Tue, Nov 04, 2025 at 11:13:48AM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Nov 03, 2025 at 02:10:57PM -0600, Eric Blake wrote:
> > > > > The point of QIONetListener is to allow a server to listen to more
> > > > > than one socket address at a time, and respond to clients in a
> > > > > first-come-first-serve order across any of those addresses. While
> > > > > some servers (like NBD) really do want to serve multiple simultaneous
> > > > > clients, many other servers only care about the first client to
> > > > > connect, and will immediately deregister the callback, possibly by
> > > > > dropping their reference to the QIONetListener. The existing code
> > > > > ensures that all other pending callbacks remain safe once the first
> > > > > callback drops the listener, by adding an extra reference to the
> > > > > listener for each GSource created, where those references pair to the
> > > > > eventual teardown of each GSource after a given callback has been
> > > > > serviced or aborted. But it is equally acceptable to hoist the
> > > > > reference to the listener outside the loop - as long as there is a
> > > > > callback function registered, it is sufficient to have a single
> > > > > reference live for the entire array of sioc, rather than one reference
> > > > > per sioc in the array.
> > > > >
> > > > > Hoisting the reference like this will make it easier for an upcoming
> > > > > patch to still ensure the listener cannot be prematurely garbage
> > > > > collected during the user's callback, even when the callback no longer
> > > > > uses a per-sioc GSource.
> > > >
> > > > It isn't quite this simple. Glib reference counts the callback
> > > > func / data, holding a reference when dispatching the callback.
> > > >
> > > > IOW, even if the GSource is unrefed, the callback 'notify'
> > > > function won't be called if the main loop is in the process
> > > > of dispatching.
snip
> > That appears to work ok, however, there's still a race window that is
> > not solved. Between the time thread 2 sees POLLIN, and when it calls
> > the dispatch(sock) function, it is possible that thread 1 will drop
> > the last reference:
> >
> >
> >
> > Thread 1:
> > qio_net_listener_set_client_func(lstnr, f, ...);
> > => object_ref(listener)
> > => foreach sock: socket
> > => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, NULL);
> >
> > Thread 2:
> > poll()
> > => event POLLIN on socket
> >
> > Thread 1:
> > qio_net_listener_set_client_func(lstnr, NULL, ...);
> > => foreach sock: socket
> > => g_source_unref(sock_src)
> > => object_unref(listener)
> > unref(lstnr) (still 1 reference left)
>
> Is your worry that there is still a reference to the listener held in
> the opaque pointer of an fd handler, but it isn't reflected in the
> refcount, so this unref already frees the listener while thread 2 will
> still access it?
Yes, exactly.
>
> >
> > Thread 2:
> > => call dispatch(sock)
> > => ref(lstnr)
> > ...do stuff..
> > => unref(lstnr) (the final reference)
> > => finalize(lstnr)
> > => return dispatch(sock)
> > => unref(GSourceCallback)
> >
> >
> > I don't see a way to solve this without synchronization with the event
> > loop for releasing the reference on the opaque data for the dispatcher
> > callback. That's what the current code does, but I'm seeing no way for
> > the AioContext event loop callbacks to have anything equivalent. This
> > feels like a gap in the AioContext design.
>
> I think the way you would normally do this is to schedule a BH in thread 2
> to do the critical work. If you delete the fd handler and unref the
> listener in thread 2, then there is no race.
Yes, using a BH would be safe, provided you put the BH in the right
loop, given that we have a choice of the main event loop, a non-default
GMainContext, or the special AioContext that NBD is relying on.
> But maybe adding a callback for node deletion in AioHandler wouldn't
> hurt: the opaque pointer pretty much always references something,
> and doing an unref when deleting the AioHandler should be a pretty
> common pattern.
That would likely make the scenario less error-prone, compared to
remembering to use a BH to synchronize.
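To make that concrete, a loudly hypothetical sketch of such an
extension (no such function exists today, and the real
aio_set_fd_handler() takes additional polling arguments omitted here):

    /* Hypothetical variant: invoke @notify on @opaque once the handler
     * has been removed and can no longer be dispatched. */
    void aio_set_fd_handler_notify(AioContext *ctx, int fd,
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque,
                                   GDestroyNotify notify);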
> > This is admittedly an incredibly hard-to-trigger race condition. It would
> > need a client to be calling a QMP command that tears down the NBD server,
> > at the exact same time as a new NBD client was incoming. Or the same kind
> > of scenario for other pieces of QEMU code using QIONetListener. This still
> > makes me worried though, as rare races have a habit of hitting QEMU
> > eventually.
>
> Aren't both QMP and incoming NBD connections always handled in the main
> thread? I'm not sure I know of a case where we would actually get this
> pattern with two different threads today. Of course, that doesn't mean
> that we couldn't get it in the future.
Yeah, I believe we're probably safe in today's usage. My concern was
primarily about any surprises in the conceptual design that might
impact us in the future.
I guess if NBD is the only thing using AioContext for QIONetListener
today, we could hoist the ref/unref only when using an AioContext,
and keep the GDestroyNotify usage for the GMainContext code path
to significantly limit the exposure. That would avoid needing to do
anything extra for AioHandlers right before the release.
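A sketch of that split (the context_is_aio flag and the field names are
assumptions; qio_channel_add_watch_source() with a GDestroyNotify is the
existing GMainContext mechanism):

    if (listener->context_is_aio) {
        /* AioContext path: no GDestroyNotify is available, so hoist a
         * single explicit reference for the lifetime of the watch. */
        object_ref(OBJECT(listener));
        /* ... aio_set_fd_handler(...) for each listening socket ... */
    } else {
        for (size_t i = 0; i < listener->nsioc; i++) {
            /* GMainContext path: keep today's per-GSource reference,
             * released via GDestroyNotify after dispatch finishes. */
            object_ref(OBJECT(listener));
            listener->io_source[i] = qio_channel_add_watch_source(
                QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
                qio_net_listener_channel_func, listener,
                (GDestroyNotify)object_unref, listener->context);
        }
    }

That would keep the GLib path behaving exactly as it always has, while
only the AioContext path gets the hoisted reference.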
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads: [~2025-11-11 16:08 UTC | newest]
Thread overview: 41+ messages
2025-11-03 20:10 [PATCH 0/8] Fix deadlock with bdrv_open of self-served NBD Eric Blake
2025-11-03 20:10 ` [PATCH 1/8] qio: Add trace points to net_listener Eric Blake
2025-11-04 10:43 ` Daniel P. Berrangé
2025-11-04 11:08 ` Kevin Wolf
2025-11-05 17:18 ` Eric Blake
2025-11-06 12:20 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 2/8] qio: Minor optimization when callback function is unchanged Eric Blake
2025-11-04 10:44 ` Daniel P. Berrangé
2025-11-04 11:13 ` Kevin Wolf
2025-11-05 17:23 ` Eric Blake
2025-11-03 20:10 ` [PATCH 3/8] qio: Remember context of qio_net_listener_set_client_func_full Eric Blake
2025-11-04 10:50 ` Daniel P. Berrangé
2025-11-04 11:25 ` Kevin Wolf
2025-11-05 19:18 ` Eric Blake
2025-11-03 20:10 ` [PATCH 4/8] qio: Factor out helpers qio_net_listener_[un]watch Eric Blake
2025-11-04 11:03 ` Daniel P. Berrangé
2025-11-04 13:15 ` Kevin Wolf
2025-11-05 19:22 ` Eric Blake
2025-11-04 12:37 ` Kevin Wolf
2025-11-04 13:10 ` Daniel P. Berrangé
2025-11-05 19:32 ` Eric Blake
2025-11-03 20:10 ` [PATCH 5/8] qio: Let listening sockets remember their owning QIONetListener Eric Blake
2025-11-05 20:06 ` Eric Blake
2025-11-06 18:35 ` Eric Blake
2025-11-07 8:50 ` Daniel P. Berrangé
2025-11-07 13:47 ` Eric Blake
2025-11-03 20:10 ` [PATCH 6/8] qio: Hoist ref of listener outside loop Eric Blake
2025-11-04 11:13 ` Daniel P. Berrangé
2025-11-05 21:57 ` Eric Blake
2025-11-11 14:43 ` Daniel P. Berrangé
2025-11-11 15:48 ` Kevin Wolf
2025-11-11 16:07 ` Daniel P. Berrangé
2025-11-03 20:10 ` [PATCH 7/8] qio: Use AioContext for default-context QIONetListener Eric Blake
2025-11-04 11:37 ` Daniel P. Berrangé
2025-11-05 22:06 ` Eric Blake
2025-11-04 15:14 ` Kevin Wolf
2025-11-03 20:10 ` [PATCH 8/8] iotests: Add coverage of recent NBD qio deadlock fix Eric Blake
2025-11-04 11:38 ` Vladimir Sementsov-Ogievskiy
2025-11-05 22:10 ` Eric Blake
2025-11-06 8:20 ` Vladimir Sementsov-Ogievskiy
2025-11-06 12:26 ` Kevin Wolf