qemu-devel.nongnu.org archive mirror
* [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
@ 2025-11-11 15:01 Jie Song
  2025-11-12  8:59 ` Markus Armbruster
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jie Song @ 2025-11-11 15:01 UTC
  To: qemu-devel; +Cc: armbru, Jie Song

From: Jie Song <songjie_yewu@cmss.chinamobile.com>

When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
IOThread monitoring of the QMP fd by default. However, a race condition
exists during the initialization phase: the IOThread only removes the
main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
which may be delayed under high system load.

This creates a window between monitor_qmp_setup_handlers_bh() and
qio_net_listener_set_client_func_full() where both the main thread and
IOThread are simultaneously monitoring the same fd and processing events.
This race can cause either the main thread or the IOThread to hang and
become unresponsive.

Fix this by proactively cleaning up the listener's IO sources in
monitor_init_qmp() before the IOThread initializes QMP monitoring,
ensuring exclusive fd ownership and eliminating the race condition.

The fix introduces socket_chr_listener_cleanup() to destroy and unref
all existing IO sources on the socket chardev listener, guaranteeing
that no concurrent fd monitoring occurs during the transition to
IOThread handling.

Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
---
 chardev/char-socket.c         | 18 ++++++++++++++++++
 include/chardev/char-socket.h |  2 ++
 monitor/qmp.c                 |  6 ++++++
 3 files changed, 26 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 62852e3caf..073a9da855 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
     }
 }
 
+void socket_chr_listener_cleanup(Chardev *chr)
+{
+    SocketChardev *s = SOCKET_CHARDEV(chr);
+
+    if (s->listener) {
+        QIONetListener *listener = s->listener;
+        size_t i;
+
+        for (i = 0; i < listener->nsioc; i++) {
+            if (listener->io_source[i]) {
+                g_source_destroy(listener->io_source[i]);
+                g_source_unref(listener->io_source[i]);
+                listener->io_source[i] = NULL;
+            }
+        }
+    }
+}
+
 static void tcp_chr_update_read_handler(Chardev *chr)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
index d6d13ad37f..682440c6de 100644
--- a/include/chardev/char-socket.h
+++ b/include/chardev/char-socket.h
@@ -84,4 +84,6 @@ typedef struct SocketChardev SocketChardev;
 DECLARE_INSTANCE_CHECKER(SocketChardev, SOCKET_CHARDEV,
                          TYPE_CHARDEV_SOCKET)
 
+void socket_chr_listener_cleanup(Chardev *chr);
+
 #endif /* CHAR_SOCKET_H */
diff --git a/monitor/qmp.c b/monitor/qmp.c
index cb99a12d94..d9d1fafa70 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "chardev/char-io.h"
+#include "chardev/char-socket.h"
 #include "monitor-internal.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-control.h"
@@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
          * e.g. the chardev is in client mode, with wait=on.
          */
         remove_fd_in_watch(chr);
+        /*
+         * Clean up listener IO sources early to prevent racy fd
+         * handling between the main thread and the I/O thread.
+         */
+        socket_chr_listener_cleanup(chr);
         /*
          * We can't call qemu_chr_fe_set_handlers() directly here
          * since chardev might be running in the monitor I/O
-- 
2.43.0




* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-11 15:01 [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race Jie Song
@ 2025-11-12  8:59 ` Markus Armbruster
  2025-11-12 15:31   ` Jie Song
  2025-11-12  9:05 ` Daniel P. Berrangé
  2025-11-12 21:48 ` Eric Blake
  2 siblings, 1 reply; 8+ messages in thread
From: Markus Armbruster @ 2025-11-12  8:59 UTC
  To: Jie Song; +Cc: qemu-devel, armbru, Jie Song, Daniel P. Berrangé

Daniel, is this in your area of expertise?

Jie Song, can you identify the commit that introduced the bug?

Jie Song <mail@jiesong.me> writes:

> From: Jie Song <songjie_yewu@cmss.chinamobile.com>
>
> When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> IOThread monitoring of the QMP fd by default. However, a race condition
> exists during the initialization phase: the IOThread only removes the
> main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> which may be delayed under high system load.
>
> This creates a window between monitor_qmp_setup_handlers_bh() and
> qio_net_listener_set_client_func_full() where both the main thread and
> IOThread are simultaneously monitoring the same fd and processing events.
> This race can cause either the main thread or the IOThread to hang and
> become unresponsive.
>
> Fix this by proactively cleaning up the listener's IO sources in
> monitor_init_qmp() before the IOThread initializes QMP monitoring,
> ensuring exclusive fd ownership and eliminating the race condition.
>
> The fix introduces socket_chr_listener_cleanup() to destroy and unref
> all existing IO sources on the socket chardev listener, guaranteeing
> that no concurrent fd monitoring occurs during the transition to
> IOThread handling.
>
> Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> ---
>  chardev/char-socket.c         | 18 ++++++++++++++++++
>  include/chardev/char-socket.h |  2 ++
>  monitor/qmp.c                 |  6 ++++++
>  3 files changed, 26 insertions(+)
>
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 62852e3caf..073a9da855 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
>      }
>  }
>  
> +void socket_chr_listener_cleanup(Chardev *chr)
> +{
> +    SocketChardev *s = SOCKET_CHARDEV(chr);
> +
> +    if (s->listener) {
> +        QIONetListener *listener = s->listener;
> +        size_t i;
> +
> +        for (i = 0; i < listener->nsioc; i++) {
> +            if (listener->io_source[i]) {
> +                g_source_destroy(listener->io_source[i]);
> +                g_source_unref(listener->io_source[i]);
> +                listener->io_source[i] = NULL;
> +            }
> +        }
> +    }
> +}
> +
>  static void tcp_chr_update_read_handler(Chardev *chr)
>  {
>      SocketChardev *s = SOCKET_CHARDEV(chr);
> diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
> index d6d13ad37f..682440c6de 100644
> --- a/include/chardev/char-socket.h
> +++ b/include/chardev/char-socket.h
> @@ -84,4 +84,6 @@ typedef struct SocketChardev SocketChardev;
>  DECLARE_INSTANCE_CHECKER(SocketChardev, SOCKET_CHARDEV,
>                           TYPE_CHARDEV_SOCKET)
>  
> +void socket_chr_listener_cleanup(Chardev *chr);
> +
>  #endif /* CHAR_SOCKET_H */
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index cb99a12d94..d9d1fafa70 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -25,6 +25,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "chardev/char-io.h"
> +#include "chardev/char-socket.h"
>  #include "monitor-internal.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-control.h"
> @@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
>           * e.g. the chardev is in client mode, with wait=on.
>           */
>          remove_fd_in_watch(chr);
> +        /*
> +         * Clean up listener IO sources early to prevent racy fd
> +         * handling between the main thread and the I/O thread.
> +         */
> +        socket_chr_listener_cleanup(chr);
>          /*
>           * We can't call qemu_chr_fe_set_handlers() directly here
>           * since chardev might be running in the monitor I/O




* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-11 15:01 [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race Jie Song
  2025-11-12  8:59 ` Markus Armbruster
@ 2025-11-12  9:05 ` Daniel P. Berrangé
  2025-11-12 14:57   ` Jie Song
  2025-11-12 21:48 ` Eric Blake
  2 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-11-12  9:05 UTC
  To: Jie Song; +Cc: qemu-devel, armbru, Jie Song

On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> 
> When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> IOThread monitoring of the QMP fd by default. However, a race condition
> exists during the initialization phase: the IOThread only removes the
> main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> which may be delayed under high system load.
> 
> This creates a window between monitor_qmp_setup_handlers_bh() and
> qio_net_listener_set_client_func_full() where both the main thread and
> IOThread are simultaneously monitoring the same fd and processing events.
> This race can cause either the main thread or the IOThread to hang and
> become unresponsive.
> 
> Fix this by proactively cleaning up the listener's IO sources in
> monitor_init_qmp() before the IOThread initializes QMP monitoring,
> ensuring exclusive fd ownership and eliminating the race condition.
> 
> The fix introduces socket_chr_listener_cleanup() to destroy and unref
> all existing IO sources on the socket chardev listener, guaranteeing
> that no concurrent fd monitoring occurs during the transition to
> IOThread handling.
> 
> Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> ---
>  chardev/char-socket.c         | 18 ++++++++++++++++++
>  include/chardev/char-socket.h |  2 ++
>  monitor/qmp.c                 |  6 ++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 62852e3caf..073a9da855 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
>      }
>  }
>  
> +void socket_chr_listener_cleanup(Chardev *chr)
> +{
> +    SocketChardev *s = SOCKET_CHARDEV(chr);
> +
> +    if (s->listener) {
> +        QIONetListener *listener = s->listener;
> +        size_t i;
> +
> +        for (i = 0; i < listener->nsioc; i++) {
> +            if (listener->io_source[i]) {
> +                g_source_destroy(listener->io_source[i]);
> +                g_source_unref(listener->io_source[i]);
> +                listener->io_source[i] = NULL;
> +            }
> +        }
> +    }
> +}
> +
>  static void tcp_chr_update_read_handler(Chardev *chr)
>  {
>      SocketChardev *s = SOCKET_CHARDEV(chr);
> diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
> index d6d13ad37f..682440c6de 100644
> --- a/include/chardev/char-socket.h
> +++ b/include/chardev/char-socket.h
> @@ -84,4 +84,6 @@ typedef struct SocketChardev SocketChardev;
>  DECLARE_INSTANCE_CHECKER(SocketChardev, SOCKET_CHARDEV,
>                           TYPE_CHARDEV_SOCKET)
>  
> +void socket_chr_listener_cleanup(Chardev *chr);
> +
>  #endif /* CHAR_SOCKET_H */
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index cb99a12d94..d9d1fafa70 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -25,6 +25,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "chardev/char-io.h"
> +#include "chardev/char-socket.h"
>  #include "monitor-internal.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-control.h"
> @@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
>           * e.g. the chardev is in client mode, with wait=on.
>           */
>          remove_fd_in_watch(chr);
> +        /*
> +         * Clean up listener IO sources early to prevent racy fd
> +         * handling between the main thread and the I/O thread.
> +         */
> +        socket_chr_listener_cleanup(chr);

This is unsafe (may crash) because the chardev used by the monitor
may not be a SocketChardev.

QMP is already calling 'remove_fd_in_watch' to purge the I/O sources.
So if there is a flaw, I would expect any fix to be entirely in the
chardev code, in a path from remove_fd_in_watch.

>          /*
>           * We can't call qemu_chr_fe_set_handlers() directly here
>           * since chardev might be running in the monitor I/O

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-12  9:05 ` Daniel P. Berrangé
@ 2025-11-12 14:57   ` Jie Song
  0 siblings, 0 replies; 8+ messages in thread
From: Jie Song @ 2025-11-12 14:57 UTC
  To: berrange; +Cc: armbru, mail, qemu-devel, songjie_yewu

Hi Daniel,

Thank you for your review and valuable feedback.

You're absolutely right about the concerns. Let me clarify the scenario 
this patch addresses:
The remove_fd_in_watch() function handles the client-side connection case.
However, when the chardev is configured in server mode
(e.g., -qmp unix:/var/lib/libvirt/qemu/qmp-xxx/qmp.monitor,server=on,wait=off),
there is also a listener that needs cleanup. socket_chr_listener_cleanup()
is specifically intended to handle this server-side listener, to prevent the
race between the main thread and the IOThread monitoring the same listener fd.

I apologize for the unsafe assumption that the chardev would always be a SocketChardev.
You're correct that this could cause crashes with other chardev types. 
To fix this properly, I’m considering a more general design. 
Would the following approach be acceptable?

  1. Add a chr_listener_cleanup callback to the ChardevClass structure.
  2. Implement this callback in SocketChardev.
  3. Register it in char_socket_class_init().
  4. In monitor/qmp.c, call it through the class method:

    remove_fd_in_watch(chr);
    ChardevClass *cc = CHARDEV_GET_CLASS(chr);
    if (cc->chr_listener_cleanup) {
        cc->chr_listener_cleanup(chr);
    }

This would maintain type safety while keeping the fix properly abstracted
at the chardev layer. Would this fix make sense?
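
A minimal, self-contained sketch of the optional class hook proposed above. It does not use QEMU's QOM machinery: ToyChardev, ToyChardevClass, and every toy_* name are hypothetical stand-ins, and only the shape of the dispatch (call the hook if the type provides one, otherwise do nothing) is taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyChardev ToyChardev;

/* Per-type method table; a NULL slot means "this type has no listener". */
typedef struct ToyChardevClass {
    const char *type_name;
    void (*listener_cleanup)(ToyChardev *chr);
} ToyChardevClass;

struct ToyChardev {
    const ToyChardevClass *klass;
    int listener_watches;   /* active watches on a listening fd, if any */
};

/* Socket-type implementation: drop every watch on the listener. */
static void toy_socket_listener_cleanup(ToyChardev *chr)
{
    chr->listener_watches = 0;
}

/* Hypothetical socket-like type that registers the hook. */
const ToyChardevClass toy_socket_class = {
    .type_name = "toy-socket",
    .listener_cleanup = toy_socket_listener_cleanup,
};

/* Hypothetical non-socket type that leaves the hook unset. */
const ToyChardevClass toy_pty_class = {
    .type_name = "toy-pty",
    .listener_cleanup = NULL,
};

/* Generic caller: returns 1 if the hook ran, 0 if the type has none. */
int toy_chr_listener_cleanup(ToyChardev *chr)
{
    assert(chr != NULL && chr->klass != NULL);
    if (chr->klass->listener_cleanup == NULL) {
        return 0;
    }
    chr->klass->listener_cleanup(chr);
    return 1;
}
```

The caller never casts to a concrete type, so a non-socket backend is skipped instead of being misinterpreted, which is the crash Daniel pointed out in v1.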

Looking forward to your guidance.

Best regards,
Jie Song 



* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-12  8:59 ` Markus Armbruster
@ 2025-11-12 15:31   ` Jie Song
  0 siblings, 0 replies; 8+ messages in thread
From: Jie Song @ 2025-11-12 15:31 UTC
  To: armbru; +Cc: berrange, mail, qemu-devel, songjie_yewu

> Daniel, is this in your area of expertise?
> 
> Jie Song, can you identify the commit that introduced the bug?
> 
> Jie Song <mail@jiesong.me> writes:
> 
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> >
> > When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> > IOThread monitoring of the QMP fd by default. However, a race condition
> > exists during the initialization phase: the IOThread only removes the
> > main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> >
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> >
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> >
> > The fix introduces socket_chr_listener_cleanup() to destroy and unref
> > all existing IO sources on the socket chardev listener, guaranteeing
> > that no concurrent fd monitoring occurs during the transition to
> > IOThread handling.
> >
> > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > ---
> >  chardev/char-socket.c         | 18 ++++++++++++++++++
> >  include/chardev/char-socket.h |  2 ++
> >  monitor/qmp.c                 |  6 ++++++
> >  3 files changed, 26 insertions(+)
> >
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index 62852e3caf..073a9da855 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
> >      }
> >  }
> >  
> > +void socket_chr_listener_cleanup(Chardev *chr)
> > +{
> > +    SocketChardev *s = SOCKET_CHARDEV(chr);
> > +
> > +    if (s->listener) {
> > +        QIONetListener *listener = s->listener;
> > +        size_t i;
> > +
> > +        for (i = 0; i < listener->nsioc; i++) {
> > +            if (listener->io_source[i]) {
> > +                g_source_destroy(listener->io_source[i]);
> > +                g_source_unref(listener->io_source[i]);
> > +                listener->io_source[i] = NULL;
> > +            }
> > +        }
> > +    }
> > +}
> > +
> >  static void tcp_chr_update_read_handler(Chardev *chr)
> >  {
> >      SocketChardev *s = SOCKET_CHARDEV(chr);
> > diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
> > index d6d13ad37f..682440c6de 100644
> > --- a/include/chardev/char-socket.h
> > +++ b/include/chardev/char-socket.h
> > @@ -84,4 +84,6 @@ typedef struct SocketChardev SocketChardev;
> >  DECLARE_INSTANCE_CHECKER(SocketChardev, SOCKET_CHARDEV,
> >                           TYPE_CHARDEV_SOCKET)
> >  
> > +void socket_chr_listener_cleanup(Chardev *chr);
> > +
> >  #endif /* CHAR_SOCKET_H */
> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > index cb99a12d94..d9d1fafa70 100644
> > --- a/monitor/qmp.c
> > +++ b/monitor/qmp.c
> > @@ -25,6 +25,7 @@
> >  #include "qemu/osdep.h"
> >  
> >  #include "chardev/char-io.h"
> > +#include "chardev/char-socket.h"
> >  #include "monitor-internal.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-commands-control.h"
> > @@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
> >           * e.g. the chardev is in client mode, with wait=on.
> >           */
> >          remove_fd_in_watch(chr);
> > +        /*
> > +         * Clean up listener IO sources early to prevent racy fd
> > +         * handling between the main thread and the I/O thread.
> > +         */
> > +        socket_chr_listener_cleanup(chr);
> >          /*
> >           * We can't call qemu_chr_fe_set_handlers() directly here
> >           * since chardev might be running in the monitor I/O

Hi Markus,

Thank you for the question.

The issue you're referring to is not tied to any specific commit but rather
arises from the current process flow. Specifically, in scenarios like the one
with virsh starting a dummy QEMU process, the following command line may
trigger the bug:
`/usr/bin/qemu-system-x86_64 -S -no-user-config -nodefaults -nographic -machine
none,accel=tcg -qmp unix:/var/lib/libvirt/qemu/qmp-xxx/qmp.monitor,server=on,wait=off`

We can reproduce this issue using gdb with the following steps:
  1. Pause the I/O thread: run monitor_init_qmp() in the main thread and, before
     aio_bh_schedule_oneshot() is called, suspend the I/O thread
     (scheduler-locking on). This simulates a high-load scenario.
  2. Set a breakpoint at qemu_accept and let the main thread continue. At this
     point the main thread is still watching the listener fd of the
     corresponding chardev (the QMP socket).
  3. Simulate a client connection: use nc -U to connect to the Unix socket.
     The main thread detects the event and hits the breakpoint at qemu_accept.
  4. Resume the I/O thread: switch to the I/O thread and let it run. It also
     reaches the qemu_accept breakpoint, so both threads are now handling the
     same accept event.

This race causes either the main thread or the IOThread to hang and become unresponsive.

The issue stems from the window between when the main thread sets up the listener watch and
when the IOThread takes over exclusive ownership. Under normal conditions this window is 
very small, but under high load or with specific timing, both threads can end up processing 
events on the same fd simultaneously.
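
The effect of two watchers on one listening fd can be sketched outside QEMU with plain POSIX calls. The block below is an illustration under assumptions, not QEMU code: the two "watchers" are simulated sequentially in one thread, the socket path and all names are made up, and the listener is set non-blocking so that the losing accept() fails with EAGAIN instead of blocking. That a blocking accept() in the losing thread is what produces the hang is a plausible reading of the symptom, not something established in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Outcome of two watchers reacting to one pending connection. */
struct race_result {
    int first_saw_event;   /* poll() reported POLLIN to watcher A */
    int second_saw_event;  /* poll() reported POLLIN to watcher B */
    int first_accept_ok;   /* the first accept() returned a connection */
    int second_errno;      /* errno of the second accept(), 0 on success */
};

/* Returns 0 and fills *res on success, -1 if the socket setup failed. */
int demo_double_accept(struct race_result *res)
{
    char dir[] = "/tmp/qmp-race-XXXXXX";
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    struct pollfd watcher_a, watcher_b;
    int lfd = -1, cfd = -1, afd = -1, afd2 = -1, ret = -1;

    assert(res != NULL);
    memset(res, 0, sizeof(*res));
    if (!mkdtemp(dir)) {
        return -1;
    }
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/sock", dir);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        /* Non-blocking, so the losing accept() fails instead of hanging. */
        fcntl(lfd, F_SETFL, O_NONBLOCK) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        goto out;
    }

    /* Both watchers poll before either accepts, so both see the event. */
    watcher_a = (struct pollfd){ .fd = lfd, .events = POLLIN };
    watcher_b = watcher_a;
    res->first_saw_event = poll(&watcher_a, 1, 1000) == 1 &&
                           (watcher_a.revents & POLLIN);
    res->second_saw_event = poll(&watcher_b, 1, 1000) == 1 &&
                            (watcher_b.revents & POLLIN);

    /* Only one connection is queued: the second accept() finds nothing. */
    afd = accept(lfd, NULL, NULL);
    res->first_accept_ok = afd >= 0;
    afd2 = accept(lfd, NULL, NULL);
    if (afd2 < 0) {
        res->second_errno = errno;
    }
    ret = 0;
out:
    if (afd2 >= 0) close(afd2);
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    unlink(addr.sun_path);
    rmdir(dir);
    return ret;
}
```

Both polls report the same readiness, but only one accept() can consume the queued connection; the other caller is left with nothing to accept.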

I hope this explanation clarifies the issue. 

Best regards,
Jie Song



* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-11 15:01 [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race Jie Song
  2025-11-12  8:59 ` Markus Armbruster
  2025-11-12  9:05 ` Daniel P. Berrangé
@ 2025-11-12 21:48 ` Eric Blake
  2025-11-13 15:10   ` mail
  2025-11-13 15:13   ` Eric Blake
  2 siblings, 2 replies; 8+ messages in thread
From: Eric Blake @ 2025-11-12 21:48 UTC
  To: Jie Song; +Cc: qemu-devel, armbru, Jie Song

On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> 
> When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> IOThread monitoring of the QMP fd by default. However, a race condition
> exists during the initialization phase: the IOThread only removes the
> main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> which may be delayed under high system load.
> 
> This creates a window between monitor_qmp_setup_handlers_bh() and
> qio_net_listener_set_client_func_full() where both the main thread and
> IOThread are simultaneously monitoring the same fd and processing events.
> This race can cause either the main thread or the IOThread to hang and
> become unresponsive.
> 
> Fix this by proactively cleaning up the listener's IO sources in
> monitor_init_qmp() before the IOThread initializes QMP monitoring,
> ensuring exclusive fd ownership and eliminating the race condition.
> 
> The fix introduces socket_chr_listener_cleanup() to destroy and unref
> all existing IO sources on the socket chardev listener, guaranteeing
> that no concurrent fd monitoring occurs during the transition to
> IOThread handling.
> 
> Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> ---
>  chardev/char-socket.c         | 18 ++++++++++++++++++
>  include/chardev/char-socket.h |  2 ++
>  monitor/qmp.c                 |  6 ++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 62852e3caf..073a9da855 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
>      }
>  }
>  
> +void socket_chr_listener_cleanup(Chardev *chr)
> +{
> +    SocketChardev *s = SOCKET_CHARDEV(chr);
> +
> +    if (s->listener) {
> +        QIONetListener *listener = s->listener;
> +        size_t i;
> +
> +        for (i = 0; i < listener->nsioc; i++) {

This directly accesses listener->nsioc outside of net-listener.c.
I've got a pending patch that frowns on this type of usage (here's the
link to v2; v3 is coming soon):

https://lore.kernel.org/qemu-devel/20251108230525.3169174-14-eblake@redhat.com/T/#m69a13da54c24ad55351b6a004ec1c0cba7a7b49c

But it might be possible to do what you want without peeking inside
the listener; have you tested calling
qio_net_listener_set_client_func_full() to change the callback to NULL
prior to doing the handover to iothread, and then reregistering
tcp_chr_accept after that point?
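
As a rough illustration of that ordering, here is a toy model in plain C. It is not QEMU's QIONetListener: ToyListener and every toy_* function are hypothetical, and the model assumes that a "set client func" call always drops the existing watch and attaches a new one only when the callback is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Which event loop, if any, currently polls the listening fd. */
enum toy_ctx { TOY_CTX_NONE, TOY_CTX_MAIN, TOY_CTX_IOTHREAD };

typedef void (*toy_client_fn)(void *opaque);

typedef struct ToyListener {
    toy_client_fn func;
    void *opaque;
    enum toy_ctx watching;
} ToyListener;

/*
 * Toy stand-in for a "set client func" call: always drop the old
 * watch, and attach a new one in ctx only when func is non-NULL.
 */
void toy_listener_set_client_func(ToyListener *l, toy_client_fn func,
                                  void *opaque, enum toy_ctx ctx)
{
    assert(l != NULL);
    l->watching = TOY_CTX_NONE;
    l->func = func;
    l->opaque = opaque;
    if (func != NULL) {
        l->watching = ctx;
    }
}

/* Step 1, main thread: stop watching before scheduling the handover. */
void toy_detach_before_handover(ToyListener *l)
{
    toy_listener_set_client_func(l, NULL, NULL, TOY_CTX_NONE);
}

/* Step 2, I/O thread: re-register the accept callback in its own loop. */
void toy_attach_in_iothread(ToyListener *l, toy_client_fn func, void *opaque)
{
    toy_listener_set_client_func(l, func, opaque, TOY_CTX_IOTHREAD);
}

/* Hypothetical accept callback; it only counts invocations. */
void toy_accept(void *opaque)
{
    ++*(int *)opaque;
}
```

Between the two steps the listener has no watcher at all, so the main loop and the I/O thread can never poll the fd at the same time; the cost is a window in which nothing services incoming connections.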

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-12 21:48 ` Eric Blake
@ 2025-11-13 15:10   ` mail
  2025-11-13 15:13   ` Eric Blake
  1 sibling, 0 replies; 8+ messages in thread
From: mail @ 2025-11-13 15:10 UTC
  To: eblake; +Cc: armbru, berrange, mail, qemu-devel, songjie_yewu

> On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > 
> > When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> > IOThread monitoring of the QMP fd by default. However, a race condition
> > exists during the initialization phase: the IOThread only removes the
> > main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> > 
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> > 
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> > 
> > The fix introduces socket_chr_listener_cleanup() to destroy and unref
> > all existing IO sources on the socket chardev listener, guaranteeing
> > that no concurrent fd monitoring occurs during the transition to
> > IOThread handling.
> > 
> > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > ---
> >  chardev/char-socket.c         | 18 ++++++++++++++++++
> >  include/chardev/char-socket.h |  2 ++
> >  monitor/qmp.c                 |  6 ++++++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index 62852e3caf..073a9da855 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
> >      }
> >  }
> >  
> > +void socket_chr_listener_cleanup(Chardev *chr)
> > +{
> > +    SocketChardev *s = SOCKET_CHARDEV(chr);
> > +
> > +    if (s->listener) {
> > +        QIONetListener *listener = s->listener;
> > +        size_t i;
> > +
> > +        for (i = 0; i < listener->nsioc; i++) {
> 
> This directly accesses listener->nsioc outside of net-listener.c.
> I've got a pending patch that frowns on this type of usage (here's the
> link to v2; v3 is coming soon):
> 
> https://lore.kernel.org/qemu-devel/20251108230525.3169174-14-eblake@redhat.com/T/#m69a13da54c24ad55351b6a004ec1c0cba7a7b49c
> 
> But it might be possible to do what you want without peeking inside
> the listener; have you tested calling
> qio_net_listener_set_client_func_full() to change the callback to NULL
> prior to doing the handover to iothread, and then reregistering
> tcp_chr_accept after that point?
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org

Hi Eric,

Thanks a lot for the detailed feedback! You're absolutely right—the current
patch does have issues with encapsulation by directly accessing internal 
listener structures like nsioc. 

I took your advice and tested it: before the I/O thread initializes 
the QMP listener, call qio_net_listener_set_client_func_full() with the callback
set to NULL. This effectively purges the main thread's fd watch without any 
new helper functions, and then the I/O thread can safely re-register tcp_chr_accept. 

Besides, Daniel also raised a solid concern in his reply: the chardev backend 
isn't guaranteed to be a SocketChardev type, so blindly calling a socket-specific 
cleanup could lead to crashes. 
https://lore.kernel.org/qemu-devel/20251112145743.15075-1-mail@jiesong.me/

The v1 patch needs to be modified. I'll iterate on a v2 incorporating 
both your NULL-callback advice and Daniel's robustness improvements. 

Best regards,
Jie Song 



* Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
  2025-11-12 21:48 ` Eric Blake
  2025-11-13 15:10   ` mail
@ 2025-11-13 15:13   ` Eric Blake
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Blake @ 2025-11-13 15:13 UTC
  To: Jie Song; +Cc: qemu-devel, armbru, berrange, qemu-block, Jie Song

On Wed, Nov 12, 2025 at 03:48:07PM -0600, Eric Blake wrote:
> On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > 
> > When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
> > IOThread monitoring of the QMP fd by default. However, a race condition
> > exists during the initialization phase: the IOThread only removes the
> > main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> > 
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> > 
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> > 
> > The fix introduces socket_chr_listener_cleanup() to destroy and unref
> > all existing IO sources on the socket chardev listener, guaranteeing
> > that no concurrent fd monitoring occurs during the transition to
> > IOThread handling.
> > 
> > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > ---
> >  chardev/char-socket.c         | 18 ++++++++++++++++++
> >  include/chardev/char-socket.h |  2 ++
> >  monitor/qmp.c                 |  6 ++++++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > index 62852e3caf..073a9da855 100644
> > --- a/chardev/char-socket.c
> > +++ b/chardev/char-socket.c
> > @@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
> >      }
> >  }
> >  
> > +void socket_chr_listener_cleanup(Chardev *chr)
> > +{
> > +    SocketChardev *s = SOCKET_CHARDEV(chr);
> > +
> > +    if (s->listener) {
> > +        QIONetListener *listener = s->listener;
> > +        size_t i;
> > +
> > +        for (i = 0; i < listener->nsioc; i++) {
> 
> This directly accesses listener->nsioc outside of net-listener.c.
> I've got a pending patch that frowns on this type of usage (here's the
> link to v2; v3 is coming soon):
> 
> https://lore.kernel.org/qemu-devel/20251108230525.3169174-14-eblake@redhat.com/T/#m69a13da54c24ad55351b6a004ec1c0cba7a7b49c
> 
> But it might be possible to do what you want without peeking inside
> the listener; have you tested calling
> qio_net_listener_set_client_func_full() to change the callback to NULL
> prior to doing the handover to iothread, and then reregistering
> tcp_chr_accept after that point?

Thinking further, I think we may still have a problem in
QIONetListener even with my v3 patches:

| static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
|                                               GIOCondition condition,
|                                               gpointer opaque)
| {
|     QIONetListener *listener = QIO_NET_LISTENER(opaque);
|     QIOChannelSocket *sioc;
|     QIONetListenerClientFunc io_func;
|     gpointer io_data;
|     GMainContext *context;
|     AioContext *aio_context;
| 
|     sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
|                                      NULL);

This unconditionally tries to accept() the client's socket...

|     if (!sioc) {
|         return TRUE;
|     }
| 
|     WITH_QEMU_LOCK_GUARD(&listener->lock) {
|         io_func = listener->io_func;
|         io_data = listener->io_data;
|         context = listener->context;
|         aio_context = listener->aio_context;
|     }
| 
|     trace_qio_net_listener_callback(listener, io_func, context, aio_context);
|     if (io_func) {
|         io_func(listener, sioc, io_data);

...and if accepted, only then does it trigger the callback to let the
user know the sioc to work with.  But if there is no io_func currently
registered...

|     }
| 
|     object_unref(OBJECT(sioc));

...the client connection is discarded, with the client being unable to
connect if it managed to land in the window when the netlistener had
no async callback registered.

I have to wonder if we should be changing the order in this function
to not attempt the qio_channel_socket_accept() unless there is a
callback registered, so that a client that is pending service in the
window where the user code does not have an async callback installed
will still be in the queue to accept the moment an async function is
registered.  (We would also need to think through any ripple effects,
such as making sure we aren't burning CPU in a busy loop that keeps
polling without clearing the condition the poll is waiting for.)
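
The two orderings can be compared with a small POSIX sketch. This is an assumption-laden illustration rather than a patch: client_fn, handle_accept_first(), handle_check_first(), and the socket path are hypothetical, and a plain Unix socket stands in for the QIOChannelSocket listener.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

typedef void (*client_fn)(int fd, void *opaque);

/* Order in the quoted function: accept first, then look for a callback. */
int handle_accept_first(int lfd, client_fn fn, void *opaque)
{
    int fd;

    assert(lfd >= 0);
    fd = accept(lfd, NULL, NULL);
    if (fd < 0) {
        return 0;
    }
    if (fn) {
        fn(fd, opaque);
    }
    close(fd);              /* with fn == NULL the client is simply dropped */
    return 1;
}

/* Proposed order: without a callback, leave the client in the backlog. */
int handle_check_first(int lfd, client_fn fn, void *opaque)
{
    return fn ? handle_accept_first(lfd, fn, opaque) : 0;
}

/* Hypothetical callback; it only counts the clients it is handed. */
static void count_client(int fd, void *opaque)
{
    (void)fd;
    ++*(int *)opaque;
}

/* Returns the number of clients served once a callback appears, or -1. */
int demo_deferred_accept(void)
{
    char dir[] = "/tmp/qmp-defer-XXXXXX";
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    struct pollfd pfd;
    int lfd = -1, cfd = -1, served = 0, ret = -1;

    if (!mkdtemp(dir)) {
        return -1;
    }
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/sock", dir);
    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        goto out;
    }

    /* Window with no callback: nothing is accepted, nothing is lost. */
    if (handle_check_first(lfd, NULL, NULL) != 0) {
        goto out;
    }
    pfd = (struct pollfd){ .fd = lfd, .events = POLLIN };
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN)) {
        goto out;           /* the client should still be pending */
    }

    /* Callback registered: the queued client is served now. */
    handle_check_first(lfd, count_client, &served);
    ret = served;
out:
    if (lfd >= 0) close(lfd);
    if (cfd >= 0) close(cfd);
    unlink(addr.sun_path);
    rmdir(dir);
    return ret;
}
```

Note that poll() keeps reporting POLLIN for as long as the client sits in the backlog. That is the busy-loop concern raised above: a real implementation would also have to stop polling the fd while no callback is installed.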

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




end of thread (newest message: 2025-11-13 15:14 UTC)

Thread overview: 8+ messages
2025-11-11 15:01 [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race Jie Song
2025-11-12  8:59 ` Markus Armbruster
2025-11-12 15:31   ` Jie Song
2025-11-12  9:05 ` Daniel P. Berrangé
2025-11-12 14:57   ` Jie Song
2025-11-12 21:48 ` Eric Blake
2025-11-13 15:10   ` mail
2025-11-13 15:13   ` Eric Blake
