All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, "Prasad Pandit" <ppandit@redhat.com>,
	"Ben Chaney" <bchaney@akamai.com>,
	"Juraj Marcin" <jmarcin@redhat.com>,
	"Mark Kanda" <mark.kanda@oracle.com>,
	"Pranav Tyagi" <prtyagi@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH] migration: Fix crash on second migration when cancel early
Date: Thu, 23 Apr 2026 11:25:28 -0400	[thread overview]
Message-ID: <aeo56NITJuRd1FN5@x1.local> (raw)
In-Reply-To: <87qzo6ek7d.fsf@suse.de>

On Wed, Apr 22, 2026 at 06:52:54PM -0300, Fabiano Rosas wrote:
> > We're on the same page at least on that the gsources are not yet managed.
> > But I am not sure they're leaked.
> >
> > exec_connect_incoming() does qio_channel_add_watch_full(), within itself it
> > will release the refcount of the IO watch gsource.  It means it'll be a
> > dangling gsource on the main context.  So after the spawn of the new
> > process, it will still be finalized properly?
> >
> 
> When the migration_with_exec functional test fails [0], the test issues
> qmp-quit to both src and dst. The dst will exit before ever having
> dispatched the gsource and exec_accept_incoming_migration() is not
> executed at all. In that case the gsource and the ioc are never freed
> and the spawned process is left behind.
> 
> This oneliner triggers it for me:
> 
> for i in $(seq 1 1000); do \
> echo "$i ============="; \
> make -j$(nproc) check-func-quick || break; done; ps aux | grep socat
> 
> [0] - due to the startup race as the AI overlords told us. Patch coming
> soon!

Ohhhh, so it's about some failure path, ok.

> 
> >> watches like this:
> >> 
> >>   static inline void migration_watch_data(void) {};
> >>   
> >>   qio_channel_add_watch_full(ioc, G_IO_IN, exec_accept_incoming_migration,
> >>                              migration_watch_data, NULL,
> >>                              g_main_context_get_thread_default());
> >> 
> >> and at migrate_cancel():
> >> 
> >>   while(g_source_remove_by_user_data(migration_watch_data));
> >> 
> >> It feels to me that a wrapper around this, or even a hashtable of "func
> >> ptr->GSource" or "str->GSource id" would allow us to call a (say)
> >> qio_channel_clear_watches() and avoid having to do the management in
> >> each of the clients.
> >
> > In general, this whole idea sounds reasonable.  I just want to check with
> > you on which side of QEMU we're talking about.
> >
> > To me, dest QEMU is less of a concern when it can easily be killed.  But
> > still it's good to be able to manage those.  Src QEMU is more important
> > from that perspective.
> >
> 
> It applies to both src and dst. Whenever we have to add a watch. The
> non-uniformity of having the source removed maybe after it dispatches if
> we return G_SOURCE_REMOVE or maybe it doesn't dispatch and then it needs
> to be explicitly removed is error prone I think.
> 
> Looking at callers of qio_channel_add_watch[_full], all of them just
> store the gsource "tag" and later remove it. Having the callers each
> implement their own way of keeping track of the GSource pointer/id just
> to be able to free them later on seems unnecessary to me.

Yes, and we also need to keep in mind that the ID can be reused by glib
context right after removal of the gsource from the context, afaik.

Example I randomly picked: qio_channel_websock_handshake_io(), it remembers
the ID into hs_io_tag but it also needs to be very careful in its callback
function so that whenever qio_channel_websock_handshake_send() would return
FALSE (which should really be G_SOURCE_REMOVE..) it must reset hs_io_tag.

> 
> The caller could still decide when to destroy the source, but it could
> have semantics more like: "I'm done with all the sources".
> 
> Or just make it part of the iochannel finalize routine. A small list of
> source ids per channel seems reasonable.

Sounds good in general.

But one thing to mention is, IIUC even with this it won't fix the problem
you hit above with leftover spawned process and IOC.. because IOC's
finalize() won't get called..

-- 
Peter Xu



  reply	other threads:[~2026-04-23 15:26 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 17:58 [PATCH] migration: Fix crash on second migration when cancel early Peter Xu
2026-04-22 13:31 ` Fabiano Rosas
2026-04-22 14:14   ` Fabiano Rosas
2026-04-22 15:32   ` Peter Xu
2026-04-22 21:52     ` Fabiano Rosas
2026-04-23 15:25       ` Peter Xu [this message]
2026-05-06 20:51 ` Fabiano Rosas
2026-05-07 17:23   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aeo56NITJuRd1FN5@x1.local \
    --to=peterx@redhat.com \
    --cc=bchaney@akamai.com \
    --cc=berrange@redhat.com \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mark.kanda@oracle.com \
    --cc=ppandit@redhat.com \
    --cc=prtyagi@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.