From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Juraj Marcin" <jmarcin@redhat.com>
Subject: Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels
Date: Mon, 22 Sep 2025 18:41:43 -0300
Message-ID: <874isumag8.fsf@suse.de>
In-Reply-To: <aNGvGDShRyBI80XK@x1.local>

Peter Xu <peterx@redhat.com> writes:

> On Fri, Sep 19, 2025 at 10:50:56AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote:
>> >> > ============= ABOUT OLD PATCH 2 ===================
>> >> >
>> >> > I dropped it for now to unblock patch 1, because patch 1 will fix a
>> >> > real warning that can be triggered for not only qtest but also normal tls
>> >> > postcopy migration.
>> >> >
>> >> > While I was looking at temporary settings for multifd send iochannels to be
>> >> > blocking always, I found I cannot explain how migration_tls_channel_end()
>> >> > currently works, because it writes to the multifd iochannels while the
>> >> > channels should still be owned (and can be written at the same time?) by
>> >> > the sender threads. It sounds like a thread-safety issue, or is it not?
>> >> >
>> >>
>> >> IIUC, the multifd channels will be stuck at p->sem because this is the
>> >> success path so migration will have already finished when we reach
>> >> migration_cleanup(). The ram/device state migration will hold the main
>> >> thread until the multifd channels finish transferring.
>> >
>> > For success cases, indeed. However this is not the success path? After
>> > all, we check migration_has_failed().
>> >
>>
>> My point is that when we reach here, if migration has succeeded, then it
>> should be ok. If not, then thread-safety doesn't matter because things
>> have already gone bad; we'll lose the destination anyway.
>
> I'm not sure if it matters or not, maybe it depends on how bad it is when a
> race happened.
>
> If it's a tcp channel, it might be easier; the worst case is we write()
> concurrently in two threads and the output stream, IIUC, can be interleaved
> with the two buffers we write. Not an issue if migration failed anyway.
>
> However this is only needed for TLS, hence I have no idea what happens if
> gnutls writes concurrently. I don't think GnuTLS supports concurrent
> writers. I'm not sure if it means there's still a chance that src QEMU (when
> having a failed live migration) can crash.
>
> So.. I still think it might be wise to only bye() after knowing it is a
> success, not only because that looks like the only way to make sure it's
> thread-safe, but also because a bye() is only needed if it didn't fail.
> Sending it and ignoring the error is another way of doing so, but it doesn't
> avoid the possible result of a race (even if I totally agree it is unlikely..).
>

ok

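For illustration only, the "bye() only on success" idea could be sketched roughly as below. This is a toy model, not QEMU code: MockChannel, mock_tls_bye() and mock_send_shutdown() are hypothetical names made up for the example, and the migration_failed flag merely stands in for whatever migration_has_failed() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a migration channel; not QEMU's QIOChannel. */
typedef struct {
    bool is_tls;   /* channel carries a TLS session */
    int bye_sent;  /* number of graceful TLS terminations issued */
} MockChannel;

/* Stand-in for the TLS "bye" write on one channel. */
static void mock_tls_bye(MockChannel *c)
{
    c->bye_sent++;
}

/*
 * Send bye() only when migration succeeded. On failure the sender
 * threads may still be writing to the channel, so it is left alone.
 */
static void mock_send_shutdown(MockChannel *chans, size_t n,
                               bool migration_failed)
{
    assert(chans != NULL || n == 0);

    if (migration_failed) {
        return; /* no graceful termination needed, and no racy write */
    }

    for (size_t i = 0; i < n; i++) {
        if (chans[i].is_tls) {
            mock_tls_bye(&chans[i]);
        }
    }
}
```

With this shape, a failed migration never touches the channels from the cleanup path, so the question of concurrent writers does not come up there at all.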
>>
>> > Should I then send a patch to only send bye() when succeeded? Then I can
>> > also add some comment. I wish we could assert. Then the "temporarily
>> > changing nonblock mode" will also rely on this one, because ideally we
>> > shouldn't touch the fd nonblocking mode if some other thread is operating
>> > on it.
>> >
>>
>> I don't know if it changes much. Currently we basically always ignore
>> the error from bye().
>>
>> > The other thing is I also think we shouldn't rely on checking
>> > "p->tls_thread_created && p->thread_created" but only rely on channel type,
>> > which might be more straightforward (I almost did it in v1, but v2 rewrote
>> > things so it was lost).
>>
>> Ok, but we may need to ensure bye() is not called before the session is
>> initiated. So thread_created may still be needed?
>
> In v1, I was using "object_dynamic_cast((Object *)c, TYPE_QIO_CHANNEL_TLS)":
>
> https://lore.kernel.org/all/20250910160144.1762894-4-peterx@redhat.com/
>
> Would that work the same, but without relying on "thread_created"
> vars?

Ok, I'm convinced. migration_cleanup() -> multifd_send_shutdown() ->
bye() cannot happen before thread_created=true because
multifd_send_setup() blocks the migration_thread until the channels have
been fully created. Go ahead then!
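
To make the "rely on channel type only" point concrete, here is a rough, self-contained toy model of a type-based check. It only loosely mirrors what object_dynamic_cast() against TYPE_QIO_CHANNEL_TLS would do; MockType, MockObject, the "mock-channel*" type names and both helper functions are assumptions invented for this sketch and are not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-inheritance type descriptor. */
typedef struct MockType {
    const char *name;
    const struct MockType *parent;
} MockType;

/* Hypothetical object header: every object points at its type. */
typedef struct {
    const MockType *type;
} MockObject;

static const MockType mock_type_channel = { "mock-channel", NULL };
static const MockType mock_type_channel_tls = {
    "mock-channel-tls", &mock_type_channel
};
static const MockType mock_type_channel_socket = {
    "mock-channel-socket", &mock_type_channel
};

/* Walk the parent chain; true if obj is of type_name or a subtype. */
static bool mock_object_is_a(const MockObject *obj, const char *type_name)
{
    assert(type_name != NULL);

    for (const MockType *t = obj ? obj->type : NULL; t; t = t->parent) {
        if (strcmp(t->name, type_name) == 0) {
            return true;
        }
    }
    return false;
}

/* Decide on bye() from the channel type alone, with no per-thread flags. */
static bool mock_channel_needs_bye(const MockObject *c)
{
    return mock_object_is_a(c, "mock-channel-tls");
}
```

The decision then depends only on what the channel is, not on bookkeeping such as thread_created, which is safe here because the channels are fully set up before cleanup can run.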