qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Juraj Marcin" <jmarcin@redhat.com>
Subject: Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels
Date: Mon, 22 Sep 2025 16:18:32 -0400	[thread overview]
Message-ID: <aNGvGDShRyBI80XK@x1.local> (raw)
In-Reply-To: <871po2v9dr.fsf@suse.de>

On Fri, Sep 19, 2025 at 10:50:56AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote:
> >> > ============= ABOUT OLD PATCH 2 ===================
> >> >
> >> > I dropped it for now to unblock almost patch 1, because patch 1 will fix a
> >> > real warning that can be triggered for not only qtest but also normal tls
> >> > postcopy migration.
> >> >
> >> > While I was looking at temporary settings for multifd send iochannels to be
> >> > blocking always, I found I cannot explain how migration_tls_channel_end()
> >> > currently works, because it writes to the multifd iochannels while the
> >> > channels should still be owned (and can be written at the same time?) by
> >> > the sender threads.  It sounds like a thread-safety issue, or is it not?
> >> >
> >> 
> >> IIUC, the multifd channels will be stuck at p->sem because this is the
> >> success path so migration will have already finished when we reach
> >> migration_cleanup(). The ram/device state migration will hold the main
> >> thread until the multifd channels finish transferring.
> >
> > For success cases, indeed.  However this is not the success path?  After
> > all, we check migration_has_failed().
> >
> 
> My point is that when we reach here, if migration has succeeded, then it
> should be ok. If not, then thread-safety doesn't matter because things
> have already went bad, we'll lose the destination anyway.

I'm not sure if it matters or not, maybe it depends on how bad it is when a
race happened.

If it's a tcp channel, it might be easier; the worst case is we write()
concurrently in two threads and the output stream, IIUC, can be interleaved
with the two buffers we write.  Not an issue if migration failed anyway.

However this is only needed for TLS, hence I have no idea what happens if
gnutls writes concurrently.  I don't think GnuTLS supports concurrent
writters.  I'm not sure if it means there's still chance src QEMU (when
having a failed live migration) can crash.

So.. I still think it might be wise we only bye() after knowing it is a
success, not only because that looks like the only way to make sure it's
thread-safe, but also because a bye() is only needed if it didn't fail.
Sending it ignoring error is another way of doing so, but it doesn't avoid
the possible result of a race (even if I totally agree it is unlikely..).

> 
> > Should I then send a patch to only send bye() when succeeded?  Then I can
> > also add some comment.  I wished we could assert.  Then the "temporarily
> > changing nonblock mode" will also rely on this one, because ideally we
> > shouldn't touch the fd nonblocking mode if some other thread is operating
> > on it.
> >
> 
> I don't know if it changes much. Currently we basically always ignore
> the error from bye().
> 
> > The other thing is I also think we shouldn't rely on checking
> > "p->tls_thread_created && p->thread_created" but only rely on channel type,
> > which might be more straightforward (I almost did it in v1, but v2 rewrote
> > things so it was lost).
> 
> Ok, but we may need to ensure bye() is not called before the session is
> initiated. So thread_created may still be needed?

In v1, I was using "object_dynamic_cast((Object *)c, TYPE_QIO_CHANNEL_TLS)":

https://lore.kernel.org/all/20250910160144.1762894-4-peterx@redhat.com/

Would that work the same, but without relying on "thread_created" vars?

-- 
Peter Xu



  reply	other threads:[~2025-09-22 20:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-18 20:39 [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu
2025-09-18 20:39 ` [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer Peter Xu
2025-09-18 20:39 ` [PATCH v3 2/2] migration: Make migration_has_failed() work even for CANCELLING Peter Xu
2025-09-18 21:17 ` [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Fabiano Rosas
2025-09-18 21:46   ` Peter Xu
2025-09-19 13:50     ` Fabiano Rosas
2025-09-22 20:18       ` Peter Xu [this message]
2025-09-22 21:41         ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNGvGDShRyBI80XK@x1.local \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).