* [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels
@ 2025-09-18 20:39 Peter Xu
2025-09-18 20:39 ` [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer Peter Xu
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Peter Xu @ 2025-09-18 20:39 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Daniel P . Berrangé, Fabiano Rosas, Juraj Marcin
v3:
- Patch 1
- Update qcrypto_tls_session_read() doc to reflect the new retval [Dan]
- Update commit message to explain the qatomic_read() change [Dan]
- Patch 2 (old)
- Dropped for now, more at the end
This is v3 of the series.
Fabiano fixed graceful shutdowns for multifd channels previously:
https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/
However we can still see an warning when running preempt unit test on TLS,
even though migration functionality will not be affected:
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk
...
qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated.
...
It turns out this is because the crypto code only passes the ->shutdown
field into the read function, however that value can change concurrently in
another thread by a concurrent qio_channel_shutdown().
Patch 1 should fix this issue.
Patch 2 is something I found when looking at this problem, there's no known
issues I am aware of with them, however I still think they're logically
flawed, so I post them together here.
Please review, thanks.
============= ABOUT OLD PATCH 2 ===================
I dropped it for now to unblock almost patch 1, because patch 1 will fix a
real warning that can be triggered for not only qtest but also normal tls
postcopy migration.
While I was looking at temporary settings for multifd send iochannels to be
blocking always, I found I cannot explain how migration_tls_channel_end()
currently works, because it writes to the multifd iochannels while the
channels should still be owned (and can be written at the same time?) by
the sender threads. It sounds like a thread-safety issue, or is it not?
Peter Xu (2):
io/crypto: Move tls premature termination handling into QIO layer
migration: Make migration_has_failed() work even for CANCELLING
include/crypto/tlssession.h | 10 +++-------
crypto/tlssession.c | 7 ++-----
io/channel-tls.c | 21 +++++++++++++++++++--
migration/migration.c | 3 ++-
4 files changed, 26 insertions(+), 15 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer 2025-09-18 20:39 [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu @ 2025-09-18 20:39 ` Peter Xu 2025-09-18 20:39 ` [PATCH v3 2/2] migration: Make migration_has_failed() work even for CANCELLING Peter Xu 2025-09-18 21:17 ` [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Fabiano Rosas 2 siblings, 0 replies; 8+ messages in thread From: Peter Xu @ 2025-09-18 20:39 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, Daniel P . Berrangé, Fabiano Rosas, Juraj Marcin QCryptoTLSSession allows TLS premature termination in two cases, one of the case is when the channel shutdown() is invoked on READ side. It's possible the shutdown() happened after the read thread blocked at gnutls_record_recv(). In this case, we should allow the premature termination to happen. The problem is by the time qcrypto_tls_session_read() was invoked, tioc->shutdown may not have been set, so this may instead be treated as an error if there is concurrent shutdown() calls. To allow the flag to reflect the latest status of tioc->shutdown, move the check upper into the QIOChannel level, so as to read the flag only after QEMU gets an GNUTLS_E_PREMATURE_TERMINATION. When at it, introduce qio_channel_tls_allow_premature_termination() helper to make the condition checks easier to read. When doing so, change the qatomic_load_acquire() to qatomic_read(): here we don't need any ordering of memory accesses, but reading a flag. qatomic_read() would suffice because it guarantees fetching from memory. Nothing else we should need to order on memory access. This patch will fix a qemu qtest warning when running the preempt tls test, reporting premature termination: QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk ... qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated. ... In this specific case, the error was set by postcopy_preempt_thread, which normally will be concurrently shutdown()ed by the main thread. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> --- include/crypto/tlssession.h | 10 +++------- crypto/tlssession.c | 7 ++----- io/channel-tls.c | 21 +++++++++++++++++++-- 3 files changed, 24 insertions(+), 14 deletions(-) diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h index 2f62ce2d67..2e9fe11cf6 100644 --- a/include/crypto/tlssession.h +++ b/include/crypto/tlssession.h @@ -110,6 +110,7 @@ typedef struct QCryptoTLSSession QCryptoTLSSession; #define QCRYPTO_TLS_SESSION_ERR_BLOCK -2 +#define QCRYPTO_TLS_SESSION_PREMATURE_TERMINATION -3 /** * qcrypto_tls_session_new: @@ -259,7 +260,6 @@ ssize_t qcrypto_tls_session_write(QCryptoTLSSession *sess, * @sess: the TLS session object * @buf: to fill with plain text received * @len: the length of @buf - * @gracefulTermination: treat premature termination as graceful EOF * @errp: pointer to hold returned error object * * Receive up to @len bytes of data from the remote peer @@ -267,22 +267,18 @@ ssize_t qcrypto_tls_session_write(QCryptoTLSSession *sess, * qcrypto_tls_session_set_callbacks(), decrypt it and * store it in @buf. * - * If @gracefulTermination is true, then a premature termination - * of the TLS session will be treated as indicating EOF, as - * opposed to an error. - * * It is an error to call this before * qcrypto_tls_session_handshake() returns * QCRYPTO_TLS_HANDSHAKE_COMPLETE * * Returns: the number of bytes received, * or QCRYPTO_TLS_SESSION_ERR_BLOCK if the receive would block, - * or -1 on error. + * or QCRYPTO_TLS_SESSION_PREMATURE_TERMINATION if a premature termination + * is detected, or -1 on error. */ ssize_t qcrypto_tls_session_read(QCryptoTLSSession *sess, char *buf, size_t len, - bool gracefulTermination, Error **errp); /** diff --git a/crypto/tlssession.c b/crypto/tlssession.c index 86d407a142..ac38c2121d 100644 --- a/crypto/tlssession.c +++ b/crypto/tlssession.c @@ -552,7 +552,6 @@ ssize_t qcrypto_tls_session_read(QCryptoTLSSession *session, char *buf, size_t len, - bool gracefulTermination, Error **errp) { ssize_t ret; @@ -570,9 +569,8 @@ qcrypto_tls_session_read(QCryptoTLSSession *session, if (ret < 0) { if (ret == GNUTLS_E_AGAIN) { return QCRYPTO_TLS_SESSION_ERR_BLOCK; - } else if ((ret == GNUTLS_E_PREMATURE_TERMINATION) && - gracefulTermination){ - return 0; + } else if (ret == GNUTLS_E_PREMATURE_TERMINATION) { + return QCRYPTO_TLS_SESSION_PREMATURE_TERMINATION; } else { if (session->rerr) { error_propagate(errp, session->rerr); @@ -789,7 +787,6 @@ ssize_t qcrypto_tls_session_read(QCryptoTLSSession *sess, char *buf, size_t len, - bool gracefulTermination, Error **errp) { error_setg(errp, "TLS requires GNUTLS support"); diff --git a/io/channel-tls.c b/io/channel-tls.c index a8248a9216..5a2c8188ce 100644 --- a/io/channel-tls.c +++ b/io/channel-tls.c @@ -346,6 +346,19 @@ static void qio_channel_tls_finalize(Object *obj) qcrypto_tls_session_free(ioc->session); } +static bool +qio_channel_tls_allow_premature_termination(QIOChannelTLS *tioc, int flags) +{ + if (flags & QIO_CHANNEL_READ_FLAG_RELAXED_EOF) { + return true; + } + + if (qatomic_read(&tioc->shutdown) & QIO_CHANNEL_SHUTDOWN_READ) { + return true; + } + + return false; +} static ssize_t qio_channel_tls_readv(QIOChannel *ioc, const struct iovec *iov, @@ -364,8 +377,6 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc, tioc->session, iov[i].iov_base, iov[i].iov_len, - flags & QIO_CHANNEL_READ_FLAG_RELAXED_EOF || - qatomic_load_acquire(&tioc->shutdown) & QIO_CHANNEL_SHUTDOWN_READ, errp); if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) { if (got) { @@ -373,6 +384,12 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc, } else { return QIO_CHANNEL_ERR_BLOCK; } + } else if (ret == QCRYPTO_TLS_SESSION_PREMATURE_TERMINATION) { + if (qio_channel_tls_allow_premature_termination(tioc, flags)) { + ret = 0; + } else { + return -1; + } } else if (ret < 0) { return -1; } -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/2] migration: Make migration_has_failed() work even for CANCELLING 2025-09-18 20:39 [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu 2025-09-18 20:39 ` [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer Peter Xu @ 2025-09-18 20:39 ` Peter Xu 2025-09-18 21:17 ` [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Fabiano Rosas 2 siblings, 0 replies; 8+ messages in thread From: Peter Xu @ 2025-09-18 20:39 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, Daniel P . Berrangé, Fabiano Rosas, Juraj Marcin No issue I hit, the change is only from code observation when I am looking at a TLS premature termination issue. We set CANCELLED very late, it means migration_has_failed() may not work correctly if it's invoked before updating CANCELLING to CANCELLED. Allow that state will make migration_has_failed() working as expected even if it's invoked slightly earlier. One current user is the multifd code for the TLS graceful termination, where it's before updating to CANCELLED. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> --- migration/migration.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 10c216d25d..f6f6a6e202 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1702,7 +1702,8 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, bool migration_has_failed(MigrationState *s) { - return (s->state == MIGRATION_STATUS_CANCELLED || + return (s->state == MIGRATION_STATUS_CANCELLING || + s->state == MIGRATION_STATUS_CANCELLED || s->state == MIGRATION_STATUS_FAILED); } -- 2.50.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels 2025-09-18 20:39 [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu 2025-09-18 20:39 ` [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer Peter Xu 2025-09-18 20:39 ` [PATCH v3 2/2] migration: Make migration_has_failed() work even for CANCELLING Peter Xu @ 2025-09-18 21:17 ` Fabiano Rosas 2025-09-18 21:46 ` Peter Xu 2 siblings, 1 reply; 8+ messages in thread From: Fabiano Rosas @ 2025-09-18 21:17 UTC (permalink / raw) To: Peter Xu, qemu-devel; +Cc: peterx, Daniel P . Berrangé, Juraj Marcin Peter Xu <peterx@redhat.com> writes: > v3: > - Patch 1 > - Update qcrypto_tls_session_read() doc to reflect the new retval [Dan] > - Update commit message to explain the qatomic_read() change [Dan] > - Patch 2 (old) > - Dropped for now, more at the end > > This is v3 of the series. > > Fabiano fixed graceful shutdowns for multifd channels previously: > > https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ > > However we can still see an warning when running preempt unit test on TLS, > even though migration functionality will not be affected: > > QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk > ... > qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated. > ... > > It turns out this is because the crypto code only passes the ->shutdown > field into the read function, however that value can change concurrently in > another thread by a concurrent qio_channel_shutdown(). > > Patch 1 should fix this issue. > > Patch 2 is something I found when looking at this problem, there's no known > issues I am aware of with them, however I still think they're logically > flawed, so I post them together here. > > Please review, thanks. > > ============= ABOUT OLD PATCH 2 =================== > > I dropped it for now to unblock almost patch 1, because patch 1 will fix a > real warning that can be triggered for not only qtest but also normal tls > postcopy migration. > > While I was looking at temporary settings for multifd send iochannels to be > blocking always, I found I cannot explain how migration_tls_channel_end() > currently works, because it writes to the multifd iochannels while the > channels should still be owned (and can be written at the same time?) by > the sender threads. It sounds like a thread-safety issue, or is it not? > IIUC, the multifd channels will be stuck at p->sem because this is the success path so migration will have already finished when we reach migration_cleanup(). The ram/device state migration will hold the main thread until the multifd channels finish transferring. > Peter Xu (2): > io/crypto: Move tls premature termination handling into QIO layer > migration: Make migration_has_failed() work even for CANCELLING > > include/crypto/tlssession.h | 10 +++------- > crypto/tlssession.c | 7 ++----- > io/channel-tls.c | 21 +++++++++++++++++++-- > migration/migration.c | 3 ++- > 4 files changed, 26 insertions(+), 15 deletions(-) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels 2025-09-18 21:17 ` [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Fabiano Rosas @ 2025-09-18 21:46 ` Peter Xu 2025-09-19 13:50 ` Fabiano Rosas 0 siblings, 1 reply; 8+ messages in thread From: Peter Xu @ 2025-09-18 21:46 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, Daniel P . Berrangé, Juraj Marcin On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote: > > ============= ABOUT OLD PATCH 2 =================== > > > > I dropped it for now to unblock almost patch 1, because patch 1 will fix a > > real warning that can be triggered for not only qtest but also normal tls > > postcopy migration. > > > > While I was looking at temporary settings for multifd send iochannels to be > > blocking always, I found I cannot explain how migration_tls_channel_end() > > currently works, because it writes to the multifd iochannels while the > > channels should still be owned (and can be written at the same time?) by > > the sender threads. It sounds like a thread-safety issue, or is it not? > > > > IIUC, the multifd channels will be stuck at p->sem because this is the > success path so migration will have already finished when we reach > migration_cleanup(). The ram/device state migration will hold the main > thread until the multifd channels finish transferring. For success cases, indeed. However this is not the success path? After all, we check migration_has_failed(). Should I then send a patch to only send bye() when succeeded? Then I can also add some comment. I wished we could assert. Then the "temporarily changing nonblock mode" will also rely on this one, because ideally we shouldn't touch the fd nonblocking mode if some other thread is operating on it. The other thing is I also think we shouldn't rely on checking "p->tls_thread_created && p->thread_created" but only rely on channel type, which might be more straightforward (I almost did it in v1, but v2 rewrote things so it was lost). -- Peter Xu ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels 2025-09-18 21:46 ` Peter Xu @ 2025-09-19 13:50 ` Fabiano Rosas 2025-09-22 20:18 ` Peter Xu 0 siblings, 1 reply; 8+ messages in thread From: Fabiano Rosas @ 2025-09-19 13:50 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Daniel P . Berrangé, Juraj Marcin Peter Xu <peterx@redhat.com> writes: > On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote: >> > ============= ABOUT OLD PATCH 2 =================== >> > >> > I dropped it for now to unblock almost patch 1, because patch 1 will fix a >> > real warning that can be triggered for not only qtest but also normal tls >> > postcopy migration. >> > >> > While I was looking at temporary settings for multifd send iochannels to be >> > blocking always, I found I cannot explain how migration_tls_channel_end() >> > currently works, because it writes to the multifd iochannels while the >> > channels should still be owned (and can be written at the same time?) by >> > the sender threads. It sounds like a thread-safety issue, or is it not? >> > >> >> IIUC, the multifd channels will be stuck at p->sem because this is the >> success path so migration will have already finished when we reach >> migration_cleanup(). The ram/device state migration will hold the main >> thread until the multifd channels finish transferring. > > For success cases, indeed. However this is not the success path? After > all, we check migration_has_failed(). > My point is that when we reach here, if migration has succeeded, then it should be ok. If not, then thread-safety doesn't matter because things have already went bad, we'll lose the destination anyway. > Should I then send a patch to only send bye() when succeeded? Then I can > also add some comment. I wished we could assert. Then the "temporarily > changing nonblock mode" will also rely on this one, because ideally we > shouldn't touch the fd nonblocking mode if some other thread is operating > on it. > I don't know if it changes much. Currently we basically always ignore the error from bye(). > The other thing is I also think we shouldn't rely on checking > "p->tls_thread_created && p->thread_created" but only rely on channel type, > which might be more straightforward (I almost did it in v1, but v2 rewrote > things so it was lost). Ok, but we may need to ensure bye() is not called before the session is initiated. So thread_created may still be needed? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels 2025-09-19 13:50 ` Fabiano Rosas @ 2025-09-22 20:18 ` Peter Xu 2025-09-22 21:41 ` Fabiano Rosas 0 siblings, 1 reply; 8+ messages in thread From: Peter Xu @ 2025-09-22 20:18 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, Daniel P . Berrangé, Juraj Marcin On Fri, Sep 19, 2025 at 10:50:56AM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote: > >> > ============= ABOUT OLD PATCH 2 =================== > >> > > >> > I dropped it for now to unblock almost patch 1, because patch 1 will fix a > >> > real warning that can be triggered for not only qtest but also normal tls > >> > postcopy migration. > >> > > >> > While I was looking at temporary settings for multifd send iochannels to be > >> > blocking always, I found I cannot explain how migration_tls_channel_end() > >> > currently works, because it writes to the multifd iochannels while the > >> > channels should still be owned (and can be written at the same time?) by > >> > the sender threads. It sounds like a thread-safety issue, or is it not? > >> > > >> > >> IIUC, the multifd channels will be stuck at p->sem because this is the > >> success path so migration will have already finished when we reach > >> migration_cleanup(). The ram/device state migration will hold the main > >> thread until the multifd channels finish transferring. > > > > For success cases, indeed. However this is not the success path? After > > all, we check migration_has_failed(). > > > > My point is that when we reach here, if migration has succeeded, then it > should be ok. If not, then thread-safety doesn't matter because things > have already went bad, we'll lose the destination anyway. I'm not sure if it matters or not, maybe it depends on how bad it is when a race happened. If it's a tcp channel, it might be easier; the worst case is we write() concurrently in two threads and the output stream, IIUC, can be interleaved with the two buffers we write. Not an issue if migration failed anyway. However this is only needed for TLS, hence I have no idea what happens if gnutls writes concurrently. I don't think GnuTLS supports concurrent writters. I'm not sure if it means there's still chance src QEMU (when having a failed live migration) can crash. So.. I still think it might be wise we only bye() after knowing it is a success, not only because that looks like the only way to make sure it's thread-safe, but also because a bye() is only needed if it didn't fail. Sending it ignoring error is another way of doing so, but it doesn't avoid the possible result of a race (even if I totally agree it is unlikely..). > > > Should I then send a patch to only send bye() when succeeded? Then I can > > also add some comment. I wished we could assert. Then the "temporarily > > changing nonblock mode" will also rely on this one, because ideally we > > shouldn't touch the fd nonblocking mode if some other thread is operating > > on it. > > > > I don't know if it changes much. Currently we basically always ignore > the error from bye(). > > > The other thing is I also think we shouldn't rely on checking > > "p->tls_thread_created && p->thread_created" but only rely on channel type, > > which might be more straightforward (I almost did it in v1, but v2 rewrote > > things so it was lost). > > Ok, but we may need to ensure bye() is not called before the session is > initiated. So thread_created may still be needed? In v1, I was using "object_dynamic_cast((Object *)c, TYPE_QIO_CHANNEL_TLS)": https://lore.kernel.org/all/20250910160144.1762894-4-peterx@redhat.com/ Would that work the same, but without relying on "thread_created" vars? -- Peter Xu ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels 2025-09-22 20:18 ` Peter Xu @ 2025-09-22 21:41 ` Fabiano Rosas 0 siblings, 0 replies; 8+ messages in thread From: Fabiano Rosas @ 2025-09-22 21:41 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Daniel P . Berrangé, Juraj Marcin Peter Xu <peterx@redhat.com> writes: > On Fri, Sep 19, 2025 at 10:50:56AM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > On Thu, Sep 18, 2025 at 06:17:37PM -0300, Fabiano Rosas wrote: >> >> > ============= ABOUT OLD PATCH 2 =================== >> >> > >> >> > I dropped it for now to unblock almost patch 1, because patch 1 will fix a >> >> > real warning that can be triggered for not only qtest but also normal tls >> >> > postcopy migration. >> >> > >> >> > While I was looking at temporary settings for multifd send iochannels to be >> >> > blocking always, I found I cannot explain how migration_tls_channel_end() >> >> > currently works, because it writes to the multifd iochannels while the >> >> > channels should still be owned (and can be written at the same time?) by >> >> > the sender threads. It sounds like a thread-safety issue, or is it not? >> >> > >> >> >> >> IIUC, the multifd channels will be stuck at p->sem because this is the >> >> success path so migration will have already finished when we reach >> >> migration_cleanup(). The ram/device state migration will hold the main >> >> thread until the multifd channels finish transferring. >> > >> > For success cases, indeed. However this is not the success path? After >> > all, we check migration_has_failed(). >> > >> >> My point is that when we reach here, if migration has succeeded, then it >> should be ok. If not, then thread-safety doesn't matter because things >> have already went bad, we'll lose the destination anyway. > > I'm not sure if it matters or not, maybe it depends on how bad it is when a > race happened. > > If it's a tcp channel, it might be easier; the worst case is we write() > concurrently in two threads and the output stream, IIUC, can be interleaved > with the two buffers we write. Not an issue if migration failed anyway. > > However this is only needed for TLS, hence I have no idea what happens if > gnutls writes concurrently. I don't think GnuTLS supports concurrent > writters. I'm not sure if it means there's still chance src QEMU (when > having a failed live migration) can crash. > > So.. I still think it might be wise we only bye() after knowing it is a > success, not only because that looks like the only way to make sure it's > thread-safe, but also because a bye() is only needed if it didn't fail. > Sending it ignoring error is another way of doing so, but it doesn't avoid > the possible result of a race (even if I totally agree it is unlikely..). > ok >> >> > Should I then send a patch to only send bye() when succeeded? Then I can >> > also add some comment. I wished we could assert. Then the "temporarily >> > changing nonblock mode" will also rely on this one, because ideally we >> > shouldn't touch the fd nonblocking mode if some other thread is operating >> > on it. >> > >> >> I don't know if it changes much. Currently we basically always ignore >> the error from bye(). >> >> > The other thing is I also think we shouldn't rely on checking >> > "p->tls_thread_created && p->thread_created" but only rely on channel type, >> > which might be more straightforward (I almost did it in v1, but v2 rewrote >> > things so it was lost). >> >> Ok, but we may need to ensure bye() is not called before the session is >> initiated. So thread_created may still be needed? > > In v1, I was using "object_dynamic_cast((Object *)c, TYPE_QIO_CHANNEL_TLS)": > > https://lore.kernel.org/all/20250910160144.1762894-4-peterx@redhat.com/ > > Would that work the same, but without relying on "thread_created" > vars? Ok, I'm convinced. migration_cleanup() -> multifd_send_shutdown() -> bye() cannot happen before thread_create=true because multifd_send_setup() blocks the migration_thread until the channels have been fully created. Go ahead then! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-09-22 21:43 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-18 20:39 [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu 2025-09-18 20:39 ` [PATCH v3 1/2] io/crypto: Move tls premature termination handling into QIO layer Peter Xu 2025-09-18 20:39 ` [PATCH v3 2/2] migration: Make migration_has_failed() work even for CANCELLING Peter Xu 2025-09-18 21:17 ` [PATCH v3 0/2] migration/tls: Graceful shutdowns for main and postcopy channels Fabiano Rosas 2025-09-18 21:46 ` Peter Xu 2025-09-19 13:50 ` Fabiano Rosas 2025-09-22 20:18 ` Peter Xu 2025-09-22 21:41 ` Fabiano Rosas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).