From: Fabiano Rosas <farosas@suse.de>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: "Peter Xu" <peterx@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 5/6] migration/multifd: Unify multifd and TLS connection paths
Date: Tue, 06 Feb 2024 11:30:44 -0300 [thread overview]
Message-ID: <87fry57jhn.fsf@suse.de> (raw)
In-Reply-To: <e4f9337d-ed7d-4a11-84dd-76661009c3fc@nvidia.com>
Avihai Horon <avihaih@nvidia.com> writes:
> On 05/02/2024 21:49, Fabiano Rosas wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> During multifd channel creation (multifd_send_new_channel_async) when
>> TLS is enabled, the multifd_channel_connect function is called twice,
>> once to create the TLS handshake thread and another time after the
>> asynchrounous TLS handshake has finished.
>>
>> This creates a slightly confusing call stack where
>> multifd_channel_connect() is called more times than the number of
>> channels. It also splits error handling between the two callers of
>> multifd_channel_connect() causing some code duplication. Lastly, it
>> gets in the way of having a single point to determine whether all
>> channel creation tasks have been initiated.
>>
>> Refactor the code to move the reentrancy one level up at the
>> multifd_new_send_channel_async() level, de-duplicating the error
>> handling and allowing for the next patch to introduce a
>> synchronization point common to all the multifd channel creation,
>> regardless of TLS.
>>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> migration/multifd.c | 73 +++++++++++++++++++--------------------------
>> 1 file changed, 30 insertions(+), 43 deletions(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index cc10be2c3f..89d39fa67c 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -869,30 +869,7 @@ out:
>> return NULL;
>> }
>>
>> -static bool multifd_channel_connect(MultiFDSendParams *p,
>> - QIOChannel *ioc,
>> - Error **errp);
>> -
>> -static void multifd_tls_outgoing_handshake(QIOTask *task,
>> - gpointer opaque)
>> -{
>> - MultiFDSendParams *p = opaque;
>> - QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>> - Error *err = NULL;
>> -
>> - if (!qio_task_propagate_error(task, &err)) {
>> - trace_multifd_tls_outgoing_handshake_complete(ioc);
>> - if (multifd_channel_connect(p, ioc, &err)) {
>> - return;
>> - }
>> - }
>> -
>> - trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>> -
>> - multifd_send_set_error(err);
>> - multifd_send_kick_main(p);
>> - error_free(err);
>> -}
>> +static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
>>
>> static void *multifd_tls_handshake_thread(void *opaque)
>> {
>> @@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
>> QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
>>
>> qio_channel_tls_handshake(tioc,
>> - multifd_tls_outgoing_handshake,
>> + multifd_new_send_channel_async,
>> p,
>> NULL,
>> NULL);
>> @@ -936,19 +913,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>> QIOChannel *ioc,
>> Error **errp)
>> {
>> - trace_multifd_set_outgoing_channel(
>> - ioc, object_get_typename(OBJECT(ioc)),
>> - migrate_get_current()->hostname);
>> -
>> - if (migrate_channel_requires_tls_upgrade(ioc)) {
>> - /*
>> - * tls_channel_connect will call back to this
>> - * function after the TLS handshake,
>> - * so we mustn't call multifd_send_thread until then
>> - */
>> - return multifd_tls_channel_connect(p, ioc, errp);
>> - }
>> -
>> migration_ioc_register_yank(ioc);
>> p->registered_yank = true;
>> p->c = ioc;
>> @@ -959,20 +923,43 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>> return true;
>> }
>>
>> +/*
>> + * When TLS is enabled this function is called once to establish the
>> + * TLS connection and a second time after the TLS handshake to create
>> + * the multifd channel. Without TLS it goes straight into the channel
>> + * creation.
>> + */
>> static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>> {
>> MultiFDSendParams *p = opaque;
>> QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>> Error *local_err = NULL;
>>
>> + bool ret;
>> +
>> trace_multifd_new_send_channel_async(p->id);
>> - if (!qio_task_propagate_error(task, &local_err)) {
>> - qio_channel_set_delay(ioc, false);
>> - if (multifd_channel_connect(p, ioc, &local_err)) {
>> - return;
>> - }
>> +
>> + if (qio_task_propagate_error(task, &local_err)) {
>> + ret = false;
>> + goto out;
>> + }
>
> I think this common error handling for both TLS/non-TLS is a bit
> problematic if there is an error in TLS handshake:
> multifd_tls_channel_connect() sets p->c = QIO_CHANNEL(tioc).
> TLS handshake fails.
> multifd_new_send_channel_async() errors and calls
> object_unref(OBJECT(ioc)) which will result in freeing the IOC.
> Then, multifd_send_terminate_threads() will try to access p->ioc because
> it's not NULL, causing a segfault.
Good catch.
I'm not sure the current reference counting is even correct. AFAICS, the
refcount is 2 at new_send_channel_async due to the qio_task taking a
reference and that will be decremented after we return from the
completion callback, which is multifd_new_send_channel_async itself. The
last reference should be dropped when we cleanup the channel.
So I don't really understand the need for that unref there. But there's
no asserts being reached due to an extra decrement, so there might be
some extra increment hiding somewhere.
Anyway, I'll figure this out and update this patch. Thanks
>> +
>> + qio_channel_set_delay(ioc, false);
>
> Maybe qio_channel_set_delay() should be moved inside
> multifd_channel_connect()? It's called two times when TLS is used.
>
It looks like it could, I'll do that.
>> +
>> + trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
>> + migrate_get_current()->hostname);
>> +
>> + if (migrate_channel_requires_tls_upgrade(ioc)) {
>> + ret = multifd_tls_channel_connect(p, ioc, &local_err);
>> + } else {
>> + ret = multifd_channel_connect(p, ioc, &local_err);
>> + }
>> +
>> + if (ret) {
>> + return;
>> }
>>
>> +out:
>> trace_multifd_new_send_channel_async_error(p->id, local_err);
>> multifd_send_set_error(local_err);
>> multifd_send_kick_main(p);
>> --
>> 2.35.3
>>
next prev parent reply other threads:[~2024-02-06 14:31 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-05 19:49 [PATCH v2 0/6] migration/multifd: Fix channel creation vs. cleanup races Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 1/6] migration/multifd: Join the TLS thread Fabiano Rosas
2024-02-06 8:53 ` Daniel P. Berrangé
2024-02-06 9:15 ` Peter Xu
2024-02-06 10:06 ` Daniel P. Berrangé
2024-02-05 19:49 ` [PATCH v2 2/6] migration/multifd: Remove p->running Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 3/6] migration/multifd: Move multifd_send_setup error handling in to the function Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 4/6] migration/multifd: Move multifd_send_setup into migration thread Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 5/6] migration/multifd: Unify multifd and TLS connection paths Fabiano Rosas
2024-02-06 3:33 ` Peter Xu
2024-02-06 12:44 ` Avihai Horon
2024-02-06 14:30 ` Fabiano Rosas [this message]
2024-02-06 14:44 ` Avihai Horon
2024-02-05 19:49 ` [PATCH v2 6/6] migration/multifd: Add a synchronization point for channel creation Fabiano Rosas
2024-02-06 3:37 ` Peter Xu
2024-02-06 3:42 ` [PATCH v2 0/6] migration/multifd: Fix channel creation vs. cleanup races Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fry57jhn.fsf@suse.de \
--to=farosas@suse.de \
--cc=avihaih@nvidia.com \
--cc=berrange@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).