From: Avihai Horon <avihaih@nvidia.com>
To: Fabiano Rosas <farosas@suse.de>, qemu-devel@nongnu.org
Cc: "Peter Xu" <peterx@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 5/6] migration/multifd: Unify multifd and TLS connection paths
Date: Tue, 6 Feb 2024 16:44:28 +0200
Message-ID: <23a2160b-4410-41e5-a760-b1052a0034f9@nvidia.com>
In-Reply-To: <87fry57jhn.fsf@suse.de>
On 06/02/2024 16:30, Fabiano Rosas wrote:
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> On 05/02/2024 21:49, Fabiano Rosas wrote:
>>> During multifd channel creation (multifd_new_send_channel_async) when
>>> TLS is enabled, the multifd_channel_connect function is called twice,
>>> once to create the TLS handshake thread and another time after the
>>> asynchronous TLS handshake has finished.
>>>
>>> This creates a slightly confusing call stack where
>>> multifd_channel_connect() is called more times than the number of
>>> channels. It also splits error handling between the two callers of
>>> multifd_channel_connect() causing some code duplication. Lastly, it
>>> gets in the way of having a single point to determine whether all
>>> channel creation tasks have been initiated.
>>>
>>> Refactor the code to move the reentrancy one level up at the
>>> multifd_new_send_channel_async() level, de-duplicating the error
>>> handling and allowing for the next patch to introduce a
>>> synchronization point common to all the multifd channel creation,
>>> regardless of TLS.
>>>
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>> migration/multifd.c | 73 +++++++++++++++++++--------------------------
>>> 1 file changed, 30 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index cc10be2c3f..89d39fa67c 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -869,30 +869,7 @@ out:
>>> return NULL;
>>> }
>>>
>>> -static bool multifd_channel_connect(MultiFDSendParams *p,
>>> - QIOChannel *ioc,
>>> - Error **errp);
>>> -
>>> -static void multifd_tls_outgoing_handshake(QIOTask *task,
>>> - gpointer opaque)
>>> -{
>>> - MultiFDSendParams *p = opaque;
>>> - QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>>> - Error *err = NULL;
>>> -
>>> - if (!qio_task_propagate_error(task, &err)) {
>>> - trace_multifd_tls_outgoing_handshake_complete(ioc);
>>> - if (multifd_channel_connect(p, ioc, &err)) {
>>> - return;
>>> - }
>>> - }
>>> -
>>> - trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>>> -
>>> - multifd_send_set_error(err);
>>> - multifd_send_kick_main(p);
>>> - error_free(err);
>>> -}
>>> +static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
>>>
>>> static void *multifd_tls_handshake_thread(void *opaque)
>>> {
>>> @@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
>>> QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
>>>
>>> qio_channel_tls_handshake(tioc,
>>> - multifd_tls_outgoing_handshake,
>>> + multifd_new_send_channel_async,
>>> p,
>>> NULL,
>>> NULL);
>>> @@ -936,19 +913,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>> QIOChannel *ioc,
>>> Error **errp)
>>> {
>>> - trace_multifd_set_outgoing_channel(
>>> - ioc, object_get_typename(OBJECT(ioc)),
>>> - migrate_get_current()->hostname);
>>> -
>>> - if (migrate_channel_requires_tls_upgrade(ioc)) {
>>> - /*
>>> - * tls_channel_connect will call back to this
>>> - * function after the TLS handshake,
>>> - * so we mustn't call multifd_send_thread until then
>>> - */
>>> - return multifd_tls_channel_connect(p, ioc, errp);
>>> - }
>>> -
>>> migration_ioc_register_yank(ioc);
>>> p->registered_yank = true;
>>> p->c = ioc;
>>> @@ -959,20 +923,43 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>> return true;
>>> }
>>>
>>> +/*
>>> + * When TLS is enabled this function is called once to establish the
>>> + * TLS connection and a second time after the TLS handshake to create
>>> + * the multifd channel. Without TLS it goes straight into the channel
>>> + * creation.
>>> + */
>>> static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>> {
>>> MultiFDSendParams *p = opaque;
>>> QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>>> Error *local_err = NULL;
>>>
>>> + bool ret;
>>> +
>>> trace_multifd_new_send_channel_async(p->id);
>>> - if (!qio_task_propagate_error(task, &local_err)) {
>>> - qio_channel_set_delay(ioc, false);
>>> - if (multifd_channel_connect(p, ioc, &local_err)) {
>>> - return;
>>> - }
>>> +
>>> + if (qio_task_propagate_error(task, &local_err)) {
>>> + ret = false;
>>> + goto out;
>>> + }
>> I think this common error handling for both TLS and non-TLS is a bit
>> problematic if the TLS handshake fails:
>> 1. multifd_tls_channel_connect() sets p->c = QIO_CHANNEL(tioc).
>> 2. The TLS handshake fails.
>> 3. multifd_new_send_channel_async() takes the error path and calls
>>    object_unref(OBJECT(ioc)), which results in the IOC being freed.
>> 4. multifd_send_terminate_threads() then tries to access p->c because
>>    it's not NULL, causing a segfault.
> Good catch.
>
> I'm not sure the current reference counting is even correct. AFAICS, the
> refcount is 2 at new_send_channel_async due to the qio_task taking a
> reference and that will be decremented after we return from the
> completion callback, which is multifd_new_send_channel_async itself. The
> last reference should be dropped when we clean up the channel.
>
> So I don't really understand the need for that unref there. But there
> are no asserts being reached due to an extra decrement, so there might be
> some extra increment hiding somewhere.
I think the ref counting is correct: in the non-TLS case we never set
p->c = ioc, so the cleanup will just skip destroying this p->c.
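
To make the failed-handshake sequence concrete, here is a minimal
standalone sketch. It is a toy model and not QEMU code: ToyChannel,
ToyParams and every toy_* function are hypothetical stand-ins, and the
refcounting is hand-rolled rather than QOM's. It assumes the sequence
described in this thread (p->c is set without taking an extra reference,
the task holds one reference, the error path drops one) and only reports
whether p->c would be left pointing at a freed channel. It never
dereferences the stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted channel object (not QIOChannel). */
typedef struct ToyChannel {
    int refcount;
} ToyChannel;

/* Toy stand-in for MultiFDSendParams; only the field under discussion. */
typedef struct ToyParams {
    ToyChannel *c;
} ToyParams;

/* Number of channels currently allocated, used instead of poking at
 * freed memory. */
static int toy_live_channels;

static ToyChannel *toy_channel_new(void)
{
    ToyChannel *ch = calloc(1, sizeof(*ch));
    assert(ch != NULL);
    ch->refcount = 1;
    toy_live_channels++;
    return ch;
}

static void toy_channel_ref(ToyChannel *ch)
{
    ch->refcount++;
}

static void toy_channel_unref(ToyChannel *ch)
{
    assert(ch->refcount > 0);
    if (--ch->refcount == 0) {
        toy_live_channels--;
        free(ch);
    }
}

/*
 * Replays a failed TLS handshake. With clear_owner == false the error
 * path only drops a reference; with clear_owner == true it first resets
 * p.c. Returns true if p.c would be non-NULL while the channel is gone.
 */
bool toy_failed_handshake_leaves_dangling(bool clear_owner)
{
    ToyParams p = { .c = NULL };
    ToyChannel *tioc = toy_channel_new();  /* refcount == 1 */
    bool p_c_set;

    p.c = tioc;              /* stored without taking a reference */
    toy_channel_ref(tioc);   /* the task's reference, refcount == 2 */

    /* Handshake fails: error path of the completion callback. */
    if (clear_owner) {
        p.c = NULL;
    }
    toy_channel_unref(tioc); /* refcount == 1 */

    /* Record the state before the last reference goes away. */
    p_c_set = (p.c != NULL);

    /* Task completes and drops its reference: the channel is freed. */
    toy_channel_unref(tioc);

    return p_c_set && toy_live_channels == 0;
}
```

In this model, calling toy_failed_handshake_leaves_dangling(false)
reports the stale pointer, while passing true (clearing p.c before the
unref) does not. Whether clearing p->c or dropping the unref is the
right fix for the real code is a separate question.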
>
> Anyway, I'll figure this out and update this patch. Thanks
>
>>> +
>>> + qio_channel_set_delay(ioc, false);
>> Maybe qio_channel_set_delay() should be moved inside
>> multifd_channel_connect()? It's called two times when TLS is used.
>>
> It looks like it could, I'll do that.
>
>>> +
>>> + trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
>>> + migrate_get_current()->hostname);
>>> +
>>> + if (migrate_channel_requires_tls_upgrade(ioc)) {
>>> + ret = multifd_tls_channel_connect(p, ioc, &local_err);
>>> + } else {
>>> + ret = multifd_channel_connect(p, ioc, &local_err);
>>> + }
>>> +
>>> + if (ret) {
>>> + return;
>>> }
>>>
>>> +out:
>>> trace_multifd_new_send_channel_async_error(p->id, local_err);
>>> multifd_send_set_error(local_err);
>>> multifd_send_kick_main(p);
>>> --
>>> 2.35.3
>>>
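
For reference, the two-pass flow of the unified callback and the
qio_channel_set_delay() placement discussed above can be sketched as a
small counting model. This is only an illustration under stated
assumptions: ToyCounters and the toy_* functions are hypothetical, no
sockets or threads are involved, and the asynchronous handshake is
modelled as a direct re-entry into the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not QEMU code: counts calls instead of touching channels. */
typedef struct ToyCounters {
    int callback_calls;   /* entries into the completion callback */
    int set_delay_calls;  /* stand-in for qio_channel_set_delay() */
    int channels_created; /* stand-in for the final channel setup */
} ToyCounters;

/* Final channel setup; optionally disables the delay here instead. */
static void toy_channel_connect(ToyCounters *k, bool delay_in_connect)
{
    if (delay_in_connect) {
        k->set_delay_calls++;
    }
    k->channels_created++;
}

/*
 * Completion callback. tls_enabled says whether the migration uses TLS,
 * is_tls whether the channel passed in has already been upgraded.
 */
static void toy_new_send_channel_async(ToyCounters *k, bool tls_enabled,
                                       bool is_tls, bool delay_in_connect)
{
    k->callback_calls++;

    if (!delay_in_connect) {
        /* Placement as in the patch: runs on every pass. */
        k->set_delay_calls++;
    }

    if (tls_enabled && !is_tls) {
        /* Models the handshake finishing and re-entering the callback. */
        toy_new_send_channel_async(k, tls_enabled, true, delay_in_connect);
    } else {
        toy_channel_connect(k, delay_in_connect);
    }
}

/* Runs one channel creation and returns the resulting counters. */
ToyCounters toy_create_channel(bool tls_enabled, bool delay_in_connect)
{
    ToyCounters k = {0};

    toy_new_send_channel_async(&k, tls_enabled, false, delay_in_connect);
    return k;
}
```

In this model the callback runs twice per channel with TLS, so a
set_delay call placed in the callback also runs twice, while one placed
in the connect step runs once regardless of TLS.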