From: Avihai Horon <avihaih@nvidia.com>
To: Fabiano Rosas <farosas@suse.de>, qemu-devel@nongnu.org
Cc: "Peter Xu" <peterx@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 5/6] migration/multifd: Unify multifd and TLS connection paths
Date: Tue, 6 Feb 2024 16:44:28 +0200
Message-ID: <23a2160b-4410-41e5-a760-b1052a0034f9@nvidia.com>
In-Reply-To: <87fry57jhn.fsf@suse.de>


On 06/02/2024 16:30, Fabiano Rosas wrote:
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> On 05/02/2024 21:49, Fabiano Rosas wrote:
>>> During multifd channel creation (multifd_new_send_channel_async) when
>>> TLS is enabled, the multifd_channel_connect function is called twice,
>>> once to create the TLS handshake thread and another time after the
>>> asynchronous TLS handshake has finished.
>>>
>>> This creates a slightly confusing call stack where
>>> multifd_channel_connect() is called more times than the number of
>>> channels. It also splits error handling between the two callers of
>>> multifd_channel_connect() causing some code duplication. Lastly, it
>>> gets in the way of having a single point to determine whether all
>>> channel creation tasks have been initiated.
>>>
>>> Refactor the code to move the reentrancy one level up at the
>>> multifd_new_send_channel_async() level, de-duplicating the error
>>> handling and allowing for the next patch to introduce a
>>> synchronization point common to all the multifd channel creation,
>>> regardless of TLS.
>>>
>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>> ---
>>>    migration/multifd.c | 73 +++++++++++++++++++--------------------------
>>>    1 file changed, 30 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index cc10be2c3f..89d39fa67c 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -869,30 +869,7 @@ out:
>>>        return NULL;
>>>    }
>>>
>>> -static bool multifd_channel_connect(MultiFDSendParams *p,
>>> -                                    QIOChannel *ioc,
>>> -                                    Error **errp);
>>> -
>>> -static void multifd_tls_outgoing_handshake(QIOTask *task,
>>> -                                           gpointer opaque)
>>> -{
>>> -    MultiFDSendParams *p = opaque;
>>> -    QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>>> -    Error *err = NULL;
>>> -
>>> -    if (!qio_task_propagate_error(task, &err)) {
>>> -        trace_multifd_tls_outgoing_handshake_complete(ioc);
>>> -        if (multifd_channel_connect(p, ioc, &err)) {
>>> -            return;
>>> -        }
>>> -    }
>>> -
>>> -    trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>>> -
>>> -    multifd_send_set_error(err);
>>> -    multifd_send_kick_main(p);
>>> -    error_free(err);
>>> -}
>>> +static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
>>>
>>>    static void *multifd_tls_handshake_thread(void *opaque)
>>>    {
>>> @@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
>>>        QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
>>>
>>>        qio_channel_tls_handshake(tioc,
>>> -                              multifd_tls_outgoing_handshake,
>>> +                              multifd_new_send_channel_async,
>>>                                  p,
>>>                                  NULL,
>>>                                  NULL);
>>> @@ -936,19 +913,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>>                                        QIOChannel *ioc,
>>>                                        Error **errp)
>>>    {
>>> -    trace_multifd_set_outgoing_channel(
>>> -        ioc, object_get_typename(OBJECT(ioc)),
>>> -        migrate_get_current()->hostname);
>>> -
>>> -    if (migrate_channel_requires_tls_upgrade(ioc)) {
>>> -        /*
>>> -         * tls_channel_connect will call back to this
>>> -         * function after the TLS handshake,
>>> -         * so we mustn't call multifd_send_thread until then
>>> -         */
>>> -        return multifd_tls_channel_connect(p, ioc, errp);
>>> -    }
>>> -
>>>        migration_ioc_register_yank(ioc);
>>>        p->registered_yank = true;
>>>        p->c = ioc;
>>> @@ -959,20 +923,43 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>>        return true;
>>>    }
>>>
>>> +/*
>>> + * When TLS is enabled this function is called once to establish the
>>> + * TLS connection and a second time after the TLS handshake to create
>>> + * the multifd channel. Without TLS it goes straight into the channel
>>> + * creation.
>>> + */
>>>    static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>    {
>>>        MultiFDSendParams *p = opaque;
>>>        QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
>>>        Error *local_err = NULL;
>>>
>>> +    bool ret;
>>> +
>>>        trace_multifd_new_send_channel_async(p->id);
>>> -    if (!qio_task_propagate_error(task, &local_err)) {
>>> -        qio_channel_set_delay(ioc, false);
>>> -        if (multifd_channel_connect(p, ioc, &local_err)) {
>>> -            return;
>>> -        }
>>> +
>>> +    if (qio_task_propagate_error(task, &local_err)) {
>>> +        ret = false;
>>> +        goto out;
>>> +    }
>> I think this common error handling for both TLS/non-TLS is a bit
>> problematic if there is an error in TLS handshake:
>> multifd_tls_channel_connect() sets p->c = QIO_CHANNEL(tioc).
>> TLS handshake fails.
>> multifd_new_send_channel_async() errors and calls
>> object_unref(OBJECT(ioc)) which will result in freeing the IOC.
>> Then, multifd_send_terminate_threads() will try to access p->c because
>> it's not NULL, causing a segfault.
> Good catch.
>
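
To make the failure sequence above concrete, here is a minimal sketch
using a toy refcounted object. Channel, SendParams, handshake_failed()
and the other helpers are hypothetical stand-ins, not QEMU's QOM or
multifd API. It shows one possible way to keep cleanup away from a freed
channel (clearing p->c when it aliases the channel being released); this
is an assumption for illustration, not necessarily what the next
revision of the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted QIOChannel (hypothetical, not QOM). */
typedef struct Channel {
    int refcount;
} Channel;

/* Toy stand-in for MultiFDSendParams: only the channel pointer. */
typedef struct SendParams {
    Channel *c;
} SendParams;

static int live_channels; /* channels allocated and not yet freed */

static Channel *channel_new(void)
{
    Channel *ch = calloc(1, sizeof(*ch));
    if (!ch) {
        abort();
    }
    ch->refcount = 1;
    live_channels++;
    return ch;
}

static void channel_unref(Channel *ch)
{
    if (--ch->refcount == 0) {
        live_channels--;
        free(ch);
    }
}

/*
 * Error path that drops the caller's reference. If p->c aliases the
 * channel being released, clear it first so that later cleanup does
 * not follow a dangling pointer.
 */
static void handshake_failed(SendParams *p, Channel *ioc)
{
    if (p->c == ioc) {
        p->c = NULL;
    }
    channel_unref(ioc);
}

/* Cleanup only touches the channel when p->c is non-NULL. */
static bool cleanup_touches_channel(const SendParams *p)
{
    return p->c != NULL;
}

/* Walk the TLS failure sequence; returns true if cleanup is safe. */
static bool demo_tls_failure(void)
{
    SendParams p = { .c = NULL };
    Channel *tioc = channel_new();

    p.c = tioc;                 /* set before the handshake completes */
    handshake_failed(&p, tioc); /* handshake fails, last ref dropped */

    return !cleanup_touches_channel(&p) && live_channels == 0;
}
```

Without the p->c = NULL step in handshake_failed(), p.c would still
point at freed memory when cleanup runs, which is the crash described
above.
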
> I'm not sure the current reference counting is even correct. AFAICS, the
> refcount is 2 at new_send_channel_async due to the qio_task taking a
> reference and that will be decremented after we return from the
> completion callback, which is multifd_new_send_channel_async itself. The
> last reference should be dropped when we cleanup the channel.
>
> So I don't really understand the need for that unref there. But there's
> no asserts being reached due to an extra decrement, so there might be
> some extra increment hiding somewhere.

I think the ref counting is correct: in the non-TLS case we never set
p->c = ioc, so the cleanup will just skip destroying this p->c.
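
As a rough illustration of the accounting discussed above (a refcount of
2 inside the completion callback, dropping back once the callback
returns), here is a toy model. Obj, task_run() and the other names are
hypothetical; the real QIOTask holds its reference on the source object
for the task's whole lifetime, which this sketch compresses into the
duration of a single call.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object (hypothetical, not QOM). */
typedef struct Obj {
    int refcount;
} Obj;

typedef void (*TaskFunc)(Obj *source, void *opaque);

static int freed_count; /* number of objects freed so far */

static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));
    if (!o) {
        abort();
    }
    o->refcount = 1;
    return o;
}

static void obj_ref(Obj *o)
{
    o->refcount++;
}

static void obj_unref(Obj *o)
{
    if (--o->refcount == 0) {
        freed_count++;
        free(o);
    }
}

/*
 * Toy task: holds its own reference on the source while the
 * completion callback runs and drops it afterwards.
 */
static void task_run(Obj *source, TaskFunc func, void *opaque)
{
    obj_ref(source);
    func(source, opaque);
    obj_unref(source);
}

/* Completion callback: record the refcount it observes. */
static void record_refcount(Obj *source, void *opaque)
{
    *(int *)opaque = source->refcount;
}

/* Returns the refcount seen from inside the callback. */
static int demo_refcount_in_callback(void)
{
    Obj *ioc = obj_new();
    int seen = 0;

    task_run(ioc, record_refcount, &seen);
    assert(ioc->refcount == 1); /* the task's reference is gone */

    obj_unref(ioc);             /* owner drops the last reference */
    return seen;
}
```

In this model an extra unref inside the callback would leave the owner
holding a pointer to an object with no references left once the task
drops its own, so whether such an unref is safe depends on who else
still points at the object.
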

>
> Anyway, I'll figure this out and update this patch. Thanks
>
>>> +
>>> +    qio_channel_set_delay(ioc, false);
>> Maybe qio_channel_set_delay() should be moved inside
>> multifd_channel_connect()? It's called two times when TLS is used.
>>
> It looks like it could, I'll do that.
>
>>> +
>>> +    trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
>>> +                                       migrate_get_current()->hostname);
>>> +
>>> +    if (migrate_channel_requires_tls_upgrade(ioc)) {
>>> +        ret = multifd_tls_channel_connect(p, ioc, &local_err);
>>> +    } else {
>>> +        ret = multifd_channel_connect(p, ioc, &local_err);
>>> +    }
>>> +
>>> +    if (ret) {
>>> +        return;
>>>        }
>>>
>>> +out:
>>>        trace_multifd_new_send_channel_async_error(p->id, local_err);
>>>        multifd_send_set_error(local_err);
>>>        multifd_send_kick_main(p);
>>> --
>>> 2.35.3
>>>


Thread overview: 16+ messages
2024-02-05 19:49 [PATCH v2 0/6] migration/multifd: Fix channel creation vs. cleanup races Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 1/6] migration/multifd: Join the TLS thread Fabiano Rosas
2024-02-06  8:53   ` Daniel P. Berrangé
2024-02-06  9:15     ` Peter Xu
2024-02-06 10:06       ` Daniel P. Berrangé
2024-02-05 19:49 ` [PATCH v2 2/6] migration/multifd: Remove p->running Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 3/6] migration/multifd: Move multifd_send_setup error handling in to the function Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 4/6] migration/multifd: Move multifd_send_setup into migration thread Fabiano Rosas
2024-02-05 19:49 ` [PATCH v2 5/6] migration/multifd: Unify multifd and TLS connection paths Fabiano Rosas
2024-02-06  3:33   ` Peter Xu
2024-02-06 12:44   ` Avihai Horon
2024-02-06 14:30     ` Fabiano Rosas
2024-02-06 14:44       ` Avihai Horon [this message]
2024-02-05 19:49 ` [PATCH v2 6/6] migration/multifd: Add a synchronization point for channel creation Fabiano Rosas
2024-02-06  3:37   ` Peter Xu
2024-02-06  3:42 ` [PATCH v2 0/6] migration/multifd: Fix channel creation vs. cleanup races Peter Xu
