From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: [RFC PATCH 2/2] migration/multifd: Move semaphore release into main thread
Date: Mon, 13 Nov 2023 22:50:39 -0300 [thread overview]
Message-ID: <877cml5ci8.fsf@suse.de> (raw)
In-Reply-To: <ZVJSx6FOg8WfSbrz@x1n>
Peter Xu <peterx@redhat.com> writes:
> On Fri, Nov 10, 2023 at 09:05:41AM -0300, Fabiano Rosas wrote:
>
> [...]
>
>> > Then assuming we have a clear model with all these threads issue fixed (no
>> > matter whether we'd shrink 2N threads into N threads), then what we need to
>> > do, IMHO, is making sure to join() all of them before destroying anything
>> > (say, per-channel MultiFDSendParams). Then when we destroy everything
>> > safely, either mutex/sem/etc.. Because no one will race us anymore.
>>
>> This doesn't address the race. There's a data dependency between the
>> multifd channels and the migration thread around the channels_ready
>> semaphore. So we cannot join the migration thread because it could be
>> stuck waiting for the semaphore, which means we cannot join+cleanup the
>> channel thread because the semaphore is still being used.
>
> I think this is the major part of confusion, on why this can happen.
>
> The problem is afaik multifd_save_cleanup() is only called by
> migrate_fd_cleanup(), which is further only called in:
>
> 1) migrate_fd_cleanup_bh()
> 2) migrate_fd_connect()
>
> For 1): it's only run when migration comletes/fails/etc. (in all cases,
> right before it quits..) and then kicks off migrate_fd_cleanup_schedule().
> So migration thread shouldn't be stuck, afaiu, or it won't be able to kick
> that BH.
>
> For 2): it's called by the main thread, where migration thread should have
> not yet been created.
>
> With that, I don't see how migrate_fd_cleanup() would need to worry about
> migration thread
>
> Did I miss something?
There are two points:
1) multifd_new_send_channel_async() doesn't set an Error. Even if
multifd_channel_connect() fails, we'll still continue with
migrate_fd_connect(). I don't see any code that looks at the migration
error (s->error).
2) the TLS handshake thread of one of the channels could simply not get
any chance to run until something else fails and we reach
multifd_save_cleanup() from the BH path.
This second point in particular is why I don't think simply joining the
TLS thread will avoid the race. There's nothing linking the multifd
channels together, we could have 7 of them operational and a 8th one
still going through the TLS handshake.
That said, I'm not sure about the exact path we take to reach the bug
situation. It's very hard to reproduce so I'm relying entirely on code
inspection.
next prev parent reply other threads:[~2023-11-14 1:51 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-09 16:58 [RFC PATCH 0/2] migration: Fix multifd qemu_mutex_destroy race Fabiano Rosas
2023-11-09 16:58 ` [RFC PATCH 1/2] migration: Report error in incoming migration Fabiano Rosas
2023-11-09 18:57 ` Peter Xu
2023-11-10 10:58 ` Fabiano Rosas
2023-11-13 16:51 ` Peter Xu
2023-11-14 1:54 ` Fabiano Rosas
2023-11-09 16:58 ` [RFC PATCH 2/2] migration/multifd: Move semaphore release into main thread Fabiano Rosas
2023-11-09 18:56 ` Peter Xu
2023-11-10 12:05 ` Fabiano Rosas
2023-11-10 12:37 ` Fabiano Rosas
2023-11-16 15:51 ` Juan Quintela
2023-11-13 16:45 ` Peter Xu
2023-11-14 1:50 ` Fabiano Rosas [this message]
2023-11-14 17:28 ` Peter Xu
2023-11-16 15:44 ` Juan Quintela
2023-11-16 14:56 ` Juan Quintela
2023-11-16 18:13 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877cml5ci8.fsf@suse.de \
--to=farosas@suse.de \
--cc=leobras@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).