From: Peter Xu <peterx@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>
Cc: Fabiano Rosas <farosas@suse.de>, qemu-devel@nongnu.org
Subject: Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
Date: Mon, 29 Jan 2024 12:17:26 +0800 [thread overview]
Message-ID: <Zbcm1nXzQ_r0eGG0@x1n> (raw)
In-Reply-To: <78fa90f7-d062-4f23-8035-841a0ffef8af@nvidia.com>
On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>
> On 25/01/2024 22:57, Fabiano Rosas wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > Avihai Horon <avihaih@nvidia.com> writes:
> >
> > > The commit in the fixes line moved multifd thread creation to a
> > > different location, but forgot to move the p->running = true assignment
> > > as well. Thus, p->running is set to true before multifd thread is
> > > actually created.
> > >
> > > p->running is used in multifd_save_cleanup() to decide whether to join
> > > the multifd thread or not.
> > >
> > > With TLS, an error in multifd_tls_channel_connect() can lead to a
> > > segmentation fault because p->running is true but p->thread is never
> > > initialized, so multifd_save_cleanup() tries to join an uninitialized
> > > thread.
> > >
> > > Fix it by moving p->running = true assignment right after multifd thread
> > > creation. Also move qio_channel_set_delay() to there, as this is where
> > > it used to be originally.
> > >
> > > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Just for context, I haven't looked at this patch yet, but we were
> > planning to remove p->running altogether:
> >
> > https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>
> Thanks for putting me in the picture.
> I see that there has been a discussion about the multifd creation/treadown
> flow.
> In light of this discussion, I can already see a few problems in my series
> that I didn't notice before (such as the TLS handshake thread leak).
> The thread you mentioned here and some of my patches point out some problems
> in multifd creation/treardown. I guess we can discuss it and see what's the
> best way to solve them.
>
> Regarding this patch, your solution indeed solves the bug that this patch
> addresses, so maybe this could be dropped (or only noted in your patch).
>
> Maybe I should also put you (and Peter) in context for this whole series --
> I am writing it as preparation for adding a separate migration channel for
> VFIO device migration, so VFIO devices could be migrated in parallel.
> So this series tries to lay down some foundations to facilitate it.
Avihai, is the throughput the only reason that VFIO would like to have a
separate channel?
I'm wondering if we can also use multifd threads to send vfio data at some
point. Now multifd indeed is closely bound to ram pages but maybe it'll
change in the near future to take any load?
Multifd is for solving the throughput issue already. If vfio has the same
goal, IMHO it'll be good to keep them using the same thread model, instead
of managing different threads in different places. With that, any user
setting (for example, multifd-n-threads) will naturally apply to all
components, rather than relying on yet-another vfio-migration-threads-num
parameter.
--
Peter Xu
next prev parent reply other threads:[~2024-01-29 4:18 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
2024-01-25 16:25 ` [PATCH 01/17] migration: Fix logic of channels and transport compatibility check Avihai Horon
2024-01-26 3:09 ` Peter Xu
2024-01-25 16:25 ` [PATCH 02/17] migration: Move local_err check in migration_ioc_process_incoming() Avihai Horon
2024-01-26 3:10 ` Peter Xu
2024-01-25 16:25 ` [PATCH 03/17] migration: Rename default_channel to main_channel Avihai Horon
2024-01-26 3:11 ` Peter Xu
2024-01-25 16:25 ` [PATCH 04/17] migration/multifd: Set p->running = true in the right place Avihai Horon
2024-01-25 20:57 ` Fabiano Rosas
2024-01-28 15:43 ` Avihai Horon
2024-01-29 4:17 ` Peter Xu [this message]
2024-01-29 12:20 ` Avihai Horon
2024-01-30 5:57 ` Peter Xu
2024-01-30 18:44 ` Avihai Horon
2024-02-06 10:25 ` Peter Xu
2024-02-08 15:31 ` Avihai Horon
2024-01-29 12:23 ` Fabiano Rosas
2024-01-25 16:25 ` [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding Avihai Horon
2024-01-29 14:34 ` Fabiano Rosas
2024-01-30 18:32 ` Avihai Horon
2024-01-30 21:32 ` Fabiano Rosas
2024-01-31 4:49 ` Peter Xu
2024-01-31 10:39 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 06/17] migration/tls: Rename main migration channel TLS functions Avihai Horon
2024-01-25 16:25 ` [PATCH 07/17] migration/tls: Add new migration channel TLS upgrade API Avihai Horon
2024-01-25 16:25 ` [PATCH 08/17] migration: Use the new TLS upgrade API for main channel Avihai Horon
2024-01-25 16:25 ` [PATCH 09/17] migration/multifd: Use the new TLS upgrade API for multifd channels Avihai Horon
2024-01-25 16:25 ` [PATCH 10/17] migration/postcopy: Use the new TLS upgrade API for preempt channel Avihai Horon
2024-01-25 16:25 ` [PATCH 11/17] migration/tls: Make migration_tls_client_create() static Avihai Horon
2024-01-25 16:25 ` [PATCH 12/17] migration/multifd: Consolidate TLS/non-TLS multifd channel error flow Avihai Horon
2024-01-25 16:25 ` [PATCH 13/17] migration: Store MigrationAddress in MigrationState Avihai Horon
2024-01-25 16:25 ` [PATCH 14/17] migration: Rename migration_channel_connect() Avihai Horon
2024-01-25 16:25 ` [PATCH 15/17] migration: Add new migration channel connect API Avihai Horon
2024-01-25 16:25 ` [PATCH 16/17] migration/multifd: Use the new migration channel connect API for multifd Avihai Horon
2024-01-25 16:25 ` [PATCH 17/17] migration/postcopy: Use the new migration channel connect API for postcopy preempt Avihai Horon
2024-02-06 10:04 ` [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Peter Xu
2024-02-06 13:10 ` Avihai Horon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zbcm1nXzQ_r0eGG0@x1n \
--to=peterx@redhat.com \
--cc=avihaih@nvidia.com \
--cc=farosas@suse.de \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).