From: Fabiano Rosas <farosas@suse.de>
To: Lukas Straub <lukasstraub2@web.de>, Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v6 09/10] migration/yank: Keep track of registered yank instances
Date: Mon, 25 Sep 2023 09:20:58 -0300 [thread overview]
Message-ID: <87y1gu4f6d.fsf@suse.de> (raw)
In-Reply-To: <20230925093607.7f3ab989@penguin>
CC: Daniel for the QIOChannel discussion
Lukas Straub <lukasstraub2@web.de> writes:
> On Thu, 14 Sep 2023 10:57:47 -0400
> Peter Xu <peterx@redhat.com> wrote:
>
>> On Thu, Sep 14, 2023 at 10:23:38AM -0300, Fabiano Rosas wrote:
>> > Peter Xu <peterx@redhat.com> writes:
>> >
>> > > On Wed, Sep 13, 2023 at 06:53:20PM -0300, Fabiano Rosas wrote:
>> > >> Peter Xu <peterx@redhat.com> writes:
>> > >>
>> > >> > On Mon, Sep 11, 2023 at 02:13:19PM -0300, Fabiano Rosas wrote:
>> > >> >> The core yank code is strict about balanced registering and
>> > >> >> unregistering of yank functions.
>> > >> >>
>> > >> >> This creates a difficulty because the migration code registers one
>> > >> >> yank function per QIOChannel, but each QIOChannel can be referenced by
>> > >> >> more than one QEMUFile. The yank function should not be removed until
>> > >> >> all QEMUFiles have been closed.
>> > >> >>
>> > >> >> Keep a reference count of how many QEMUFiles are using a QIOChannel
>> > >> >> that has a yank function. Only unregister the yank function when all
>> > >> >> QEMUFiles have been closed.
>> > >> >>
>> > >> >> This improves the current code by removing the need for the programmer
>> > >> >> to know which QEMUFile is the last one to be cleaned up and fixes the
>> > >> >> theoretical issue of removing the yank function while another QEMUFile
>> > >> >> could still be using the ioc and require a yank.
>> > >> >>
>> > >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > >> >> ---
>> > >> >> migration/yank_functions.c | 81 ++++++++++++++++++++++++++++++++++----
>> > >> >> migration/yank_functions.h | 8 ++++
>> > >> >> 2 files changed, 81 insertions(+), 8 deletions(-)
>> > >> >
>> > >> > I worry this over-complicate things.
>> > >>
>> > >> It does. We ran out of simple options.
>> > >>
>> > >> > If you prefer the cleaness that we operate always on qemufile level, can we
>> > >> > just register each yank function per-qemufile?
>> > >>
>> > >> "just" hehe
>> > >>
>> > >> we could, but:
>> > >>
>> > >> i) the yank is a per-channel operation, so this is even more unintuitive;
>> > >
>> > > I mean we can provide something like:
>> > >
>> > > void migration_yank_qemufile(void *opaque)
>> > > {
>> > > QEMUFile *file = opaque;
>> > > QIOChannel *ioc = file->ioc;
>> > >
>> > > qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
>> > > }
>> > >
>> > > void migration_qemufile_register_yank(QEMUFile *file)
>> > > {
>> > > if (migration_ioc_yank_supported(file->ioc)) {
>> > > yank_register_function(MIGRATION_YANK_INSTANCE,
>> > > migration_yank_qemufile,
>> > > file);
>> > > }
>> > > }
>> >
>> > Sure, this is what I was thinking as well. IMO it will be yet another
>> > operation that happens on the channel, but it performed via the
>> > file. Just like qio_channel_close() at qemu_fclose(). Not the end of the
>> > world, of course, I just find it error-prone.
>> >
>> > >>
>> > >> ii) multifd doesn't have a QEMUFile, so it will have to continue using
>> > >> the ioc;
>> > >
>> > > We can keep using migration_ioc_[un]register_yank() for them if there's no
>> > > qemufile attached. As long as the function will all be registered under
>> > > MIGRATION_YANK_INSTANCE we should be fine having different yank func.
>> > >
>> >
>> > ok
>> >
>> > >>
>> > >> iii) we'll have to add a yank to every new QEMUFile created during the
>> > >> incoming migration (colo, rdma, etc), otherwise the incoming side
>> > >> will be left using iocs while the src uses the QEMUFile;
>> > >
>> > > For RDMA, IIUC it'll simply be a noop as migration_ioc_yank_supported()
>> > > will be a noop for it for either reg/unreg.
>> > >
>> > > Currently it seems we will also unreg the ioc even for RDMA (even though we
>> > > don't reg for it). But since unreg will be a noop it seems all fine even
>> > > if not paired.. maybe we should still try to pair it, e.g. register also in
>> > > rdma_start_outgoing_migration() for the rdma ioc so at least they're paired.
>> > >
>> > > I don't see why COLO is special here, though. Maybe I missed something.
>> >
>> > For colo I was thinking we'd have to register the yank just to be sure
>> > that all paths unregistering it have something to unregister.
>> >
>> > Maybe I should move the register into qemu_file_new_impl() with a
>> > matching unregister at qemu_fclose().
>>
>> Sounds good. Or...
>>
>> >
>> > >>
>> > >> iv) this is a functional change of the yank feature for which we have no
>> > >> tests.
>> > >
>> > > Having yank tested should be preferrable. Lukas is in the loop, let's see
>> > > whether he has something. We can still smoke test it before a selftest
>> > > being there.
>> > >
>
> Hi All,
> Sorry for the late reply.
>
> Yes, testing missing. I'll work on it.
>
>> > > Taking one step back.. I doubt whether anyone is using yank for migration?
>> > > Knowing that migration already have migrate-cancel (for precopy) and
>> > > migrate-pause (for postcopy).
>> >
>> > Right, both already call qio_channel_shutdown().
>> >
>> > > I never used it myself, and I don't think
>> > > it's supported for RHEL. How's that in suse's case?
>> >
>> > Never heard mention of it and I don't see it in our virtualization
>> > documentation.
>> >
>> > >
>> > > If no one is using it, maybe we can even avoid registering migration to
>> > > yank?
>> > >
>> >
>> > Seems reasonable to me.
>>
>> ... let's wait for a few days from Lukas to see whether he as any more
>> input, or I'd vote for dropping yank for migration as a whole. It caused
>> mostly more crashes that I knew than benefits, so far..
>>
>> I also checked libvirt is not using yank.
>>
>
> The main user for yank is COLO. It can't be replaced by 'migrate_pause'
> or 'migrate_cancel', because:
>
> 1) It needs to work while the main lock is taken by the migration
> thread, so it needs to be an OOB qmp command. There are places
> where the migration thread can hang on a socket while the main lock
> is taken. 'migrate_pause' is OOB, but not usable in the COLO case (it
> doesn't support postcopy).
>
> 2) In COLO, it needs to work both on outgoing and on incoming side, since
> both sides have a completely healthy and ready to takeover guest state.
>
> I agree that the migration yank code was not well thought out :(.
I'd say the QIOChannel being referenced via multiple QEMUFiles throws a
curve ball to the yank design.
> I had the idea back then to create child class of the IOCs, e.g.
> YankableQIOChannelSocket and YankableQIOChannelTLS. It's not
> perfect, but then the lifetime of the yank functions is directly
> coupled with the iochannel. Then the IOCs can be used just as usual in
> the rest of the migration code.
The yank really wants to be tied to the channel. We should do that.
I'm just thinking whether a feature bit + setter would be simpler to
implement. It wouldn't require changing any of the object creation code,
just add a qio_channel_enable_yank() at the start of migration and let
the channel take care of the rest.
> Another problem area was to be that there was no clear point in
> migration code where all channels are closed to unregister the yank
> instance itself. That seems to be solved now?
I'm inclined to add reference counting all over instead of trying to
squint at the code and figure out where these cleanups should
go. Specially since we have these pause/recovery scenarios.
That said, I haven't looked closely at the instance unregister, but I
don't think this series changes anything that would help in that regard.
next prev parent reply other threads:[~2023-09-25 12:21 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-11 17:13 [PATCH v6 00/10] Fix segfault on migration return path Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 02/10] migration: Fix possible races when shutting down the return path Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 03/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 04/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 05/10] migration: Consolidate return path closing code Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 06/10] migration: Replace the return path retry logic Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 07/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
2023-09-11 17:13 ` [PATCH v6 08/10] migration/yank: Use channel features Fabiano Rosas
2023-09-11 20:46 ` Philippe Mathieu-Daudé
2023-09-13 20:05 ` Peter Xu
2024-01-22 20:08 ` Fabiano Rosas
2024-01-23 1:27 ` Peter Xu
2023-09-11 17:13 ` [PATCH v6 09/10] migration/yank: Keep track of registered yank instances Fabiano Rosas
2023-09-13 20:13 ` Peter Xu
2023-09-13 21:53 ` Fabiano Rosas
2023-09-13 23:48 ` Peter Xu
2023-09-14 13:23 ` Fabiano Rosas
2023-09-14 14:57 ` Peter Xu
2023-09-25 7:38 ` Lukas Straub
2023-09-25 12:20 ` Fabiano Rosas [this message]
2023-09-25 15:32 ` Lukas Straub
2023-09-11 17:13 ` [PATCH v6 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1gu4f6d.fsf@suse.de \
--to=farosas@suse.de \
--cc=berrange@redhat.com \
--cc=leobras@redhat.com \
--cc=lukasstraub2@web.de \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).