From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Cédric Le Goater" <clg@redhat.com>,
qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>
Subject: Re: [PATCH 0/2] migration: Fix return-path thread exit
Date: Tue, 6 Feb 2024 10:42:07 +0800
Message-ID: <ZcGcf0sdBShK9q8A@x1n>
In-Reply-To: <ZcC5QTO3tmt9gaCf@redhat.com>
On Mon, Feb 05, 2024 at 10:32:33AM +0000, Daniel P. Berrangé wrote:
> On Fri, Feb 02, 2024 at 05:53:39PM +0800, Peter Xu wrote:
> > On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote:
> > > Hello,
> >
> > Hi, Cédric,
> >
> > Thanks for the patches.
> >
> > >
> > > Today, close_return_path_on_source() can perform a shutdown to exit
> > > the return-path thread if an error occurred. However, migrate_fd_cleanup()
> > > does cleanups too early and the shutdown in close_return_path_on_source()
> > > fails, leaving the source and destination waiting for an event to occur.
> > >
> > > This little series tries to fix that. Comments welcome !
> >
> > One thing I do agree with is that relying on qemu_file_get_error(to_dst_file)
> > in close_return_path_on_source() is weird: IMHO we have a better way to
> > detect "whether the migration has an error" now, which is migrate_has_error().
> >
> > For this specific issue, I think one long standing issue that might be
> > relevant is we have two QEMUFile (from_dst_file, to_dst_file) that share
> > the same QIOChannel now. Logically the two QEMUFile should be able to be
> > managed separately, say, close() of to_dst_file shouldn't affect the other.
> >
> > However I don't think it's the case now, as qemu_fclose(to_dst_file) will
> > do qio_channel_close() already, which means there will be a side effect to
> > the other QEMUFile that its backing IOC is already closed.
> >
> > Is this the issue we're facing? IOW, the close() of to_dst_file will not
> > properly kick the other thread who is blocked at reading from_dst_file,
> > while the shutdown() will kick it out?
> >
> > If so, not sure whether we can somehow delay the real qio_channel_close()
> > until the last user releases it? IOW, conditionally close() the channel
> > in qio_channel_finalize(), if the channel is still open? Would that make
> > sense?
>
> IMHO the problem described above is a result of the design mistake of
> having 2 separate QEMUFile instances for what is ultimately the same
> channel. This was a convenient approach to take originally, but it has
> likely outlived its purpose.
>
> In the ideal world IMHO, QEMUFile would not exist at all, and we would
> have a QIOChannelCached that adds the read/write buffering above the
> base QIOChannel.

We have had that on the TODO wiki page for a long time; I'll update it
slightly.

https://wiki.qemu.org/ToDo/LiveMigration#Rewrite_QEMUFile_for_migration

But yeah, that might be too big a hammer to solve this specific issue.
AFAIU Fabiano is looking in that direction, but I assume it will still
be a long-term thing.

>
> That's doable, but bigger than a quick fix. A natural stepping stone
> to get there though is to move from 2 QEMUFile objs down to 1 QEMUFile,
> which might be more practical as a quick fix.

Agreed. However, wouldn't this still be quite a big change?

We still have a lot of references to the four QEMUFiles (to/from_dst_file,
to/from_src_file), and at least those will need replacing. I haven't yet
checked whether every call site can take a direct replacement; some tweaks
may be needed here and there, but they shouldn't be major.

Meanwhile, IIUC it'll also need a major rework of QEMUFile to allow it to
be bi-directional? We may need to duplicate the cache layer, one for each
I/O direction.
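
To make "one cache per direction" a bit more concrete, here is a rough
sketch. It is not QEMU code: BidiFile and the bidi_* helpers are made-up
names, and it sits on a plain fd (a socketpair) rather than a QIOChannel,
with no EINTR or partial-failure handling. The only point is that a single
object can own one descriptor while keeping the read and write buffers
fully independent.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define BIDI_BUF_SIZE 4096

/* Hypothetical type (not a QEMU structure): one fd, two separate caches. */
typedef struct BidiFile {
    int fd;
    unsigned char rx[BIDI_BUF_SIZE];   /* read cache */
    size_t rx_pos, rx_len;
    unsigned char tx[BIDI_BUF_SIZE];   /* write cache */
    size_t tx_len;
} BidiFile;

void bidi_init(BidiFile *f, int fd)
{
    memset(f, 0, sizeof(*f));
    f->fd = fd;
}

/* Push the whole write cache to the fd. Returns 0, or -1 on error. */
int bidi_flush(BidiFile *f)
{
    size_t off = 0;

    while (off < f->tx_len) {
        ssize_t n = write(f->fd, f->tx + off, f->tx_len - off);
        if (n <= 0) {
            return -1;
        }
        off += (size_t)n;
    }
    f->tx_len = 0;
    return 0;
}

/* Queue len bytes in the write cache, flushing whenever it fills up. */
int bidi_put(BidiFile *f, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len > 0) {
        size_t room = BIDI_BUF_SIZE - f->tx_len;
        size_t chunk = len < room ? len : room;

        memcpy(f->tx + f->tx_len, p, chunk);
        f->tx_len += chunk;
        p += chunk;
        len -= chunk;
        if (f->tx_len == BIDI_BUF_SIZE && bidi_flush(f) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Read exactly len bytes through the read cache. -1 on error or EOF. */
int bidi_get(BidiFile *f, void *data, size_t len)
{
    unsigned char *p = data;

    while (len > 0) {
        if (f->rx_pos == f->rx_len) {
            ssize_t n = read(f->fd, f->rx, BIDI_BUF_SIZE);
            if (n <= 0) {
                return -1;
            }
            f->rx_pos = 0;
            f->rx_len = (size_t)n;
        }
        size_t avail = f->rx_len - f->rx_pos;
        size_t chunk = len < avail ? len : avail;

        memcpy(p, f->rx + f->rx_pos, chunk);
        f->rx_pos += chunk;
        p += chunk;
        len -= chunk;
    }
    return 0;
}

/* Round trip over a socketpair, one BidiFile per end. Returns 0 on success. */
int bidi_demo(void)
{
    int sv[2], ret = -1;
    BidiFile src, dst;
    char buf[8] = { 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    bidi_init(&src, sv[0]);
    bidi_init(&dst, sv[1]);

    /* Forward direction: src -> dst. */
    if (bidi_put(&src, "data", 4) < 0 || bidi_flush(&src) < 0) {
        goto out;
    }
    /* dst queues a reply first; pending tx data must not disturb its rx. */
    if (bidi_put(&dst, "ack", 3) < 0 ||
        bidi_get(&dst, buf, 4) < 0 || memcmp(buf, "data", 4) != 0) {
        goto out;
    }
    /* Return direction: dst -> src, on the same pair of descriptors. */
    if (bidi_flush(&dst) < 0 ||
        bidi_get(&src, buf, 3) < 0 || memcmp(buf, "ack", 3) != 0) {
        goto out;
    }
    ret = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Calling bidi_demo() exercises both directions on each end. In this toy
version the single owner closes the descriptor exactly once, which is the
property the two-QEMUFile setup lacks today.
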
--
Peter Xu