From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Cédric Le Goater" <clg@redhat.com>,
	qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>
Subject: Re: [PATCH 0/2] migration: Fix return-path thread exit
Date: Tue, 6 Feb 2024 10:42:07 +0800
Message-ID: <ZcGcf0sdBShK9q8A@x1n>
In-Reply-To: <ZcC5QTO3tmt9gaCf@redhat.com>

On Mon, Feb 05, 2024 at 10:32:33AM +0000, Daniel P. Berrangé wrote:
> On Fri, Feb 02, 2024 at 05:53:39PM +0800, Peter Xu wrote:
> > On Thu, Feb 01, 2024 at 07:48:51PM +0100, Cédric Le Goater wrote:
> > > Hello,
> > 
> > Hi, Cédric,
> > 
> > Thanks for the patches.
> > 
> > > 
> > > Today, close_return_path_on_source() can perform a shutdown to exit
> > > the return-path thread if an error occurred. However, migrate_fd_cleanup()
> > > does cleanups too early and the shutdown in close_return_path_on_source()
> > > fails, leaving the source and destination waiting for an event to occur.
> > > 
> > > This little series tries to fix that. Comments welcome !  
> > 
> > One thing I do agree with is that relying on qemu_file_get_error(to_dst_file) in
> > close_return_path_on_source() is weird: IMHO we have a better way to detect
> > "whether the migration has error" now, which is migrate_has_error().
> > 
> > For this specific issue, I think one long-standing issue that might be
> > relevant is that we now have two QEMUFiles (from_dst_file, to_dst_file)
> > sharing the same QIOChannel.  Logically the two QEMUFiles should be able to
> > be managed separately, say, close() of to_dst_file shouldn't affect the other.
> > 
> > However I don't think it's the case now, as qemu_fclose(to_dst_file) will
> > do qio_channel_close() already, which means there will be a side effect on
> > the other QEMUFile: its backing IOC is already closed.
> > 
> > Is this the issue we're facing?  IOW, the close() of to_dst_file will not
> > properly kick the other thread that is blocked reading from_dst_file,
> > while the shutdown() will kick it out?
> > 
> > If so, I'm not sure whether we can somehow delay the real
> > qio_channel_close() until the last user releases it?  IOW, conditionally
> > close() the channel in qio_channel_finalize(), if the channel is still
> > open?  Would that make sense?
> 
> IMHO the problem described above is a result of the design mistake of
> having 2 separate QEMUFile instances for what is ultimately the same
> channel. This was a convenient approach to take originally, but it has
> likely outlived its purpose.
> 
> In the ideal world IMHO, QEMUFile would not exist at all, and we would
> have a QIOChannelCached that adds the read/write buffering above the
> base QIOChannel.

We have had that on the TODO wiki page for a long time; I'll update it
slightly.

https://wiki.qemu.org/ToDo/LiveMigration#Rewrite_QEMUFile_for_migration

But yeah, that might be too big a hammer for this specific issue.
AFAIU Fabiano is looking in that direction, but I assume it will still
be a long-term effort.

> 
> That's doable, but bigger than a quick fix. A natural stepping stone
> to get there though is to move from 2 QEMUFile objs down to 1 QEMUFile,
> which might be more practical as a quick fix.

Agreed. However, wouldn't this still be quite a large change?

We still have a lot of references to the four QEMUFiles (to/from_dst_file,
to/from_src_file), which will at least need replacing.  I haven't yet
checked whether every site can be converted by a direct replacement; some
tweaks may be needed here and there, but they shouldn't be major.

Meanwhile, IIUC, it'll also need a major rework of QEMUFile to allow it
to be bi-directional?  We may need to duplicate the cache layer, one for
each I/O direction.

-- 
Peter Xu



Thread overview: 14+ messages
2024-02-01 18:48 [PATCH 0/2] migration: Fix return-path thread exit Cédric Le Goater
2024-02-01 18:48 ` [PATCH 1/2] migration: Add a file_error argument to close_return_path_on_source() Cédric Le Goater
2024-02-02 14:30   ` Fabiano Rosas
2024-02-02 14:45     ` Cédric Le Goater
2024-02-01 18:48 ` [PATCH 2/2] migration: Fix return-path thread exit Cédric Le Goater
2024-02-02 14:42   ` Fabiano Rosas
2024-02-02 14:51     ` Cédric Le Goater
2024-02-02 15:11       ` Fabiano Rosas
2024-02-05  3:37         ` Peter Xu
2024-02-05 10:17           ` Cédric Le Goater
2024-02-02  9:53 ` [PATCH 0/2] " Peter Xu
2024-02-02 13:04   ` Cédric Le Goater
2024-02-05 10:32   ` Daniel P. Berrangé
2024-02-06  2:42     ` Peter Xu [this message]
