From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
Wei Wang <wei.w.wang@intel.com>,
Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Thu, 3 Aug 2023 11:39:15 -0400 [thread overview]
Message-ID: <ZMvKIyur0DbBGOuy@x1n> (raw)
In-Reply-To: <ZMvGoD61XG/Lr3jI@redhat.com>
On Thu, Aug 03, 2023 at 04:24:16PM +0100, Daniel P. Berrangé wrote:
> On Thu, Aug 03, 2023 at 11:15:41AM -0400, Peter Xu wrote:
> > On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> > > Peter Xu <peterx@redhat.com> writes:
> > >
> > > > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> > > >> Peter Xu <peterx@redhat.com> writes:
> > > >>
> > > >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> > > >> >> This function currently has a straight-forward part which is waiting
> > > >> >> for the thread to join and a complicated part which is doing a
> > > >> >> qemu_file_shutdown() on the return path file.
> > > >> >>
> > > >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> > > >> >> f->last_error to -EIO, which means we can never know if an error is an
> > > >> >> actual error or if we cleanly shutdown the file previously.
> > > >> >>
> > > >> >> This is particularly bothersome for postcopy because it would send the
> > > >> >> return path thread into the retry routine which would wait on the
> > > >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> > > >> >> haven't had reports of this so I must presume we never reach here with
> > > >> >> postcopy.
> > > >> >>
> > > >> >> The shutdown call is also racy because since it doesn't take the
> > > >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> > > >> >> happens to be in the middle of the critical region at
> > > >> >> migration_release_dst_files().
> > > >> >
> > > >> > After you rework the thread model on resume, shall we move
> > > >> > migration_release_dst_files() into the migration thread to be after the
> > > >> > pthread_join()? I assume then we don't even need a mutex to protect it?
> > > >> >
> > > >>
> > > >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> > > >> cleanup along. No idea why it is there in the first place. I see you
> > > >> moved it from postcopy_pause and we're about to move it back to the
> > > >> exact same place =D
> > > >
> > > > It was there because the old postcopy-preempt was sending data via
> > > > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > > > also the migration thread context.
> > > >
> > > > Then we had 9358982744 ("migration: Send requested page directly in
> > > > rp-return thread") where we moved that "send page" operation into the
> > > > return path thread to reduce latencies. After moving there it also means
> > > > the file handle can be accessed in >1 threads, so I just moved it over to
> > > > operate that always in the return path thread, then no race should happen.
> > > >
> > >
> > > Thanks for the context.
> > >
> > > > With your change, return path will vanish before migration thread accesses
> > > > it later (so as mentioned above, it must be after pthread_join()
> > > > succeeded), then I assume it'll be fine too to have it back in migration
> > > > thread.
> > > >
> > > > Or perhaps just take the file lock?
> > > >
> > >
> > > There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> > > these files. We might need to lock anyway, let's see.
> >
> > The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
> > That's based on the assumption that qemu_file_shutdown() is actually thread
> > safe (say, shutdown() syscall is thread-safe for sockets).
>
> The shutdown() syscall and qio_channel_shutdown() method are intended
> to be safe to call from any thread *PROVIDED* you can ensure no other
> thread is concurrently going to call close() on the FD (or unref the
> QIOChannel object).
>
> There is no locking in qemu_file_shutdown() to guarantee this, but
> maybe something else in migration code is guaranteeing that the
> QIOChannel object is not going to be closed (or unref'd), while a
> thread is invoking qemu_file_shutdown().
It should currently be guaranteed by the qemu_file_lock I think.
>
> IOW, in theory qemu_file_shutdown() could be safe to use but
> I'm not seeing a clearly expressed guarantee of safety in the
> code. If it is safe, the reasons are very subtle and rationale
> ought to be documented in the comment for qemu_file_shutdown
I agree.
For now for this specific use case of the migration qemufile, we can simply
always take the mutex.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2023-08-03 15:40 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
2023-08-02 16:19 ` Peter Xu
2023-08-02 19:58 ` Fabiano Rosas
2023-08-02 20:40 ` Peter Xu
2023-08-03 14:45 ` Fabiano Rosas
2023-08-03 15:15 ` Peter Xu
2023-08-03 15:24 ` Daniel P. Berrangé
2023-08-03 15:39 ` Peter Xu [this message]
2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
2023-08-02 16:02 ` Peter Xu
2023-08-02 20:04 ` Fabiano Rosas
2023-08-02 20:44 ` Peter Xu
2023-08-03 15:00 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZMvKIyur0DbBGOuy@x1n \
--to=peterx@redhat.com \
--cc=berrange@redhat.com \
--cc=farosas@suse.de \
--cc=leobras@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=wei.w.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).