All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Wei Wang <wei.w.wang@intel.com>,
	Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
Date: Thu, 3 Aug 2023 16:24:16 +0100	[thread overview]
Message-ID: <ZMvGoD61XG/Lr3jI@redhat.com> (raw)
In-Reply-To: <ZMvEnbQzPEqIROlD@x1n>

On Thu, Aug 03, 2023 at 11:15:41AM -0400, Peter Xu wrote:
> On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> > >> Peter Xu <peterx@redhat.com> writes:
> > >> 
> > >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> > >> >> This function currently has a straight-forward part which is waiting
> > >> >> for the thread to join and a complicated part which is doing a
> > >> >> qemu_file_shutdown() on the return path file.
> > >> >> 
> > >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> > >> >> f->last_error to -EIO, which means we can never know if an error is an
> > >> >> actual error or if we cleanly shutdown the file previously.
> > >> >> 
> > >> >> This is particularly bothersome for postcopy because it would send the
> > >> >> return path thread into the retry routine which would wait on the
> > >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> > >> >> haven't had reports of this so I must presume we never reach here with
> > >> >> postcopy.
> > >> >> 
> > >> >> The shutdown call is also racy because since it doesn't take the
> > >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> > >> >> happens to be in the middle of the critical region at
> > >> >> migration_release_dst_files().
> > >> >
> > >> > After you rework the thread model on resume, shall we move
> > >> > migration_release_dst_files() into the migration thread to be after the
> > >> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> > >> >
> > >> 
> > >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> > >> cleanup along. No idea why it is there in the first place. I see you
> > >> moved it from postcopy_pause and we're about to move it back to the
> > >> exact same place =D
> > >
> > > It was there because the old postcopy-preempt was sending data via
> > > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > > also the migration thread context.
> > >
> > > Then we had 9358982744 ("migration: Send requested page directly in
> > > rp-return thread") where we moved that "send page" operation into the
> > > return path thread to reduce latencies.  After moving there it also means
> > > the file handle can be accessed in >1 threads, so I just moved it over to
> > > operate that always in the return path thread, then no race should happen.
> > >
> > 
> > Thanks for the context.
> > 
> > > With your change, return path will vanish before migration thread accesses
> > > it later (so as mentioned above, it must be after pthread_join()
> > > succeeded), then I assume it'll be fine too to have it back in migration
> > > thread.
> > >
> > > Or perhaps just take the file lock?
> > >
> > 
> > There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> > these files. We might need to lock anyway, let's see.
> 
> The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
> That's based on the assumption that qemu_file_shutdown() is actually thread
> safe (say, shutdown() syscall is thread-safe for sockets).

The shutdown() syscall and qio_channel_shutdown() method are intended
to be safe to call from any thread *PROVIDED* you can ensure no other
thread is concurrently going to call close() on the FD (or unref the
QIOChannel object).

There is no locking in qemu_file_shutdown() to guarantee this, but
maybe something else in migration code is guaranteeing that the
QIOChannel object is not going to be closed (or unref'd), while a
thread is invoking qemu_file_shutdown().

IOW, in theory qemu_file_shutdown() could be safe to use but
I'm not seeing a clearly expressed guarantee of safety in the
code. If it is safe, the reasons are very subtle and rationale
ought to be documented in the comment for qemu_file_shutdown


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



  reply	other threads:[~2023-08-03 15:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
2023-08-02 16:19   ` Peter Xu
2023-08-02 19:58     ` Fabiano Rosas
2023-08-02 20:40       ` Peter Xu
2023-08-03 14:45         ` Fabiano Rosas
2023-08-03 15:15           ` Peter Xu
2023-08-03 15:24             ` Daniel P. Berrangé [this message]
2023-08-03 15:39               ` Peter Xu
2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
2023-08-02 16:02   ` Peter Xu
2023-08-02 20:04     ` Fabiano Rosas
2023-08-02 20:44       ` Peter Xu
2023-08-03 15:00         ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZMvGoD61XG/Lr3jI@redhat.com \
    --to=berrange@redhat.com \
    --cc=farosas@suse.de \
    --cc=leobras@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.