All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: "Cédric Le Goater" <clg@redhat.com>,
	qemu-devel@nongnu.org,
	"Alex Williamson" <alex.williamson@redhat.com>
Subject: Re: [RFC PATCH 14/14] migration: Fix return-path thread exit
Date: Fri, 23 Feb 2024 11:05:34 -0300	[thread overview]
Message-ID: <87cysn9sy9.fsf@suse.de> (raw)
In-Reply-To: <ZdgfnWJNYeiNYeGN@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Fri, Feb 16, 2024 at 02:35:26PM -0300, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>> > Hello Fabiano
>> >
>> > On 2/14/24 21:35, Fabiano Rosas wrote:
>> >> Cédric Le Goater <clg@redhat.com> writes:
>> >> 
>> >>> Hello Fabiano
>> >>>
>> >>> On 2/8/24 14:29, Fabiano Rosas wrote:
>> >>>> Cédric Le Goater <clg@redhat.com> writes:
>> >>>>
>> >>>>> In case of error, close_return_path_on_source() can perform a shutdown
>> >>>>> to exit the return-path thread.  However, in migrate_fd_cleanup(),
>> >>>>> 'to_dst_file' is closed before calling close_return_path_on_source()
>> >>>>> and the shutdown fails, leaving the source and destination waiting for
>> >>>>> an event to occur.
>> >>>>
>> >>>> Hi, Cédric
>> >>>>
>> >>>> Are you sure this is not caused by patch 13?
>> >>>
>> >>> It happens with upstream QEMU without any patch.
>> >> 
>> >> I might have taken that "shutdown fails" in the commit message too
>> >> literaly. Anyway, I have a proposed solution:
>> >> 
>> >> -->8--
>> >>  From 729aa7b5b7f130f756d41649fdd0862bd2e90430 Mon Sep 17 00:00:00 2001
>> >> From: Fabiano Rosas <farosas@suse.de>
>> >> Date: Wed, 14 Feb 2024 16:45:43 -0300
>> >> Subject: [PATCH] migration: Join the return path thread before releasing
>> >>   to_dst_file
>> >> MIME-Version: 1.0
>> >> Content-Type: text/plain; charset=UTF-8
>> >> Content-Transfer-Encoding: 8bit
>> >> 
>> >> The return path thread might hang at a blocking system call. Before
>> >> joining the thread we might need to issue a shutdown() on the socket
>> >> file descriptor to release it. To determine whether the shutdown() is
>> >> necessary we look at the QEMUFile error.
>> >> 
>> >> Make sure we only clean up the QEMUFile after the return path has been
>> >> waited for.
>> >
>> > Yes. That's the important part.
>> >
>> >> This fixes a hang when qemu_savevm_state_setup() produced an error
>> >> that was detected by migration_detect_error(). That skips
>> >> migration_completion() so close_return_path_on_source() would get
>> >> stuck waiting for the RP thread to terminate.
>> >> 
>> >> At migrate_fd_cleanup() I'm keeping the relative order of joining the
>> >> migration thread and the return path just in case.
>> >
>> > That doesn't look necessary.
>> 
>> Indeed. But I don't trust the migration code, it's full of undocumented
>> dependencies like that.
>> 
>> > What was the reason to join the migration thread only when
>> > s->to_dst_file is valid ?
>> 
>> I didn't find any explicit reason looking through the history. It seems
>> we used to rely on to_dst_file before migration_thread_running was
>> introduced.
>> 
>> I wouldn't mind keeping that 'if' there.
>> 
>> Let's see what Peter thinks about it.
>
> Frankly I don't have a strong opinion on current patch 14 or the new
> proposal, but it seems we reached a consensus.
>
> Fabiano, would you repost with a formal patch, with the proper tags?

Yes, I'll post it soon.

>
> One thing I am still not sure is whether we should still have patch 13
> altogether? Please see my other reply on whether it's possible to have
> migrate_get_error() == true but qemu_file_get_error() == false.

I'll include it then.

> In
> postcopy_pause(), currently we constantly shutdown() so the join() should
> always work:
>
>         qemu_file_shutdown(file);
>         qemu_fclose(file);
>
>         /*
>          * We're already pausing, so ignore any errors on the return
>          * path and just wait for the thread to finish. It will be
>          * re-created when we resume.
>          */
>         close_return_path_on_source(s);
>
> If move close_return_path_on_source() upper, qemu_file_shutdown() may not
> be needed? And I think we need to make sure close_return_path_on_source()
> will always properly kick the other thread.



  reply	other threads:[~2024-02-23 14:09 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-07 13:33 [PATCH 00/14] migration: Improve error reporting Cédric Le Goater
2024-02-07 13:33 ` [PATCH 01/14] migration: Add Error** argument to .save_setup() handler Cédric Le Goater
2024-02-07 20:11   ` Philippe Mathieu-Daudé
2024-02-08 13:27     ` Cédric Le Goater
2024-02-08  4:26   ` Peter Xu
2024-02-12  8:36   ` Avihai Horon
2024-02-12 14:49     ` Cédric Le Goater
2024-02-12 15:57       ` Avihai Horon
2024-02-07 13:33 ` [PATCH 02/14] migration: Add Error** argument to .load_setup() handler Cédric Le Goater
2024-02-07 20:12   ` Philippe Mathieu-Daudé
2024-02-08  4:30   ` Peter Xu
2024-02-09  9:35     ` Cédric Le Goater
2024-02-07 13:33 ` [PATCH 03/14] memory: Add Error** argument to .log_global*() handlers Cédric Le Goater
2024-02-08  5:48   ` Peter Xu
2024-02-09 10:14     ` Cédric Le Goater
2024-02-12  8:43       ` Avihai Horon
2024-02-12 16:36         ` Cédric Le Goater
2024-02-08 15:59   ` Philippe Mathieu-Daudé
2024-02-07 13:33 ` [PATCH 04/14] migration: Modify ram_init_bitmaps() to report dirty tracking errors Cédric Le Goater
2024-02-07 20:15   ` Philippe Mathieu-Daudé
2024-02-12  8:51   ` Avihai Horon
2024-02-12 16:37     ` Cédric Le Goater
2024-02-07 13:33 ` [PATCH 05/14] vfio: Add Error** argument to .set_dirty_page_tracking() handler Cédric Le Goater
2024-02-07 20:16   ` Philippe Mathieu-Daudé
2024-02-07 13:33 ` [PATCH 06/14] vfio: Add Error** argument to vfio_devices_dma_logging_start() Cédric Le Goater
2024-02-07 20:17   ` Philippe Mathieu-Daudé
2024-02-07 13:33 ` [PATCH 07/14] vfio: Add Error** argument to vfio_devices_dma_logging_stop() Cédric Le Goater
2024-02-07 20:18   ` Philippe Mathieu-Daudé
2024-02-07 13:33 ` [PATCH 08/14] vfio: Use new Error** argument in vfio_save_setup() Cédric Le Goater
2024-02-07 20:21   ` Philippe Mathieu-Daudé
2024-02-12  9:17   ` Avihai Horon
2024-02-12 17:54     ` Cédric Le Goater
2024-02-13 13:57       ` Avihai Horon
2024-02-07 13:33 ` [PATCH 09/14] vfio: Add Error** argument to .vfio_save_config() handler Cédric Le Goater
2024-02-07 20:22   ` Philippe Mathieu-Daudé
2024-02-12  9:21   ` Avihai Horon
2024-02-07 13:33 ` [PATCH 10/14] vfio: Also trace event failures in vfio_save_complete_precopy() Cédric Le Goater
2024-02-07 13:33 ` [PATCH 11/14] vfio: Extend vfio_set_migration_error() with Error* argument Cédric Le Goater
2024-02-07 20:25   ` Philippe Mathieu-Daudé
2024-02-12  9:35   ` Avihai Horon
2024-02-16 13:12     ` Cédric Le Goater
2024-02-07 13:33 ` [PATCH 12/14] migration: Report error when shutdown fails Cédric Le Goater
2024-02-07 20:26   ` Philippe Mathieu-Daudé
2024-02-08  5:52   ` Peter Xu
2024-02-07 13:33 ` [PATCH 13/14] migration: Use migrate_has_error() in close_return_path_on_source() Cédric Le Goater
2024-02-08  5:52   ` Peter Xu
2024-02-08 13:07   ` Fabiano Rosas
2024-02-08 13:45     ` Cédric Le Goater
2024-02-08 13:57       ` Fabiano Rosas
2024-02-12 13:03         ` Cédric Le Goater
2024-02-14 16:00           ` Fabiano Rosas
2024-02-16 15:17             ` Cédric Le Goater
2024-02-23  4:14     ` Peter Xu
2024-02-07 13:33 ` [RFC PATCH 14/14] migration: Fix return-path thread exit Cédric Le Goater
2024-02-08  5:57   ` Peter Xu
2024-02-12 16:04     ` Cédric Le Goater
2024-02-23  4:25       ` Peter Xu
2024-02-08 13:29   ` Fabiano Rosas
2024-02-12 15:44     ` Cédric Le Goater
2024-02-14 20:35       ` Fabiano Rosas
2024-02-16 15:08         ` Cédric Le Goater
2024-02-16 17:35           ` Fabiano Rosas
2024-02-23  4:31             ` Peter Xu
2024-02-23 14:05               ` Fabiano Rosas [this message]
2024-02-26  8:44                 ` Cédric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cysn9sy9.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=alex.williamson@redhat.com \
    --cc=clg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.