From: "Cédric Le Goater" <clg@redhat.com>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH for-9.0 1/2] migration: Set migration error in migration_completion()
Date: Thu, 28 Mar 2024 17:29:15 +0100 [thread overview]
Message-ID: <ca670136-b362-42b0-9a93-3b28ff4b3397@redhat.com> (raw)
In-Reply-To: <9d3a476d-9e99-4ed8-8320-8bf86269dbf3@nvidia.com>
On 3/28/24 16:50, Avihai Horon wrote:
>
> On 28/03/2024 17:21, Cédric Le Goater wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Avihai,
>>
>> On 3/28/24 15:02, Avihai Horon wrote:
>>> After commit 9425ef3f990a ("migration: Use migrate_has_error() in
>>> close_return_path_on_source()"), close_return_path_on_source() assumes
>>> that migration error is set if an error occurs during migration.
>>>
>>> This may not be true if migration errors in migration_completion(). For
>>> example, if qemu_savevm_state_complete_precopy() errors, migration error
>>> will not be set
>>
>> Out of curiosity, could you describe a bit more the context ? Did
>> vfio_save_complete_precopy() fail ? why ?
>
> Yep, vfio_save_complete_precopy() failed (but it failed while I was experimenting with an unofficial debug FW).
>
>>
>> We should propagate errors of .save_live_complete_precopy() handlers as
>> it was done .save_setup handlers(). For 9.1.
>
> Agreed.
>
>>
>>> This in turn, will cause a migration hang bug, similar to the bug that
>>> was fixed by commit 22b04245f0d5 ("migration: Join the return path
>>> thread before releasing to_dst_file"), as shutdown() will not be issued
>>> for the return-path channel.
>>
>> yes, but this test :
>>
>> if (ret < 0) {
>> goto fail;
>> }
>>
>> will skip the close_return_path_on_source() call. Won't it ? So I don't
>> understand how it can be an issue. Am I missing something ?
>
> It will skip the close_return_path_on_source() call in migration_completion(), but there is another close_return_path_on_source() call in migrate_fd_cleanup().
OK. Found it. This is a code path I hadn't explored yet.
Acked-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
>
>>
>>> Fix it by ensuring migration error is set in case of error in
>>> migration_completion().
>>
>> Why didn't you add a reference to commit 9425ef3f990a ?
>
> I thought this commit didn't introduce this bug, but looking again in the mailing list [1], it kinda did:
> The hang bug was fully fixed by commit 22b04245f0d ("migration: Join the return path thread before releasing to_dst_file") and then 9425ef3f990a re-introduced the bug, but only for migration_completion() case.
> So, you are right, a fixes line with 9425ef3f990a should be added.
>
> Thanks.
>
> [1] https://lore.kernel.org/all/20240226203122.22894-1-farosas@suse.de/
>
>>
>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> migration/migration.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9fe8fd2afd7..b73ae3a72c4 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2799,6 +2799,7 @@ static void migration_completion(MigrationState *s)
>>> {
>>> int ret = 0;
>>> int current_active_state = s->state;
>>> + Error *local_err = NULL;
>>>
>>> if (s->state == MIGRATION_STATUS_ACTIVE) {
>>> ret = migration_completion_precopy(s, ¤t_active_state);
>>> @@ -2832,6 +2833,15 @@ static void migration_completion(MigrationState *s)
>>> return;
>>>
>>> fail:
>>> + if (qemu_file_get_error_obj(s->to_dst_file, &local_err)) {
>>> + migrate_set_error(s, local_err);
>>> + error_free(local_err);
>>> + } else if (ret) {
>>> + error_setg_errno(&local_err, -ret, "Error in migration completion");
>>
>> The 'ret = -1' case could be improved with error_setg(). As a followup.
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>
>>> + migrate_set_error(s, local_err);
>>> + error_free(local_err);
>>> + }
>>> +
>>> migration_completion_failed(s, current_active_state);
>>> }
>>>
>>
>
next prev parent reply other threads:[~2024-03-28 16:30 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-28 14:02 [PATCH for-9.0 0/2] migration: Two migration bug fixes Avihai Horon
2024-03-28 14:02 ` [PATCH for-9.0 1/2] migration: Set migration error in migration_completion() Avihai Horon
2024-03-28 15:09 ` Peter Xu
2024-03-28 15:54 ` Avihai Horon
2024-03-28 15:21 ` Cédric Le Goater
2024-03-28 15:50 ` Avihai Horon
2024-03-28 16:29 ` Cédric Le Goater [this message]
2024-03-28 14:02 ` [PATCH for-9.0 2/2] migration/postcopy: Ensure postcopy_start() sets errp if it fails Avihai Horon
2024-03-28 15:04 ` Cédric Le Goater
2024-03-28 15:10 ` Peter Xu
2024-03-28 17:04 ` [PATCH for-9.0 0/2] migration: Two migration bug fixes Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ca670136-b362-42b0-9a93-3b28ff4b3397@redhat.com \
--to=clg@redhat.com \
--cc=avihaih@nvidia.com \
--cc=farosas@suse.de \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).