From: Peter Xu <peterx@redhat.com>
To: "Cédric Le Goater" <clg@redhat.com>
Cc: "Laurent Vivier" <lvivier@redhat.com>,
qemu-devel@nongnu.org, "Fabiano Rosas" <farosas@suse.de>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Markus Armbruster" <armbru@redhat.com>,
"Prasad Pandit" <pjp@fedoraproject.org>
Subject: Re: [PATCH v4 10/25] migration: Add Error** argument to qemu_savevm_state_setup()
Date: Fri, 8 Mar 2024 22:17:50 +0800
Message-ID: <ZeseDv1o6ihlA2Ct@x1n>
In-Reply-To: <9772e612-6ecc-4a8f-aae8-86884397f39d@redhat.com>
On Fri, Mar 08, 2024 at 02:55:30PM +0100, Cédric Le Goater wrote:
> On 3/8/24 14:39, Cédric Le Goater wrote:
> > On 3/8/24 14:14, Cédric Le Goater wrote:
> > > On 3/8/24 13:56, Peter Xu wrote:
> > > > On Wed, Mar 06, 2024 at 02:34:25PM +0100, Cédric Le Goater wrote:
> > > > > This prepares ground for the changes coming next which add an Error**
> > > > > argument to the .save_setup() handler. Callers of qemu_savevm_state_setup()
> > > > > now handle the error and fail earlier setting the migration state from
> > > > > MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
> > > > >
> > > > > In qemu_savevm_state(), move the cleanup to preserve the error
> > > > > reported by .save_setup() handlers.
> > > > >
> > > > > Since the previous behavior was to ignore errors at this step of
> > > > > migration, this change should be examined closely to check that
> > > > > cleanups are still correctly done.
> > > > >
> > > > > Signed-off-by: Cédric Le Goater <clg@redhat.com>
> > > > > ---
> > > > >
> > > > > Changes in v4:
> > > > > - Merged cleanup change in qemu_savevm_state()
> > > > > Changes in v3:
> > > > > - Set migration state to MIGRATION_STATUS_FAILED
> > > > > - Fixed error handling to be done under lock in bg_migration_thread()
> > > > > - Made sure an error is always set in case of failure in
> > > > > qemu_savevm_state_setup()
> > > > > migration/savevm.h | 2 +-
> > > > > migration/migration.c | 27 ++++++++++++++++++++++++---
> > > > > migration/savevm.c | 26 +++++++++++++++-----------
> > > > > 3 files changed, 40 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/migration/savevm.h b/migration/savevm.h
> > > > > index 74669733dd63a080b765866c703234a5c4939223..9ec96a995c93a42aad621595f0ed58596c532328 100644
> > > > > --- a/migration/savevm.h
> > > > > +++ b/migration/savevm.h
> > > > > @@ -32,7 +32,7 @@
> > > > > bool qemu_savevm_state_blocked(Error **errp);
> > > > > void qemu_savevm_non_migratable_list(strList **reasons);
> > > > > int qemu_savevm_state_prepare(Error **errp);
> > > > > -void qemu_savevm_state_setup(QEMUFile *f);
> > > > > +int qemu_savevm_state_setup(QEMUFile *f, Error **errp);
> > > > > bool qemu_savevm_state_guest_unplug_pending(void);
> > > > > int qemu_savevm_state_resume_prepare(MigrationState *s);
> > > > > void qemu_savevm_state_header(QEMUFile *f);
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index a49fcd53ee19df1ce0182bc99d7e064968f0317b..6d1544224e96f5edfe56939a9c8395d88ef29581 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -3408,6 +3408,8 @@ static void *migration_thread(void *opaque)
> > > > > int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > > > MigThrError thr_error;
> > > > > bool urgent = false;
> > > > > + Error *local_err = NULL;
> > > > > + int ret;
> > > > > thread = migration_threads_add("live_migration", qemu_get_thread_id());
> > > > > @@ -3451,9 +3453,17 @@ static void *migration_thread(void *opaque)
> > > > > }
> > > > > bql_lock();
> > > > > - qemu_savevm_state_setup(s->to_dst_file);
> > > > > + ret = qemu_savevm_state_setup(s->to_dst_file, &local_err);
> > > > > bql_unlock();
> > > > > + if (ret) {
> > > > > + migrate_set_error(s, local_err);
> > > > > + error_free(local_err);
> > > > > + migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > > > > + MIGRATION_STATUS_FAILED);
> > > > > + goto out;
> > > > > + }
> > > >
> > > > There's a small indent issue, I can fix it.
> > >
> > > checkpatch didn't report anything.
> > >
> > > >
> > > > The bigger problem is I _think_ this will trigger a ci failure in the
> > > > virtio-net-failover test:
> > > >
> > > > ▶ 121/464 ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling") ERROR
> > > > 121/464 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 4.77s killed by signal 6 SIGABRT
> > > > >>> PYTHON=/builds/peterx/qemu/build/pyvenv/bin/python3.8 G_TEST_DBUS_DAEMON=/builds/peterx/qemu/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=161 QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_BINARY=./qemu-system-x86_64 /builds/peterx/qemu/build/tests/qtest/virtio-net-failover --tap -k
> > > > ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> > > > stderr:
> > > > qemu-system-x86_64: ram_save_setup failed: Input/output error
> > > > **
> > > > ERROR:../tests/qtest/virtio-net-failover.c:1203:test_migrate_abort_wait_unplug: assertion failed (status == "cancelling"): ("cancelled" == "cancelling")
> > > > (test program exited with status code -6)
> > > > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> > > >
> > > > I am not familiar enough with the failover code, and may not have time
> > > > today to follow this up, copy Laurent. Cedric, if you have time, please
> > > > have a look.
> > >
> > >
> > > Sure. Weird because I usually run make check on x86_64, s390x, ppc64 and
> > > aarch64. Let me check again.
> >
> > I see one timeout error on s390x, but not always. See below. It occurs with
> > or without this patchset. The other arches, x86_64 and ppc64, run fine (apart
> > from one io test failing from time to time).
>
> Ah! I got this once on aarch64:
>
> 161/486 ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL ERROR
> 161/486 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.98s killed by signal 6 SIGABRT
> >>> G_TEST_DBUS_DAEMON=/home/legoater/work/qemu/qemu.git/tests/dbus-vmstate-daemon.sh MALLOC_PERTURB_=119 QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img PYTHON=/home/legoater/work/qemu/qemu.git/build/pyvenv/bin/python3 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/legoater/work/qemu/qemu.git/build/tests/qtest/virtio-net-failover --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: ram_save_setup failed: Input/output error
> **
> ERROR:../tests/qtest/virtio-net-failover.c:1222:test_migrate_abort_wait_unplug: 'device' should not be NULL
>
> (test program exited with status code -6)
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Hmm, this one seems different..
>
> I couldn't reproduce yet :/

I never reproduced it locally on x86, and my failure is always at the
"cancelling" vs. "cancelled" check rather than the NULL check. It's much
easier to trigger on CI in check-system-centos (I don't know why centos):

  https://gitlab.com/peterx/qemu/-/jobs/6351020546

I think, at least for the error I hit, the problem is that the failover test
cancels the migration. If it cancels fast enough, while setup is still
running, setup can now fail the migration (it could not before, when we
ignored qemu_savevm_state_setup() errors), and I think the thread will then
skip:

    qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                            MIGRATION_STATUS_ACTIVE);

It seems the test wants the "cancelling" state to hold until later:

    /* while the card is not ejected, we must be in "cancelling" state */
    ret = migrate_status(qts);

    status = qdict_get_str(ret, "status");
    g_assert_cmpstr(status, ==, "cancelling");
    qobject_unref(ret);

    /* OS unplugs the cards, QEMU can move from wait-unplug state */
    qtest_outl(qts, ACPI_PCIHP_ADDR_ICH9 + PCI_EJ_BASE, 1);

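To make the suspected sequence concrete, here is a toy model in C. It is only
a sketch of the guess above, not QEMU code: ToyState, toy_set_state() and
toy_status_before_eject() are made-up names. Two assumptions are baked in:
(1) state changes are compare-and-swap style, so SETUP -> FAILED does nothing
once the state is already "cancelling", and (2) the common exit path completes
a pending cancel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of the migration statuses mentioned in this thread. */
typedef enum {
    ST_SETUP,
    ST_WAIT_UNPLUG,
    ST_ACTIVE,
    ST_CANCELLING,
    ST_CANCELLED,
    ST_FAILED,
} ToyState;

/*
 * Assumption: a transition only happens if the state still equals
 * old_state (compare-and-swap semantics). Returns true if it moved.
 */
static bool toy_set_state(ToyState *state, ToyState old_state,
                          ToyState new_state)
{
    if (*state != old_state) {
        return false;
    }
    *state = new_state;
    return true;
}

/*
 * Status a client would observe after the setup phase but before the
 * guest ejects the card, for a cancel that arrives during setup.
 */
ToyState toy_status_before_eject(bool propagate_setup_error, bool setup_fails)
{
    ToyState state = ST_SETUP;

    /* The test cancels very early, while setup is still in progress. */
    toy_set_state(&state, ST_SETUP, ST_CANCELLING);
    assert(state == ST_CANCELLING);

    if (setup_fails && propagate_setup_error) {
        /* New path: SETUP -> FAILED is a no-op, we are in CANCELLING. */
        toy_set_state(&state, ST_SETUP, ST_FAILED);
        /* "goto out": the unplug wait is skipped, cleanup ends the cancel. */
        toy_set_state(&state, ST_CANCELLING, ST_CANCELLED);
        return state;
    }

    /*
     * Old path: the setup error is ignored and the thread parks in the
     * unplug wait, so the status stays as it was until the card is ejected.
     */
    return state;
}
```

With the error ignored, toy_status_before_eject(false, true) stays at
ST_CANCELLING, which is what the test asserts. With the error propagated,
toy_status_before_eject(true, true) ends at ST_CANCELLED, matching the CI
failure. Whether the real code takes this path still needs checking against
the failover code.
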
Again, I'd need to read the failover code first, so there's not much more I
can tell. Laurent might have a clue.

/me disappears..
--
Peter Xu