qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] migration: Fix the BDRV_O_INACTIVE assertion
@ 2024-11-25 14:46 Fabiano Rosas
  2024-11-25 14:46 ` [PATCH 1/5] tests/qtest/migration: Move more code under only_target Fabiano Rosas
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Fabiano Rosas @ 2024-11-25 14:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Peter Xu, Daniel P . Berrangé, andrey.drobyshev,
	den, Vladimir Sementsov-Ogievskiy

At this point everyone is probably familiar with the assertion:

  bdrv_inactivate_recurse: Assertion `!(bs->open_flags &
  BDRV_O_INACTIVE)' failed.

The issue is that at the end of migration, the block layer is
deactivated to release the locks on the disks so that in a shared
storage scenario the migration destination can take over. Trying to
deactivate twice leads to the above assertion.

The reason deactivation is happening twice is due to the migration
capability "late-block-activate", which delays activation until the
destination VM receives a qmp_cont command. The reasoning for that cap
is:

    Activating the block devices causes the locks to be taken on
    the backing file.  If we're running with -S and the destination libvirt
    hasn't started the destination with 'cont', it's expecting the locks are
    still untaken.

    Don't activate the block devices if we're not going to autostart the VM;
    'cont' already will do that anyway.   This change is tied to the new
    migration capability 'late-block-activate' that defaults to off, keeping
    the old behaviour by default.

    bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854

However, the expected qmp_cont command never happens, because the user
may never decide to continue the VM and instead attempt another
migration while the VM is paused.

Fix this by assuming that the qmp_migrate command having been issued
is enough indication that it is now ok to take the locks again
(similarly to qmp_cont) and call bdrv_activate_all() at that
moment. This is the least invasive fix that won't require us to add
new qmp commands and go change libvirt, etc.

I'm also copying the block list (for obvious reasons, but also)
because Vladimir pointed out that asserts of this kind might not only
happen during migration. I know of at least
bdrv_co_write_req_prepare() which will assert if reached when a guest
is paused after a migration that used late-block-activate.

Patches 1-3 are just shuffling code;

Patch 4 is the fix;

Patch 5 is the test. To reproduce the issue revert patch 4 and run:

QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p \
/x86_64/migration/precopy/ping-pong/late-block-activate

References:
https://gitlab.com/qemu-project/qemu/-/issues/686
https://gitlab.com/qemu-project/qemu/-/issues/2395
https://lore.kernel.org/r/20240924125611.664315-1-andrey.drobyshev@virtuozzo.com

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1556902330

Fabiano Rosas (5):
  tests/qtest/migration: Move more code under only_target
  tests/qtest/migration: Don't use hardcoded strings for -serial
  tests/qtest/migration: Support cleaning up only one side of migration
  migration: Activate block devices if VM is paused when migrating
  tests/qtest/migration: Test successive migrations

 migration/migration.c           |  19 +++
 tests/qtest/migration-helpers.c |   8 +
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c    | 252 +++++++++++++++++++++++++-------
 4 files changed, 226 insertions(+), 55 deletions(-)


base-commit: 134b443512825bed401b6e141447b8cdc22d2efe
-- 
2.35.3



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-12-04  0:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-25 14:46 [PATCH 0/5] migration: Fix the BDRV_O_INACTIVE assertion Fabiano Rosas
2024-11-25 14:46 ` [PATCH 1/5] tests/qtest/migration: Move more code under only_target Fabiano Rosas
2024-11-25 14:46 ` [PATCH 2/5] tests/qtest/migration: Don't use hardcoded strings for -serial Fabiano Rosas
2024-11-25 14:46 ` [PATCH 3/5] tests/qtest/migration: Support cleaning up only one side of migration Fabiano Rosas
2024-11-25 14:46 ` [PATCH 4/5] migration: Activate block devices if VM is paused when migrating Fabiano Rosas
2024-11-25 17:07   ` Peter Xu
2024-11-27 17:40     ` Andrey Drobyshev
2024-12-04  0:57       ` Peter Xu
2024-11-27 17:40   ` Andrey Drobyshev
2024-11-25 14:46 ` [PATCH 5/5] tests/qtest/migration: Test successive migrations Fabiano Rosas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).