qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: peterx@redhat.com, qemu-block@nongnu.org,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Andrey Drobyshev" <andrey.drobyshev@virtuozzo.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 0/6] migration/block: disk activation rewrite
Date: Tue, 17 Dec 2024 12:26:44 -0300	[thread overview]
Message-ID: <87h672wcsr.fsf@suse.de> (raw)
In-Reply-To: <20241206230838.1111496-1-peterx@redhat.com>

Peter Xu <peterx@redhat.com> writes:

> CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033
>  (note: it's a pipeline of two patchsets, to save CI credits and time)
>
> v1: https://lore.kernel.org/r/20241204005138.702289-1-peterx@redhat.com
>
> This is v2 of the series, removing RFC tag, because my goal is to have them
> (or some newer version) merged.
>
> The major change is I merged last three patches, and did quite some changes
> here and there, to make sure the global disk activation status is always
> consistent.  The whole idea is still the same.  I say changelog won't help.
>
> I also temporarily dropped Fabiano's ping-pong test cases to avoid
> different versions floating on the list (as I know a new version is coming
> at some point. Fabiano: you're taking over the 10.0 pulls, so I assume
> you're aware so there's no concern on order of merges).  I'll review the
> test cases separately when they're ready, but this series is still tested
> with that pingpong test and it keeps working.
>
> I started looking at this problem as a whole when reviewing Fabiano's
> series, especially the patch (for a QEMU crash [1]):
>
> https://lore.kernel.org/r/20241125144612.16194-5-farosas@suse.de
>
> The proposed patch could work, but it's unwanted to add such side effect to
> migration.  So I start to think about whether we can provide a cleaner
> approach, because migration doesn't need the disks to be active to work at
> all.  Hence we should try to avoid adding a migration ABI (which doesn't
> matter now, but may matter some day) into prepare phase on disk activation
> status.  Migration should happen with disks inactivated.
>
> It's also a pure wish that, if bdrv_inactivate_all() could be benign to be
> called even if all disks are already inactive.  Then the bug is also gone.
> After all, similar call on bdrv_activate_all() upon all-active disks is all
> fine.  I hope that wish could still be fair.  But I don't know well on
> block layer to say anything meaningful.
>
> And when I was looking at that, I found more things spread all over the
> place on disk activation.  I decided to clean all of them up, while
> hopefully fixing the QEMU crash [1] too.
>
> For this v2, I did some more tests, I want to make sure all the past paths
> keep working at least on failure or cancel races, also in postcopy failure
> cases.  So I did below and they all run pass (when I said "emulated" below,
> I meant I hacked something to trigger those race / rare failures, because
> they aren't easy to trigger with vanilla binary):
>
> - Tested generic migrate_cancel during precopy, disk activation won't be
>   affected.  Disk status reports correct values in tracepoints.
>
> - Test Fabiano's ping-pong migration tests on PAUSED state VM.
>
> - Emulated precopy failure before sending non-iterable, disk inactivation
>   won't happen, and also activation won't trigger after migration cleanups
>   (even if activation on top of activate disk is benign, I checked traces
>   to make sure it'll provide consistent disk status, skipping activation).
>
> - Emulated precopy failure right after sending non-iterable. Disks will be
>   inactivated, but then can be reactivated properly before VM starts.
>
> - Emulated postcopy failure when sending the packed data (which is after
>   disk invalidated), and making sure src VM will get back to live properly,
>   re-activate the disks before starting.
>
> - Emulated concurrent migrate_cancel at the end of migration_completion()
>   of precopy, after disk inactivated.  Disks can be reactivated properly.
>
>   NOTE: here if dest QEMU didn't quit before migrate_cancel,
>   bdrv_activate_all() can crash src QEMU.  This behavior should be the same
>   before/after this patch.
>
> Comments welcomed, thanks.
>
> [1] https://gitlab.com/qemu-project/qemu/-/issues/2395
>
> Peter Xu (6):
>   migration: Add helper to get target runstate
>   qmp/cont: Only activate disks if migration completed
>   migration/block: Make late-block-active the default
>   migration/block: Apply late-block-active behavior to postcopy
>   migration/block: Fix possible race with block_inactive
>   migration/block: Rewrite disk activation
>
>  include/migration/misc.h |   4 ++
>  migration/migration.h    |   6 +-
>  migration/block-active.c |  94 +++++++++++++++++++++++++++
>  migration/colo.c         |   2 +-
>  migration/migration.c    | 136 +++++++++++++++------------------------
>  migration/savevm.c       |  46 ++++++-------
>  monitor/qmp-cmds.c       |  22 +++----
>  migration/meson.build    |   1 +
>  migration/trace-events   |   3 +
>  9 files changed, 188 insertions(+), 126 deletions(-)
>  create mode 100644 migration/block-active.c

Queued, thanks!


      parent reply	other threads:[~2024-12-17 15:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-06 23:08 [PATCH v2 0/6] migration/block: disk activation rewrite Peter Xu
2024-12-06 23:08 ` [PATCH v2 1/6] migration: Add helper to get target runstate Peter Xu
2024-12-06 23:08 ` [PATCH v2 2/6] qmp/cont: Only activate disks if migration completed Peter Xu
2024-12-16 15:10   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 3/6] migration/block: Make late-block-active the default Peter Xu
2024-12-16 15:16   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 4/6] migration/block: Apply late-block-active behavior to postcopy Peter Xu
2024-12-16 15:17   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 5/6] migration/block: Fix possible race with block_inactive Peter Xu
2024-12-16 15:18   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 6/6] migration/block: Rewrite disk activation Peter Xu
2024-12-16 15:58   ` Fabiano Rosas
2024-12-16 18:33     ` Peter Xu
2024-12-17 15:26 ` Fabiano Rosas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h672wcsr.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=andrey.drobyshev@virtuozzo.com \
    --cc=berrange@redhat.com \
    --cc=dave@treblig.org \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).