All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: peterx@redhat.com, qemu-block@nongnu.org,
	"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Andrey Drobyshev" <andrey.drobyshev@virtuozzo.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Dr . David Alan Gilbert" <dave@treblig.org>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 0/6] migration/block: disk activation rewrite
Date: Tue, 17 Dec 2024 12:26:44 -0300	[thread overview]
Message-ID: <87h672wcsr.fsf@suse.de> (raw)
In-Reply-To: <20241206230838.1111496-1-peterx@redhat.com>

Peter Xu <peterx@redhat.com> writes:

> CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033
>  (note: it's a pipeline of two patchsets, to save CI credits and time)
>
> v1: https://lore.kernel.org/r/20241204005138.702289-1-peterx@redhat.com
>
> This is v2 of the series, removing RFC tag, because my goal is to have them
> (or some newer version) merged.
>
> The major change is I merged last three patches, and did quite some changes
> here and there, to make sure the global disk activation status is always
> consistent.  The whole idea is still the same.  I say changelog won't help.
>
> I also temporarily dropped Fabiano's ping-pong test cases to avoid
> different versions floating on the list (as I know a new version is coming
> at some point. Fabiano: you're taking over the 10.0 pulls, so I assume
> you're aware so there's no concern on order of merges).  I'll review the
> test cases separately when they're ready, but this series is still tested
> with that pingpong test and it keeps working.
>
> I started looking at this problem as a whole when reviewing Fabiano's
> series, especially the patch (for a QEMU crash [1]):
>
> https://lore.kernel.org/r/20241125144612.16194-5-farosas@suse.de
>
> The proposed patch could work, but it's unwanted to add such side effect to
> migration.  So I start to think about whether we can provide a cleaner
> approach, because migration doesn't need the disks to be active to work at
> all.  Hence we should try to avoid adding a migration ABI (which doesn't
> matter now, but may matter some day) into prepare phase on disk activation
> status.  Migration should happen with disks inactivated.
>
> It's also a pure wish that, if bdrv_inactivate_all() could be benign to be
> called even if all disks are already inactive.  Then the bug is also gone.
> After all, similar call on bdrv_activate_all() upon all-active disks is all
> fine.  I hope that wish could still be fair.  But I don't know well on
> block layer to say anything meaningful.
>
> And when I was looking at that, I found more things spread all over the
> place on disk activation.  I decided to clean all of them up, while
> hopefully fixing the QEMU crash [1] too.
>
> For this v2, I did some more tests, I want to make sure all the past paths
> keep working at least on failure or cancel races, also in postcopy failure
> cases.  So I did below and they all run pass (when I said "emulated" below,
> I meant I hacked something to trigger those race / rare failures, because
> they aren't easy to trigger with vanilla binary):
>
> - Tested generic migrate_cancel during precopy, disk activation won't be
>   affected.  Disk status reports correct values in tracepoints.
>
> - Test Fabiano's ping-pong migration tests on PAUSED state VM.
>
> - Emulated precopy failure before sending non-iterable, disk inactivation
>   won't happen, and also activation won't trigger after migration cleanups
>   (even if activation on top of activate disk is benign, I checked traces
>   to make sure it'll provide consistent disk status, skipping activation).
>
> - Emulated precopy failure right after sending non-iterable. Disks will be
>   inactivated, but then can be reactivated properly before VM starts.
>
> - Emulated postcopy failure when sending the packed data (which is after
>   disk invalidated), and making sure src VM will get back to live properly,
>   re-activate the disks before starting.
>
> - Emulated concurrent migrate_cancel at the end of migration_completion()
>   of precopy, after disk inactivated.  Disks can be reactivated properly.
>
>   NOTE: here if dest QEMU didn't quit before migrate_cancel,
>   bdrv_activate_all() can crash src QEMU.  This behavior should be the same
>   before/after this patch.
>
> Comments welcomed, thanks.
>
> [1] https://gitlab.com/qemu-project/qemu/-/issues/2395
>
> Peter Xu (6):
>   migration: Add helper to get target runstate
>   qmp/cont: Only activate disks if migration completed
>   migration/block: Make late-block-active the default
>   migration/block: Apply late-block-active behavior to postcopy
>   migration/block: Fix possible race with block_inactive
>   migration/block: Rewrite disk activation
>
>  include/migration/misc.h |   4 ++
>  migration/migration.h    |   6 +-
>  migration/block-active.c |  94 +++++++++++++++++++++++++++
>  migration/colo.c         |   2 +-
>  migration/migration.c    | 136 +++++++++++++++------------------------
>  migration/savevm.c       |  46 ++++++-------
>  monitor/qmp-cmds.c       |  22 +++----
>  migration/meson.build    |   1 +
>  migration/trace-events   |   3 +
>  9 files changed, 188 insertions(+), 126 deletions(-)
>  create mode 100644 migration/block-active.c

Queued, thanks!


      parent reply	other threads:[~2024-12-17 15:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-06 23:08 [PATCH v2 0/6] migration/block: disk activation rewrite Peter Xu
2024-12-06 23:08 ` [PATCH v2 1/6] migration: Add helper to get target runstate Peter Xu
2024-12-06 23:08 ` [PATCH v2 2/6] qmp/cont: Only activate disks if migration completed Peter Xu
2024-12-16 15:10   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 3/6] migration/block: Make late-block-active the default Peter Xu
2024-12-16 15:16   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 4/6] migration/block: Apply late-block-active behavior to postcopy Peter Xu
2024-12-16 15:17   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 5/6] migration/block: Fix possible race with block_inactive Peter Xu
2024-12-16 15:18   ` Fabiano Rosas
2024-12-06 23:08 ` [PATCH v2 6/6] migration/block: Rewrite disk activation Peter Xu
2024-12-16 15:58   ` Fabiano Rosas
2024-12-16 18:33     ` Peter Xu
2024-12-17 15:26 ` Fabiano Rosas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h672wcsr.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=andrey.drobyshev@virtuozzo.com \
    --cc=berrange@redhat.com \
    --cc=dave@treblig.org \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.