From: Juan Quintela <quintela@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-devel@nongnu.org, stefanha@redhat.com, fam@euphon.net,
eblake@redhat.com, vsementsov@yandex-team.ru, jsnow@redhat.com,
peterx@redhat.com, leobras@redhat.com, qemu-block@nongnu.org
Subject: Re: [RFC] migration/block-dirty-bitmap: make loading bitmap for device with iothread future-proof
Date: Mon, 31 Jul 2023 09:35:42 +0200 [thread overview]
Message-ID: <871qgocz3l.fsf@secure.mitica> (raw)
In-Reply-To: <20230728133928.256898-1-f.ebner@proxmox.com> (Fiona Ebner's message of "Fri, 28 Jul 2023 15:39:28 +0200")
Fiona Ebner <f.ebner@proxmox.com> wrote:
> The bdrv_create_dirty_bitmap() function (which is also called by
> bdrv_dirty_bitmap_create_successor()) uses bdrv_getlength(bs). This is
> a wrapper around a coroutine, and when not called in coroutine context
> would use bdrv_poll_co(). Such a call would trigger an assert() if the
> correct AioContext hasn't been acquired before, because polling would
> try to release the AioContext.
The ingenous in me thinks:
If the problem is that bdrv_poll_co() release an AioContext that it
don't have acquired, perhaps we should fix bdrv_poll_co().
Ha!!!
$ find . -type f -exec grep --color=auto -nH --null -e bdrv_poll_co \{\} +
./scripts/block-coroutine-wrapper.py\0173:
bdrv_poll_co(&s.poll_state);
./scripts/block-coroutine-wrapper.py\0198:
bdrv_poll_co(&s.poll_state);
./block/block-gen.h\038:static inline void bdrv_poll_co(BdrvPollCo *s)
$
/me retreats
> The issue does not happen for migration, because the call happens
> from process_incoming_migration_co(), i.e. in coroutine context. So
> the bdrv_getlength() wrapper will just call bdrv_co_getlength()
> directly without polling.
The ingenous in me wonders why bdrv_getlength() needs to use coroutines
at all, but as I have been burned on the previous paragraph, I learn not
to even try.
Ok, I never learn, so I do a grep and I see two appearces of
bdrv_getlength in include files, but grep only shows uses of the
function, not a real definition.
I even see
co_wrapper_mixed_bdrv_rdlock
And decied to stop here.
> The issue would happen for snapshots, but won't in practice, because
> saving a snapshot with a block dirty bitmap is currently not possible.
> The reason is that dirty_bitmap_save_iterate() returns whether it has
> completed the bulk phase, which only happens in postcopy, so
> qemu_savevm_state_iterate() will always return 0, meaning the call
> to iterate will be repeated over and over again without ever reaching
> the completion phase.
>
> Still, this would make the code more robust for the future.
What I wonder is if we should annotate this calls somehow that they need
to be called from a coroutine context, because I would have never found
it.
And just thinking loud:
I still wonder if we should make incoming migration its own thread.
Because we got more and more problems because it is a coroutine, that in
non-multifd case can consume a whole CPU alone, so it makes no sense to
have a coroutine.
On the other hand, with multifd, it almost don't use any resources, so ....
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> We ran into this issue downstream, because we have a custom snapshot
> mechanism which does support dirty bitmaps and does not run in
> coroutine context during load.
>
> migration/block-dirty-bitmap.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 032fc5f405..e1ae3b7316 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -805,8 +805,11 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
> "destination", bdrv_dirty_bitmap_name(s->bitmap));
> return -EINVAL;
> } else {
> + AioContext *ctx = bdrv_get_aio_context(s->bs);
> + aio_context_acquire(ctx);
> s->bitmap = bdrv_create_dirty_bitmap(s->bs, granularity,
> s->bitmap_name, &local_err);
> + aio_context_release(ctx);
> if (!s->bitmap) {
> error_report_err(local_err);
> return -EINVAL;
I was going to suggest to put that code inside brdv_create_dirty_bitmap,
because migration come is already complex enough without knowing about
AioContext, but it appears that this pattern is already used in several
places.
> @@ -833,7 +836,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
>
> bdrv_disable_dirty_bitmap(s->bitmap);
> if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
> + AioContext *ctx = bdrv_get_aio_context(s->bs);
> + aio_context_acquire(ctx);
> bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
> + aio_context_release(ctx);
> if (local_err) {
> error_report_err(local_err);
> return -EINVAL;
for(i=1; i < 1000; i++)
printf(""I will not grep what a successor of a dirty bitmap
is\n");
Later, Juan.
next prev parent reply other threads:[~2023-07-31 7:36 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-28 13:39 [RFC] migration/block-dirty-bitmap: make loading bitmap for device with iothread future-proof Fiona Ebner
2023-07-31 7:35 ` Juan Quintela [this message]
2023-08-01 7:57 ` Fiona Ebner
2023-10-04 16:04 ` Vladimir Sementsov-Ogievskiy
2023-10-04 16:13 ` Vladimir Sementsov-Ogievskiy
2023-10-05 7:19 ` Fiona Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qgocz3l.fsf@secure.mitica \
--to=quintela@redhat.com \
--cc=eblake@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=fam@euphon.net \
--cc=jsnow@redhat.com \
--cc=leobras@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.