From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Juan Quintela <quintela@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: Properly quitting qemu immediately after failing migration
Date: Mon, 29 Jun 2020 17:00:57 +0200
Message-ID: <9eecca93-e7d9-d1da-7fcd-ee60978ec460@redhat.com>
In-Reply-To: <92ce741d-ef67-fbf9-a889-27d9ae218681@virtuozzo.com>
On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2020 16:48, Max Reitz wrote:
>> Hi,
>>
>> In an iotest, I’m trying to quit qemu immediately after a migration has
>> failed. Unfortunately, that doesn’t seem to be possible in a clean way:
>> migrate_fd_cleanup() runs only at some point after the migration state
>> is already “failed”, so if I just wait for that “failed” state and
>> immediately quit, some cleanup functions may not have been run yet.
>>
>> This is a problem with dirty bitmap migration at least, because it
>> increases the refcount on all block devices that are to be migrated, so
>> if we don’t call the cleanup function before quitting, the refcount will
>> stay elevated and bdrv_close_all() will hit an assertion because those
>> block devices are still around after blk_remove_all_bs() and
>> blockdev_close_all_bdrv_states().
>>
>> In practice this particular issue might not be that big of a problem,
>> because it just means qemu aborts when the user intended to let it quit
>> anyway. But on one hand I could imagine that there are other clean-up
>> paths that should definitely run before qemu quits (although I don’t
>> know), and on the other, it’s a problem for my test.
>>
>> I tried working around the problem for my test by waiting on “Unable to
>> write” appearing on stderr, because that indicates that
>> migrate_fd_cleanup()’s error_report_err() has been reached. But on one
>> hand, that isn’t really nice, and on the other, it doesn’t even work
>> when the failure is on the source side (because then there is no
>> s->error for migrate_fd_cleanup() to report).

(I’ve now managed to work around it by invoking blockdev-del on a node
affected by bitmap migration until it succeeds, because blockdev-del can
only succeed once the bitmap migration code has dropped its reference to
it.)

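A minimal sketch of that polling workaround in Python, assuming a hypothetical `execute(cmd, args)` callable that sends one QMP command and returns the parsed response dict (either `{"return": ...}` or `{"error": {...}}`). `wait_for_node_release` and its timeout parameters are illustrative names, not existing iotests helpers:

```python
import time
from typing import Any, Callable, Dict

# Hypothetical: sends one QMP command, returns the parsed JSON response.
QmpExecute = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def wait_for_node_release(execute: QmpExecute, node_name: str,
                          timeout: float = 10.0,
                          interval: float = 0.1) -> int:
    """Retry blockdev-del on node_name until QEMU accepts it.

    blockdev-del is refused while someone else (here: the dirty bitmap
    migration code) still holds a reference to the node, so a successful
    call implies that reference has been dropped.  Returns the number of
    attempts; raises TimeoutError if the node stays busy too long.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        resp = execute("blockdev-del", {"node-name": node_name})
        if "error" not in resp:
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{node_name} still in use: {resp['error']}")
        time.sleep(interval)
```

Note that this deletes the node, so it only fits a test that is about to quit anyway; `execute` could be a thin wrapper around whatever QMP connection the test already holds.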
>> In all, I’m asking:
>> (1) Is there a nice solution for me now to delay quitting qemu until the
>> failed migration has been fully resolved, including the clean-up?
>>
>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>> the wrong time? Like, maybe lingering subprocesses when using “exec”?
>>
>>
>
> I'll look more closely tomorrow, but as a short answer: try my series
> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
> handles various problems around migration failures and qemu shutdown,
> so it will probably help.

No, it doesn’t seem to.

I’m not sure what exactly that series addresses, but FWIW I’m hitting
the problem with a non-postcopy migration. My simplest reproducer does
the following:

On the source VM:

    blockdev-add node-name='foo' driver='null-co'
    block-dirty-bitmap-add node='foo' name='bmap0'

(Launch the destination VM with some -incoming, e.g.
-incoming 'exec: cat /tmp/mig_file')

Both on source and destination:

    migrate-set-capabilities capabilities=[
        {capability='events', state=true},
        {capability='dirty-bitmaps', state=true}
    ]

On source:

    migrate uri='exec: cat > /tmp/mig_file'

Then wait for a MIGRATION event with data/status == 'failed', and then
issue 'quit'.

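For reference, the source-side steps above can be expressed as raw QMP JSON messages. This Python sketch only builds and inspects messages (no socket handling); `qmp`, `source_commands`, `is_migration_failed` and `quit_after_failure` are hypothetical helper names. The `qmp_capabilities` handshake is required before any other command on a QMP connection, and the destination must also receive the `migrate-set-capabilities` command:

```python
import json
from typing import Any, Dict, Iterable, List


def qmp(cmd: str, **args: Any) -> str:
    """Serialize one QMP command as a single JSON line."""
    msg: Dict[str, Any] = {"execute": cmd}
    if args:
        msg["arguments"] = args
    return json.dumps(msg)


# Same capabilities as in the reproducer (set on source and destination).
CAPS = [{"capability": "events", "state": True},
        {"capability": "dirty-bitmaps", "state": True}]


def source_commands(mig_file: str) -> List[str]:
    """Commands sent to the source VM, in order."""
    return [
        qmp("qmp_capabilities"),
        # "node-name" is not a Python identifier, hence the dict unpacking.
        qmp("blockdev-add", **{"node-name": "foo", "driver": "null-co"}),
        qmp("block-dirty-bitmap-add", node="foo", name="bmap0"),
        qmp("migrate-set-capabilities", capabilities=CAPS),
        qmp("migrate", uri=f"exec: cat > {mig_file}"),
    ]


def is_migration_failed(msg: Dict[str, Any]) -> bool:
    """True for a MIGRATION event whose data/status is 'failed'."""
    return (msg.get("event") == "MIGRATION"
            and msg.get("data", {}).get("status") == "failed")


def quit_after_failure(lines: Iterable[str]) -> str:
    """Scan incoming QMP messages and return 'quit' once migration failed."""
    for line in lines:
        if is_migration_failed(json.loads(line)):
            return qmp("quit")
    raise RuntimeError("no failed MIGRATION event seen")
```

Sending that `quit` immediately, before migrate_fd_cleanup() has run, is exactly the timing that leaves the bitmap migration's extra block-node references in place.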
Max