From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Alexey Kardashevskiy <aik@ozlabs.ru>,
Michal Privoznik <mprivozn@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
Date: Wed, 17 Sep 2014 17:53:49 +0200 [thread overview]
Message-ID: <5419AE8D.3070007@redhat.com> (raw)
In-Reply-To: <CAJSP0QXq_OY7+UOPO4c1_uu4RsOoWzPVncZ6JDcGF1OYcUNijg@mail.gmail.com>
Il 17/09/2014 17:04, Stefan Hajnoczi ha scritto:
> On Wed, Sep 17, 2014 at 10:25 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Il 17/09/2014 11:06, Stefan Hajnoczi ha scritto:
>>> I think the fundamental problem here is that the mirror block job
>>> on the source host does not synchronize with live migration.
>>>
>>> Remember the mirror block job iterates on the dirty bitmap
>>> whenever it feels like.
>>>
>>> There is no guarantee that the mirror block job has quiesced before
>>> migration handover takes place, right?
>>
>> Libvirt does that. Migration is started only once storage mirroring
>> is out of the bulk phase, and the handover looks like:
>>
>> 1) migration completes
>>
>> 2) because the source VM is stopped, the disk has quiesced on the source
>
> But the mirror block job might still be writing out dirty blocks.
Right, but it quiesces after (3).
>> 3) libvirt sends block-job-complete
>
> No, it sends block-job-cancel after the source QEMU's migration has
> completed. See the qemuMigrationCancelDriveMirror() call in
> src/qemu/qemu_migration.c:qemuMigrationRun().
No problem, block-job-cancel and block-job-complete are the same except
for pivoting to the destination.
>> 4) libvirt receives BLOCK_JOB_COMPLETED. The disk has now quiesced on
>> the destination as well.
>
> I don't see where this happens in the libvirt source code. Libvirt
> doesn't care about block job events for drive-mirror during migration.
>
> And that's why there could still be I/O going on (since
> block-job-cancel is asynchronous).
Oops, this would be a bug! block-job-complete and block-job-cancel are
asynchronous. CCing Michal Privoznik who wrote the libvirt code.
Paolo
>> 5) the VM is started on the destination
>>
>> 6) the NBD server is stopped on the destination and the source VM is quit.
>>
>> It is actually a feature that storage migration is completed
>> asynchronously with respect to RAM migration. The problem is that
>> qcow2_invalidate_cache happens between (3) and (5), and it doesn't
>> like the concurrent I/O received by the NBD server.
>
> I agree that qcow2_invalidate_cache() (and any other invalidate cache
> implementations) need to allow concurrent I/O requests.
>
> Either I'm misreading the libvirt code or libvirt is not actually
> ensuring that the block job on the source has cancelled/completed
> before the guest is resumed on the destination. So I think there is
> still a bug, maybe Eric can verify this?
>
> Stefan
>
prev parent reply other threads:[~2014-09-17 15:54 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-15 10:50 [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed Alexey Kardashevskiy
2014-09-16 12:02 ` Alexey Kardashevskiy
2014-09-16 12:10 ` Paolo Bonzini
2014-09-16 12:34 ` Kevin Wolf
2014-09-16 12:35 ` Paolo Bonzini
2014-09-16 12:52 ` Kevin Wolf
2014-09-16 12:59 ` Paolo Bonzini
2014-09-19 8:47 ` Kevin Wolf
2014-09-23 8:47 ` [Qemu-devel] [RFC PATCH] qcow2: Fix race in cache invalidation Alexey Kardashevskiy
2014-09-24 7:30 ` Alexey Kardashevskiy
2014-09-24 9:48 ` Kevin Wolf
2014-09-25 8:41 ` Alexey Kardashevskiy
2014-09-25 8:57 ` Kevin Wolf
2014-09-25 9:55 ` Alexey Kardashevskiy
2014-09-25 10:20 ` Kevin Wolf
2014-09-25 12:29 ` Alexey Kardashevskiy
2014-09-25 12:39 ` Kevin Wolf
2014-09-25 14:05 ` Alexey Kardashevskiy
2014-09-28 11:14 ` Alexey Kardashevskiy
2014-09-17 6:46 ` [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed Alexey Kardashevskiy
2014-09-16 14:52 ` Alexey Kardashevskiy
2014-09-17 9:06 ` Stefan Hajnoczi
2014-09-17 9:25 ` Paolo Bonzini
2014-09-17 13:44 ` Alexey Kardashevskiy
2014-09-17 15:07 ` Stefan Hajnoczi
2014-09-18 3:26 ` Alexey Kardashevskiy
2014-09-18 9:56 ` Paolo Bonzini
2014-09-19 8:23 ` Alexey Kardashevskiy
2014-09-17 15:04 ` Stefan Hajnoczi
2014-09-17 15:17 ` Eric Blake
2014-09-17 15:53 ` Paolo Bonzini [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5419AE8D.3070007@redhat.com \
--to=pbonzini@redhat.com \
--cc=aik@ozlabs.ru \
--cc=dgilbert@redhat.com \
--cc=kwolf@redhat.com \
--cc=mprivozn@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.