From: Laine Stump <laine@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] race condition when exec'ing "qemu -incoming" followed by monitor "cont"
Date: Fri, 09 Apr 2010 12:03:54 -0400
Message-ID: <4BBF4FEA.1070306@redhat.com>
(Please forgive (and correct!) any inaccuracies in my description of
qemu's workings - I've only recently started looking at it directly,
rather than through the lens of libvirt)
libvirt implements a "domain restore" operation by:
0) start with a previously saved domain image in a file
1) open the domain image, and connect it to a pipe
2) fork, connect the pipe to stdin, and exec qemu with "-incoming exec:cat"
3) execute "cont" in that qemu's monitor.
(for those familiar with the code, see qemudDomainRestore() in
src/qemu/qemu_driver.c in the libvirt source).
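
As a rough illustration of steps 1 and 2 above, here is a minimal Python sketch (libvirt itself is written in C, and this is not its code). It hands the saved image to the child as stdin directly rather than through an intermediate pipe, which is a simplification. Only "-incoming exec:cat" is taken from the description; the helper names, qemu_binary, and extra_args are hypothetical placeholders.

```python
import subprocess
import sys


def build_incoming_argv(qemu_binary, extra_args=()):
    """Return an argv that asks qemu to read the migration stream via `cat`.

    Only "-incoming exec:cat" comes from the message above; qemu_binary and
    extra_args are placeholders for whatever the real command line carries.
    """
    return [qemu_binary, *extra_args, "-incoming", "exec:cat"]


def spawn_with_image_on_stdin(argv, image_path, **popen_kwargs):
    """Start argv with the saved image file connected to the child's stdin.

    The child (and anything it execs, such as `cat`) inherits the file
    descriptor, so it can read the image from its standard input.
    """
    with open(image_path, "rb") as image:
        # Popen duplicates the descriptor into the child before we close
        # our copy on leaving the `with` block.
        return subprocess.Popen(argv, stdin=image, **popen_kwargs)
```

Step 3 (sending "cont" to the monitor) is deliberately left out: nothing in this sketch waits for the incoming data to be consumed, which is exactly the gap the rest of this message is about.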
Although this works successfully for most people, I'm consistently
seeing a problem on my particular hardware (Intel Core 2 Duo, 2.2 GHz)
that causes this domain restore to fail. It seems that the "cont"
command takes effect before the restore is completed (possibly/probably
before it even starts?) resulting in a failed restore - the domain is
left in some random state, sometimes rebooting spontaneously, sometimes
just hung.
If I insert a usleep(250 * 1000) between starting up qemu with
"-incoming exec:cat" and issuing "cont" to start the CPUs, the restore
is successful 100% of the time.
I've been told that once the incoming migration starts, the monitor will
be non-responsive until it is complete. This should mean that as long as
the "cont" isn't issued until after the migration starts, it will be
blocked until the migration is complete, thus protecting us from the
race; for this reason (along with the fact that a 250msec sleep is
enough to cure the problem) I'm thinking it's likely the "cont" happens
before the migration starts.
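
The reasoning above can be written down as a toy timing model. This is only a sketch of the hypothesis as I understand it, not of qemu's actual behaviour: it assumes the monitor runs a command immediately unless an incoming migration is in progress, in which case the command waits until the migration ends. The function names and all the timestamps are made up for illustration.

```python
def cont_takes_effect_at(cont_issued, migration_start, migration_end):
    """Time at which "cont" would start the CPUs in this toy model.

    Assumption (from the description above, not from qemu's source): the
    monitor handles commands immediately, except that it is blocked from
    migration_start until migration_end.
    """
    if migration_start <= cont_issued < migration_end:
        return migration_end  # command waits until the monitor unblocks
    return cont_issued  # monitor is free, so the command runs at once


def restore_is_safe(cont_issued, migration_start, migration_end):
    """True if the CPUs start only after the incoming state is loaded."""
    effective = cont_takes_effect_at(cont_issued, migration_start, migration_end)
    return effective >= migration_end


# Illustrative numbers only (seconds since qemu was exec'ed).
early = restore_is_safe(cont_issued=0.01, migration_start=0.10, migration_end=0.60)
delayed = restore_is_safe(cont_issued=0.26, migration_start=0.10, migration_end=0.60)
print(early, delayed)
```

With these made-up numbers the early "cont" is unsafe and the delayed one is safe, which matches the observation that a 250 msec sleep is enough, provided the migration really does start within that window.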
There is, of course, an "info migrate" command in the monitor that could
be used to ensure the migration had completed before issuing "cont", but
that command only works for outgoing migrations, not incoming
(presumably, if it were available, checking the info before the
migration started would return "not started" or something similar,
and once the migration had started, the entire monitor interface would
block until it completed).
Can someone provide any insight on why it is possible to start the CPUs
in the domain before the incoming migration is complete, and what we can
do (other than blindly sleeping) to prevent this?