qemu-devel.nongnu.org archive mirror
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] Help on possible hang in drive-mirror / query-block-jobs
Date: Thu, 10 Jul 2014 16:53:15 +0100
Message-ID: <20140710155315.GO23577@redhat.com>
In-Reply-To: <53BEAF80.1060308@redhat.com>

On Thu, Jul 10, 2014 at 05:21:36PM +0200, Paolo Bonzini wrote:
> Il 10/07/2014 17:11, Daniel P. Berrange ha scritto:
> >I've spent the last week debugging an issue that is hitting OpenStack
> >with drive-mirror/block job usage.
> >
> >Specifically we are seeing that a monitor command for 'query-block-jobs'
> >never replies to libvirt. After 3 minutes of waiting the test harness
> >times out and kills the VM. When working normally the entire test will
> >complete in just a couple of seconds, so we don't think the 3 minute
> >timeout is hitting a false positive.
> >
> >[...] The failure rate of this problem is only
> >around 1 in 400 uses of drive-mirror. [...] No one has ever
> >managed to reproduce the problem outside of our automated test
> >system environment, even when running the same tests locally, and
> >we can't log into the test systems to get GDB traces or install
> >custom QEMU builds.
> >
> >The best I can do is to collect debug logs from libvirtd, and get
> >stdio from QEMU. The QEMU stderr/stdout shows nothing at all. The
> >libvirtd log shows the following sequence of monitor interactions
> >with QEMU:
> >
> >[...]
> >
> >5. Libvirt waits for cleanup to complete:
> >
> >msg={"execute":"query-block-jobs","id":"libvirt-15"}
> >reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk0", "busy": true, "len": 25165824, "offset": 25165824, "paused": false, "speed": 0, "type": "mirror"}], "id": "libvirt-15"}
> >
> >msg={"execute":"query-block-jobs","id":"libvirt-16"}
> >reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk0", "busy": true, "len": 25165824, "offset": 25165824, "paused": false, "speed": 0, "type": "mirror"}], "id": "libvirt-16"}
> >
> >msg={"execute":"query-block-jobs","id":"libvirt-17"}
> ><...hang...>
> >
> >So we can see this last 'query-block-jobs' command hangs. I've looked at
> >the code for handling this monitor command and am struggling to come up
> >with any idea of why it might hang. My best idea was that the
> >bdrv_iterate() call it makes might be running at the same time as another
> >thread modifies the list, but debugging on a local QEMU shows no changes
> >to the list at all due to drive-mirror/block jobs, so that doesn't seem to
> >be the cause.
> 
> No, all these modifications are done under the big QEMU lock anyway. All
> multitasking happens in coroutines, and preemption points only occur
> when you do AIO.
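A toy model of this point, assuming plain Python generators are a fair stand-in for QEMU coroutines and `yield` for waiting on AIO (none of the names below are QEMU functions): because a task only gives up control at a yield, a loop with no yield inside it, such as a list walk that does not itself wait on AIO, cannot observe a half-modified list.

```python
from collections import deque

# Toy model only: generators stand in for QEMU coroutines, and `yield` stands
# in for waiting on AIO, the only place another task can get to run.


def mirror_job(devices, log):
    """Mutates the shared list, but only after resuming from a yield point."""
    for i in range(3):
        yield  # "submit AIO and wait": other tasks may run here
        devices.append(f"tmp{i}")
        log.append(f"mirror appended tmp{i}")


def query_block_jobs(devices, log):
    """Walks the list with no yield inside the loop, so it cannot be interrupted."""
    before = len(devices)
    seen = [d for d in devices]
    assert len(devices) == before  # nothing changed mid-walk
    log.append(f"query saw {seen}")
    yield


def run(tasks):
    """Single-threaded round-robin scheduler: resume each task until it finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)


devices, log = ["drive-virtio-disk0"], []
run([mirror_job(devices, log), query_block_jobs(devices, log)])
print(log)
```

The query always sees a consistent snapshot; the mirror task's appends land only after it resumes from a yield, never in the middle of the walk.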
> 
> Can you install a custom QEMU?  How many megabytes of stdout can your test
> rig tolerate?  Any chance you can collect other files (traces)?

I can possibly come up with some gross hack to wget a QEMU binary from
an external host at the start of the test. I can generate 100MB of test
logs without it being an issue.

What sort of info would be helpful to collect? I can't easily get
arbitrary files out of the test hosts, but if there's a way to get the
debug info into QEMU's stdout/stderr logs, then we're already collecting
those.
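On the harness side, a minimal sketch of the step-5 polling loop that logs every exchange to stderr and fails with the id of the unanswered command, instead of relying only on a blanket 3-minute timeout. The `send` transport, `wait_for_job_gone`, `find_job` and the `poll-N` ids are hypothetical; this is not libvirt's code.

```python
import json
import sys
import time
from typing import Callable, Optional

# Hypothetical transport (an assumption, not a libvirt or QEMU API): sends one
# QMP command dict and returns the parsed reply dict, raising TimeoutError if
# no reply arrives within `timeout` seconds.
Transport = Callable[[dict, float], dict]


def find_job(reply: dict, device: str) -> Optional[dict]:
    """Return the block-job entry for `device` in a query-block-jobs reply, or None."""
    for job in reply.get("return", []):
        if job.get("device") == device:
            return job
    return None


def wait_for_job_gone(send: Transport, device: str, *,
                      reply_timeout: float = 10.0,
                      poll_interval: float = 0.05,
                      max_polls: int = 1000) -> int:
    """Poll query-block-jobs until no job is listed for `device`.

    Returns the number of polls issued. A per-reply timeout separates
    "QEMU stopped answering the monitor" from "the job is slow to finish".
    """
    for n in range(1, max_polls + 1):
        cmd = {"execute": "query-block-jobs", "id": f"poll-{n}"}
        print(f"msg={json.dumps(cmd)}", file=sys.stderr)
        try:
            reply = send(cmd, reply_timeout)
        except TimeoutError:
            # Name the unanswered command so the log pinpoints the hang.
            raise TimeoutError(
                f"no reply to {cmd['id']} within {reply_timeout}s") from None
        print(f"reply={json.dumps(reply)}", file=sys.stderr)
        if reply.get("id") != cmd["id"]:
            raise RuntimeError(f"reply id {reply.get('id')!r} != {cmd['id']!r}")
        if find_job(reply, device) is None:
            return n  # job has disappeared: cleanup finished
        time.sleep(poll_interval)
    raise RuntimeError(f"job on {device} still listed after {max_polls} polls")
```

Keeping the per-reply timeout well below the overall test timeout means the log ends with the exact command QEMU never answered, plus the last reply it did send.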

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
