From: Paolo Bonzini <pbonzini@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"Fernando Casas Schössow" <casasfernando@outlook.com>
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
Date: Wed, 20 Feb 2019 18:53:54 +0100 [thread overview]
Message-ID: <f168e72e-89fe-cf9b-89ff-3eb820f8a5cf@redhat.com> (raw)
In-Reply-To: <20190220165839.GJ30403@stefanha-x1.localdomain>
On 20/02/19 17:58, Stefan Hajnoczi wrote:
> On Mon, Feb 18, 2019 at 07:21:25AM +0000, Fernando Casas Schössow wrote:
>> It took a few days but last night the problem was reproduced.
>> This is the information from the log:
>>
>> vdev 0x55f261d940f0 ("virtio-blk")
>> vq 0x55f261d9ee40 (idx 0)
>> inuse 128 vring.num 128
>> old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
>> avail 0x3d87a800 avail_idx (cache bypassed) 58625
>
> Hi Paolo,
> Are you aware of any recent MemoryRegionCache issues? The avail_idx
> value 58874 was read via the cache while a non-cached read produces
> 58625!
>
> I suspect that 58625 is correct since the vring is already full and the
> driver wouldn't bump avail_idx any further until requests complete.
>
> Fernando also hits this issue with virtio-scsi so it's not a
> virtio_blk.ko driver bug or a virtio-blk device emulation issue.
No, I am not aware of any issues.
How can I get the core dump (and the corresponding executable to get the
symbols)? Alternatively, it should be enough to print the
vq->vring.caches->avail.mrs from the debugger.
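With the core loaded in gdb, that would look something like this (the binary path is illustrative and symbol availability depends on the build; the vq address is the one printed in the log above):

```
$ gdb /usr/bin/qemu-system-x86_64 /path/to/core
(gdb) p ((VirtQueue *)0x55f261d9ee40)->vring.caches->avail.mrs
(gdb) p ((VirtQueue *)0x55f261d9ee40)->shadow_avail_idx
(gdb) p ((VirtQueue *)0x55f261d9ee40)->last_avail_idx
```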
Also, one possibility is to add an assertion in vring_avail_idx like

assert(vq->shadow_avail_idx == virtio_lduw_phys(vdev,
       vq->vring.avail + offsetof(VRingAvail, idx)));

and try to catch the error earlier.
Paolo
> A QEMU core dump is available for debugging.
>
> Here is the patch that produced this debug output:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index a1ff647a66..28d89fcbcb 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>          return NULL;
>      }
>      rcu_read_lock();
> +    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>      if (virtio_queue_empty_rcu(vq)) {
>          goto done;
>      }
> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>      max = vq->vring.num;
>
>      if (vq->inuse >= vq->vring.num) {
> +        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
> +        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
> +        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
> +        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
> +        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
> +        fprintf(stderr, "used_idx %u\n", vq->used_idx);
> +        abort(); /* <--- core dump! */
>          virtio_error(vdev, "Virtqueue size exceeded");
>          goto done;
>      }
>
> Stefan
>
>> used_idx 58497
>> 2019-02-18 03:20:08.605+0000: shutting down, reason=crashed
>>
>> The dump file, including guest memory, was generated successfully (after gzip the file is around 492MB).
>> I switched the guest now to virtio-scsi to get the information and dump with this setup as well.
>>
>> How should we proceed?
>>
>> Thanks.
>>
>> On Mon, Feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> Thanks for collecting the data!  The fact that both virtio-blk and
>> virtio-scsi failed suggests it's not a virtqueue element leak in the
>> virtio-blk or virtio-scsi device emulation code.
>>
>> The hung task error messages from inside the guest are a consequence of
>> QEMU hitting the "Virtqueue size exceeded" error.  QEMU refuses to
>> process further requests after the error, causing tasks inside the
>> guest to get stuck on I/O.
>>
>> I don't have a good theory regarding the root cause.  Two ideas:
>>
>> 1. The guest is corrupting the vring or submitting more requests than
>>    will fit into the ring.  Somewhat unlikely because it happens with
>>    both Windows and Linux guests.
>>
>> 2. QEMU's virtqueue code is buggy, maybe the memory region cache which
>>    is used for fast guest RAM accesses.
>>
>> Here is an expanded version of the debug patch which might help
>> identify which of these scenarios is likely.  Sorry, it requires
>> running the guest again!  This time let's make QEMU dump core so both
>> QEMU state and guest RAM are captured for further debugging.  That way
>> it will be possible to extract more information using gdb without
>> rerunning.
>>
>> Stefan
>> ---
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index a1ff647a66..28d89fcbcb 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>>          return NULL;
>>      }
>>      rcu_read_lock();
>> +    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
>>      if (virtio_queue_empty_rcu(vq)) {
>>          goto done;
>>      }
>> @@ -879,6 +880,12 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>>      max = vq->vring.num;
>>
>>      if (vq->inuse >= vq->vring.num) {
>> +        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
>> +        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
>> +        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
>> +        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
>> +        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
>> +        fprintf(stderr, "used_idx %u\n", vq->used_idx);
>> +        abort(); /* <--- core dump! */
>>          virtio_error(vdev, "Virtqueue size exceeded");
>>          goto done;
>>      }
>>
>>
Thread overview: 55+ messages
2017-06-14 21:56 [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error Fernando Casas Schössow
2017-06-16 6:58 ` Ladi Prosek
2017-06-16 10:11 ` Fernando Casas Schössow
2017-06-16 10:25 ` Ladi Prosek
2017-06-19 22:10 ` Fernando Casas Schössow
2017-06-20 5:59 ` Ladi Prosek
2017-06-20 6:30 ` Fernando Casas Schössow
2017-06-20 7:52 ` Ladi Prosek
2017-06-21 12:19 ` Fernando Casas Schössow
2017-06-22 7:43 ` Ladi Prosek
2017-06-23 6:29 ` Fernando Casas Schössow
[not found] ` <1498199343.2815.0@smtp-mail.outlook.com>
2017-06-24 8:34 ` Fernando Casas Schössow
2019-01-31 11:32 ` Fernando Casas Schössow
2019-02-01 5:48 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-02-01 8:17 ` Fernando Casas Schössow
2019-02-04 6:06 ` Stefan Hajnoczi
2019-02-04 7:24 ` Fernando Casas Schössow
[not found] ` <AM5PR0602MB32368CB5ADDEC05F42D8BC8FA46D0@AM5PR0602MB3236.eurprd06.prod.outlook.com>
2019-02-06 7:15 ` Fernando Casas Schössow
[not found] ` <AM5PR0602MB32368CB5ADDEC05F42D8BC8FA46D0@AM5PR0602MB3236.eurprd06.prod.outlo>
[not found] ` <VI1PR0602MB3245032D51A5DF45AF6E1952A46F0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
2019-02-06 16:47 ` Fernando Casas Schössow
2019-02-11 3:17 ` Stefan Hajnoczi
2019-02-11 9:48 ` Fernando Casas Schössow
2019-02-18 7:21 ` Fernando Casas Schössow
[not found] ` <VI1PR0602MB3245424120D151F29884A7E2A4630@VI1PR0602MB3245.eurprd06.prod.outlook.com>
2019-02-19 7:26 ` Fernando Casas Schössow
2019-02-20 16:58 ` Stefan Hajnoczi
2019-02-20 17:53 ` Paolo Bonzini [this message]
2019-02-20 18:56 ` Fernando Casas Schössow
2019-02-21 11:11 ` Stefan Hajnoczi
2019-02-21 11:33 ` Fernando Casas Schössow
[not found] ` <VI1PR0602MB3245593855B029B427ED544FA47E0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
[not found] ` <CAJSP0QUs9Yz2-k1KyVMwpgx6RwY9cK7qdQRCQ74xmgXJPJR-qw@mail.gmail.com>
[not found] ` <VI1PR0602MB32453A8B5CBC0308C7D18F1DA47E0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
[not found] ` <CAJSP0QVxaW3tezjBN9owJHsxzE9h8_qcaeRr5zHHKxKJOeFnkQ@mail.gmail.com>
[not found] ` <CAJSP0QVXoZJ9MJ0qp4RM_m2fGJ8iFSyJMAU_X7mdiQvpOK59KA@mail.gmail.com>
[not found] ` <VI1PR0602MB324516419266A934FE7759C6A47E0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
[not found] ` <VI1PR0602MB324516419266A934FE7759C6A47E0@VI1PR0602MB3245.eurprd06.prod.outlo>
[not found] ` <VI1PR0602MB32454C17192EFA863E29CC49A47E0@VI1PR0602MB3245.eurprd06.prod.outlo>
[not found] ` <VI1PR0602MB324547F72DA9EDEB1613C888A47E0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
[not found] ` <CAJSP0QUg=cq3tCSLidQ9BR2hxAo3K6gA6LKtpx5Rjb=_6XgJ6Q@mail.gmail.com>
[not found] ` <28e6b4ed-9afd-3a79-6267-86c7385c23ce@redhat.com>
[not found] ` <VI1PR0602MB324578F91F1AF9390D03022FA47F0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
2019-02-22 14:04 ` Stefan Hajnoczi
2019-02-22 14:38 ` Paolo Bonzini
2019-02-22 14:43 ` Fernando Casas Schössow
2019-02-22 14:55 ` Paolo Bonzini
2019-02-22 15:48 ` Fernando Casas Schössow
2019-02-22 16:37 ` Dr. David Alan Gilbert
2019-02-22 16:39 ` Paolo Bonzini
2019-02-22 16:47 ` Dr. David Alan Gilbert
2019-02-23 11:49 ` Natanael Copa
2019-02-26 13:30 ` Paolo Bonzini
2019-02-28 7:35 ` Fernando Casas Schössow
2019-02-23 15:55 ` Natanael Copa
2019-02-23 16:18 ` Peter Maydell
2019-02-25 10:24 ` Natanael Copa
2019-02-25 10:34 ` Peter Maydell
2019-02-25 12:15 ` Fernando Casas Schössow
2019-02-25 12:21 ` Natanael Copa
2019-02-25 13:06 ` Peter Maydell
2019-02-25 13:25 ` Natanael Copa
2019-02-25 13:32 ` Fernando Casas Schössow
[not found] ` <VI1PR0602MB3245A6B693B23DA2E0E8E500A47A0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
2019-02-25 15:41 ` Fernando Casas Schössow
2019-02-28 9:58 ` Peter Maydell
2019-03-07 7:14 ` Fernando Casas Schössow
2019-02-23 16:21 ` Fernando Casas Schössow
2019-02-25 10:30 ` Stefan Hajnoczi
2019-02-25 10:33 ` Stefan Hajnoczi
2019-02-23 16:57 ` Peter Maydell