From: Stefan Hajnoczi <stefanha@gmail.com>
To: "Fernando Casas Schössow" <casasfernando@outlook.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
Date: Mon, 11 Feb 2019 11:17:25 +0800
Message-ID: <20190211031725.GB18083@stefanha-x1.localdomain>
In-Reply-To: <VI1PR0602MB3245FD0A939EC3E28CA0F740A46F0@VI1PR0602MB3245.eurprd06.prod.outlook.com>
On Wed, Feb 06, 2019 at 04:47:19PM +0000, Fernando Casas Schössow wrote:
> I could also repro the same with virtio-scsi on the same guest a couple of hours later:
>
> 2019-02-06 07:10:37.672+0000: starting up libvirt version: 4.10.0, qemu version: 3.1.0, kernel: 4.19.18-0-vanilla, hostname: vmsvr01.homenet.local
> LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin HOME=/root USER=root QEMU_AUDIO_DRV=spice /home/fernando/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-32-DOCKER01/master-key.aes -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,umip=on,xsaveopt=on -drive file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=46,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on -netdev tap,fd=48,id=hostnet0,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=51,server,nowait 
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5904,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
> 2019-02-06 07:10:37.672+0000: Domain id=32 is tainted: high-privileges
> char device redirected to /dev/pts/5 (label charserial0)
> vdev 0x5585456ef6b0 ("virtio-scsi")
> vq 0x5585456f90a0 (idx 2)
> inuse 128 vring.num 128
> 2019-02-06T13:00:46.942424Z qemu-system-x86_64: Virtqueue size exceeded
>
>
> I'm open to any tests or suggestions that can move the investigation forward and find the cause of this issue.

Thanks for collecting the data!

The fact that both virtio-blk and virtio-scsi failed suggests it's not a
virtqueue element leak in the virtio-blk or virtio-scsi device emulation
code.

The hung task error messages from inside the guest are a consequence of
QEMU hitting the "Virtqueue size exceeded" error.  QEMU refuses to
process further requests after the error, causing tasks inside the guest
to get stuck on I/O.
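
To illustrate why the guest hangs rather than recovers, here is a minimal toy model of a device that latches a "broken" flag on the first overflow and then ignores every later request. The names `toy_dev`, `toy_vq` and `toy_pop` are hypothetical stand-ins, not QEMU's real structures, and the logic is a simplified sketch of the behaviour described above rather than the actual virtqueue code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not QEMU's VirtIODevice. */
struct toy_dev {
    bool broken;            /* latched once an unrecoverable error is seen */
};

/* Hypothetical stand-in for a virtqueue; not QEMU's VirtQueue. */
struct toy_vq {
    struct toy_dev *dev;
    unsigned int inuse;     /* elements popped but not yet completed */
    unsigned int num;       /* ring size */
    unsigned int pending;   /* requests the guest has made available */
};

/* Returns 1 if a request was popped, 0 if nothing was processed. */
static int toy_pop(struct toy_vq *vq)
{
    assert(vq != NULL && vq->dev != NULL);

    if (vq->dev->broken) {
        return 0;           /* device is marked broken: ignore the guest */
    }
    if (vq->pending == 0) {
        return 0;           /* nothing to do */
    }
    if (vq->inuse >= vq->num) {
        /* More elements in flight than the ring can hold. */
        fprintf(stderr, "Virtqueue size exceeded\n");
        vq->dev->broken = true;
        return 0;
    }
    vq->pending--;
    vq->inuse++;
    return 1;
}
```

In this model nothing ever clears `broken`, so every request submitted after the overflow stays pending forever, which is what the hung task messages in the guest reflect.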

I don't have a good theory regarding the root cause.  Two ideas:

1. The guest is corrupting the vring or submitting more requests than
   will fit into the ring.  Somewhat unlikely because it happens with
   both Windows and Linux guests.

2. QEMU's virtqueue code is buggy, maybe the memory region cache which
   is used for fast guest RAM accesses.
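
As a rough aid for reading index values against the first idea, the sketch below shows the wraparound arithmetic for a split ring, assuming the avail index and the device's last-seen index are free-running 16-bit counters. The helper names `toy_num_heads` and `toy_indices_plausible` are made up for illustration and are not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of entries the guest has made available that the device has not
 * consumed yet.  Both indices are free-running 16-bit counters, so the
 * difference is taken modulo 65536.
 */
static uint16_t toy_num_heads(uint16_t avail_idx, uint16_t last_avail_idx)
{
    return (uint16_t)(avail_idx - last_avail_idx);
}

/*
 * Returns 1 when the two indices are consistent with a ring that has
 * ring_size slots, 0 when the guest appears to have published more
 * entries than the ring can hold.
 */
static int toy_indices_plausible(uint16_t avail_idx, uint16_t last_avail_idx,
                                 uint16_t ring_size)
{
    assert(ring_size != 0);
    return toy_num_heads(avail_idx, last_avail_idx) <= ring_size;
}
```

For example, `toy_indices_plausible(300, 100, 128)` returns 0 because 200 entries cannot fit in a 128-slot ring. A plausible result does not rule out corruption; it only shows the indices are not obviously out of range.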

Here is an expanded version of the debug patch which might help identify
which of these scenarios is likely.  Sorry, it requires running the
guest again!

This time let's make QEMU dump core so both QEMU state and guest RAM are
captured for further debugging.  That way it will be possible to extract
more information using gdb without rerunning.

Stefan
---
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index a1ff647a66..28d89fcbcb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -866,6 +866,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
+    uint16_t old_shadow_avail_idx = vq->shadow_avail_idx;
     if (virtio_queue_empty_rcu(vq)) {
         goto done;
     }
@@ -879,6 +880,13 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
 
     if (vq->inuse >= vq->vring.num) {
+        fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name);
+        fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq));
+        fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num);
+        fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx);
+        fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, idx)));
+        fprintf(stderr, "used_idx %u\n", vq->used_idx);
+        abort(); /* <--- core dump! */
         virtio_error(vdev, "Virtqueue size exceeded");
         goto done;
     }