From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: kvm-devel <kvm@vger.kernel.org>,
Stefan Hajnoczi <stefanha@gmail.com>,
qemu-devel <qemu-devel@nongnu.org>,
target-devel <target-devel@vger.kernel.org>,
Hannes Reinecke <hare@suse.de>,
Anthony Liguori <anthony@codemonkey.ws>,
Jan Kiszka <jan.kiszka@siemens.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Christoph Hellwig <hch@lst.de>, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] vhost-scsi port to v1.1.0 + MSI-X performance regression
Date: Mon, 13 Aug 2012 21:15:38 +0300
Message-ID: <20120813181538.GA19460@redhat.com>
In-Reply-To: <1343161256.1813.87.camel@haakon2.linux-iscsi.org>

On Tue, Jul 24, 2012 at 01:20:56PM -0700, Nicholas A. Bellinger wrote:
> On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote:
> > On 2012-07-24 09:42, Nicholas A. Bellinger wrote:
> > > Hi Anthony, Stefan & QEMU folks,
> > >
>
> <SNIP>
>
> > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost
> > > I/O to actually work against the latest qemu.git/master..
> > >
> > > So while doing a (manual) bisection w/ this series to track down the
> > > issue with qemu/master, I managed to run across something else.. With
> > > the vhost-scsi series applied, everything is working as expected up
> > > until the following commit:
> > >
> > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> > > Author: Jan Kiszka <jan.kiszka@siemens.com>
> > > Date: Thu May 17 10:32:39 2012 -0300
> > >
> > > virtio/vhost: Add support for KVM in-kernel MSI injection
> > >
> > >
> > > This commit ends up triggering the following assert immediately after
> > > starting qemu with virtio-scsi <-> tcm_vhost:
> > >
> > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515:
> > > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier &&
> > > dev->msix_vector_release_notifier' failed.
> > >
> > > OK, so adding the following hack allows me to boot:
> > >
> > > diff --git a/hw/msix.c b/hw/msix.c
> > > index 59c7a83..6036909 100644
> > > --- a/hw/msix.c
> > > +++ b/hw/msix.c
> > > @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
> > > {
> > > int vector;
> > >
> > > + if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier) {
> > > + printf("Hit NULL msix_unset_vector_notifiers for: %s\n", dev->name);
> > > + return;
> > > + }
> > > +
> > > assert(dev->msix_vector_use_notifier &&
> > > dev->msix_vector_release_notifier);
> > >
> > > --
> >
> > Can you post a backtrace from gdb?
> >
>
> Sure, w/o the above patch the backtrace with commit 1523ed9e1d looks
> like the following:
>
> (gdb) run
> Starting program: /usr/src/qemu.git/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m 2048 -serial file:/tmp/vhost-serial.txt -hda /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2 -vhost-scsi id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off
> [Thread debugging using libthread_db enabled]
> wwpn = "vhost-scsi0" tpgt = "1"
> [New Thread 0x7ffff45f8700 (LWP 26508)]
> [New Thread 0x7ffff3bf6700 (LWP 26509)]
> [New Thread 0x7ffff33f5700 (LWP 26510)]
> vhost_scsi_stop
> Failed to clear endpoint
> qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
>
> Program received signal SIGABRT, Aborted.
> 0x00007ffff5e8b165 in raise () from /lib/libc.so.6
> (gdb) bt
> #0 0x00007ffff5e8b165 in raise () from /lib/libc.so.6
> #1 0x00007ffff5e8df70 in abort () from /lib/libc.so.6
> #2 0x00007ffff5e842b1 in __assert_fail () from /lib/libc.so.6
> #3 0x00000000004a84a1 in msix_unset_vector_notifiers (dev=0x1463a70) at /usr/src/qemu.git/hw/msix.c:514
> #4 0x00000000004d2865 in virtio_pci_set_guest_notifiers (opaque=0x6788, assign=136)
> at /usr/src/qemu.git/hw/virtio-pci.c:703
> #5 0x000000000062955f in vhost_dev_stop (hdev=0x126c8a8, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost.c:954
> #6 0x0000000000628989 in vhost_scsi_stop (vs=0x126c890, vdev=0x1465220) at /usr/src/qemu.git/hw/vhost-scsi.c:115
> #7 0x000000000062f5c9 in virtio_scsi_set_status (vdev=0x1465220, val=<value optimized out>)
> at /usr/src/qemu.git/hw/virtio-scsi.c:631
> #8 0x0000000000632082 in virtio_set_status (vdev=0x1465220, val=136 '\210') at /usr/src/qemu.git/hw/virtio.c:507
> #9 0x0000000000633410 in virtio_reset (opaque=0x6788) at /usr/src/qemu.git/hw/virtio.c:517

This is strange: the code shows virtio_reset calling virtio_set_status
with a value of 0. Why is it 136 here?
Stack corruption?
We actually did have a memory corruptor in virtio-scsi. You sent the patch
    virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition
Maybe the bug goes away if you apply that?

> #10 0x00000000004d30a9 in virtio_pci_reset (d=0x1463a70) at /usr/src/qemu.git/hw/virtio-pci.c:280
> #11 0x00000000004fc909 in qdev_reset_one (dev=0x6788, opaque=0x6788) at /usr/src/qemu.git/hw/qdev.c:207
> #12 0x00000000004fc670 in qdev_walk_children (dev=0x1463a70, devfn=0x4fc8f0 <qdev_reset_one>,
> busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:372
> #13 0x00000000004ae43d in pci_device_reset (dev=0x6788) at /usr/src/qemu.git/hw/pci.c:163
> #14 0x00000000004ae64f in pci_bus_reset (bus=0x1415bd0) at /usr/src/qemu.git/hw/pci.c:206
> #15 0x00000000004ae699 in pcibus_reset (qbus=0x6788) at /usr/src/qemu.git/hw/pci.c:213
> #16 0x00000000004fc710 in qbus_walk_children (bus=0x1415bd0, devfn=0x4fc8f0 <qdev_reset_one>, busfn=0x6, opaque=0x0)
> at /usr/src/qemu.git/hw/qdev.c:349
> #17 0x00000000004fc6a3 in qdev_walk_children (dev=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>,
> busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:379
> #18 0x00000000004fc745 in qbus_walk_children (bus=<value optimized out>, devfn=0x4fc8f0 <qdev_reset_one>,
> busfn=0x4fc510 <qbus_reset_one>, opaque=0x0) at /usr/src/qemu.git/hw/qdev.c:356
> #19 0x00000000004d5822 in qemu_system_reset (report=false) at /usr/src/qemu.git/vl.c:1412
> #20 0x00000000004d70bb in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
> at /usr/src/qemu.git/vl.c:3647
> (gdb)
>
>
> > Also, is there a git tree and a way to reproduce this without special
> > hardware needs?
> >
>
> I'll push this series + branches to demonstrate the issue into an public
> tree this afternoon.
>
> Also, the particular backend is a Fusion-IO raw block flash device, but
> I'm pretty sure that using a TCM RAMDISK into tcm_vhost would exhibit
> the same type of behavior. (Will double check on that shortly..)
>
> > >
> > > and virtio-scsi is then able to load + detect tcm_vhost LUNs as
> > > expected.
> > >
> > > However the random I/O performance with commit 1523ed9e1d46b is off by a
> > > couple of orders of magnitude, ~6K IOPs compared to ~60K IOPs on raw
> > > block flash using just the previous commit bdd00bdc64ba in Jan's series.
> > >
> > > So AFAICT there appears to be a serious performance regression that is
> > > easily reproducible with that patch, which is about as far along as I've
> > > been able to diagnose yet.
> > >
> > > Interestingly enough, virtio-scsi-raw performance does not seem to be
> > > effected AFAICT by this regression, and is still able to go ~20K IOPs
> > > with the same workload using commit 1523ed9e1d46b. (Roughly the same as
> > > before)
> > >
> > > Does anyone have any idea why commit 1523ed9e1d46b would be killing
> > > vhost / tcm_vhost performance so terribly, or is there something else
> > > that vhost / vhost-scsi should be doing with new code..?
> >
> > No good idea yet, will have to look closer.
> >
>
> <nod>, thanks for your help Jan. ;)
>
> > Maybe you are somehow deassigning (via set_guest_notifiers) before
> > assigning. But that would not yet explain performance regressions. Your
> > target is exposing MSI-X, isn't it?
> >
>
> I believe that is correct.
>
> --nab