From: Stefan Hajnoczi <stefanha@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Anthony Liguori <aliguori@us.ibm.com>,
qemu-devel@nongnu.org, Blue Swirl <blauwirbel@gmail.com>,
khoa@us.ibm.com, Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Asias He <asias@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code
Date: Wed, 5 Dec 2012 13:57:35 +0100 [thread overview]
Message-ID: <20121205125735.GC5892@stefanha-thinkpad.hitronhub.home> (raw)
In-Reply-To: <20121129125001.GB9372@redhat.com>
On Thu, Nov 29, 2012 at 02:50:01PM +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2012 at 04:16:45PM +0100, Stefan Hajnoczi wrote:
> > The virtio-blk-data-plane cannot access memory using the usual QEMU
> > functions since it executes outside the global mutex and the memory APIs
> > are this time are not thread-safe.
> >
> > This patch introduces a virtqueue module based on the kernel's vhost
> > vring code. The trick is that we map guest memory ahead of time and
> > access it cheaply outside the global mutex.
> >
> > Once the hardware emulation code can execute outside the global mutex it
> > will be possible to drop this code.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> Is there no way to factor out ommon code and share it with virtio.c?
I think we have touched on this in other sub-threads but for reference:
this code implements vring access outside the global mutex, which means
no QEMU memory API functions. Therefore it's hard to share the virtio.c
code which uses QEMU memory API functions.
The current work that Ping Fan Liu is doing will lead to thread-safe
memory accesses from device emulation code. At that point we can ditch
this and unify with virtio.c.
> > +/* This is stolen from linux-2.6/drivers/vhost/vhost.c. */
>
> Probably should document the version you based this on.
> Surely not really 2.6?
linux-2.6.git is still mirrored from linux.git :). I'll try to dig up
the specific Linux version that this code is based on.
> > +static int get_indirect(Vring *vring,
> > + struct iovec iov[], struct iovec *iov_end,
> > + unsigned int *out_num, unsigned int *in_num,
> > + struct vring_desc *indirect)
> > +{
> > + struct vring_desc desc;
> > + unsigned int i = 0, count, found = 0;
> > +
> > + /* Sanity check */
> > + if (unlikely(indirect->len % sizeof desc)) {
> > + error_report("Invalid length in indirect descriptor: "
> > + "len %#x not multiple of %#zx",
> > + indirect->len, sizeof desc);
> > + vring->broken = true;
> > + return -EFAULT;
> > + }
> > +
> > + count = indirect->len / sizeof desc;
> > + /* Buffers are chained via a 16 bit next field, so
> > + * we can have at most 2^16 of these. */
> > + if (unlikely(count > USHRT_MAX + 1)) {
> > + error_report("Indirect buffer length too big: %d", indirect->len);
> > + vring->broken = true;
> > + return -EFAULT;
> > + }
> > +
> > + /* Point to translate indirect desc chain */
> > + indirect = hostmem_lookup(&vring->hostmem, indirect->addr, indirect->len,
> > + false);
>
> This assumes an indirect buffer is contigious in qemu memory
> which seems wrong since unlike vring itself
> there are no alignment requirements.
Let's break this up into one hostmem_lookup() per descriptor. In other
words, don't try to lookup the entire indirect buffer but copy-in one
descriptor at a time.
> Overriding indirect here also seems unnecessarily tricky.
You are right, let's use a separate local variable to make the code
clearer.
> > +int vring_pop(VirtIODevice *vdev, Vring *vring,
> > + struct iovec iov[], struct iovec *iov_end,
> > + unsigned int *out_num, unsigned int *in_num)
> > +{
> > + struct vring_desc desc;
> > + unsigned int i, head, found = 0, num = vring->vr.num;
> > + uint16_t avail_idx, last_avail_idx;
> > +
> > + /* If there was a fatal error then refuse operation */
> > + if (vring->broken) {
> > + return -EFAULT;
> > + }
> > +
> > + /* Check it isn't doing very strange things with descriptor numbers. */
> > + last_avail_idx = vring->last_avail_idx;
> > + avail_idx = vring->vr.avail->idx;
>
> I think something needs to be done here to force
> a read otherwise two accesses to avail_idx
> below can cause two reads from the ring and
> could return inconsistent results.
There is no function call or anything in between that forces the
compiler to load the value of avail_idx and reuse it.
So I think you're right. I'm not 100% sure a read barrier forces the
compiler to load here since the following code just manipulates
last_avail_idx and avail_idx.
Any ideas?
> > + if (vdev->guest_features & (1 << VIRTIO_RING_F_EVENT_IDX)) {
> > + vring_avail_event(&vring->vr) = vring->vr.avail->idx;
>
> No barrier here?
> I also don't see similar code in vhost - why is it a good idea?
This is from hw/virtio.c:virtqueue_pop(). We know there is at least one
request available, we're hinting to the guest to not to bother
notifying any requests up to the latest.
However, setting avail_event to the current vring avail_idx only helps
if we get lucky and process the vring *before* the guest decides to
notify a whole bunch of requests it has just enqueued.
So this doesn't seem incorrect but the performance benefit is
questionable.
Do you remember why you wrote this code? The commit is:
commit bcbabae8ff7f7ec114da9fe2aa7f25f420f35306
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Sun Jun 12 16:21:57 2011 +0300
virtio: event index support
Add support for event_idx feature, and utilize it to
reduce the number of interrupts and exits for the guest.
Stefan
next prev parent reply other threads:[~2012-12-05 12:57 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-22 15:16 [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 01/11] raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 02/11] configure: add CONFIG_VIRTIO_BLK_DATA_PLANE Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 03/11] dataplane: add host memory mapping code Stefan Hajnoczi
2012-11-29 12:33 ` Michael S. Tsirkin
2012-11-29 12:45 ` Stefan Hajnoczi
2012-11-29 12:54 ` Michael S. Tsirkin
2012-11-29 12:57 ` Michael S. Tsirkin
2012-12-05 8:13 ` Stefan Hajnoczi
2012-11-29 13:54 ` Michael S. Tsirkin
2012-11-29 14:26 ` Stefan Hajnoczi
2012-11-29 14:36 ` Michael S. Tsirkin
2012-11-29 15:26 ` Paolo Bonzini
2012-12-05 8:31 ` Stefan Hajnoczi
2012-12-05 11:22 ` Michael S. Tsirkin
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 04/11] dataplane: add virtqueue vring code Stefan Hajnoczi
2012-11-29 12:50 ` Michael S. Tsirkin
2012-11-29 15:17 ` Paolo Bonzini
2012-12-05 12:57 ` Stefan Hajnoczi [this message]
2012-11-29 13:48 ` Michael S. Tsirkin
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 05/11] dataplane: add event loop Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 06/11] dataplane: add Linux AIO request queue Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 07/11] iov: add iov_discard() to remove data Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 08/11] test-iov: add iov_discard() testcase Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 09/11] iov: add qemu_iovec_concat_iov() Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 10/11] dataplane: add virtio-blk data plane code Stefan Hajnoczi
2012-11-29 13:41 ` Michael S. Tsirkin
2012-11-29 14:02 ` Michael S. Tsirkin
2012-11-29 15:21 ` Paolo Bonzini
2012-11-29 15:27 ` Michael S. Tsirkin
2012-11-29 15:47 ` Paolo Bonzini
2012-11-30 13:57 ` Stefan Hajnoczi
2012-11-22 15:16 ` [Qemu-devel] [PATCH v4 11/11] virtio-blk: add x-data-plane=on|off performance feature Stefan Hajnoczi
2012-11-29 13:12 ` Michael S. Tsirkin
2012-11-29 14:45 ` Stefan Hajnoczi
2012-11-29 14:55 ` Michael S. Tsirkin
2012-12-04 11:20 ` Michael S. Tsirkin
2012-12-04 14:19 ` Stefan Hajnoczi
2012-11-29 9:18 ` [Qemu-devel] [PATCH v4 00/11] virtio: virtio-blk data plane Stefan Hajnoczi
2012-11-29 12:03 ` Paolo Bonzini
2012-11-29 14:09 ` Michael S. Tsirkin
2012-11-29 14:48 ` Stefan Hajnoczi
2012-11-29 15:19 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121205125735.GC5892@stefanha-thinkpad.hitronhub.home \
--to=stefanha@gmail.com \
--cc=aliguori@us.ibm.com \
--cc=asias@redhat.com \
--cc=blauwirbel@gmail.com \
--cc=khoa@us.ibm.com \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).