From: Avi Kivity <avi@redhat.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: markmc@redhat.com, kvm@vger.kernel.org,
"Michael S. Tsirkin" <mst@redhat.com>,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Sasha Levin <levinsasha928@gmail.com>
Subject: Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
Date: Wed, 07 Dec 2011 15:37:48 +0200
Message-ID: <4EDF6C2C.3050604@redhat.com>
In-Reply-To: <87pqg1kiuu.fsf@rustcorp.com.au>

On 12/06/2011 02:03 PM, Rusty Russell wrote:
> On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@redhat.com> wrote:
> > On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > > the rings themselves would be shared A would say "copy this to/from B's
> > > ring entry N" and you know that A can't have changed B's entry.
> >
> > Sorry, I don't follow. How can the rings be shared? If A puts a gpa in
> > A's address space into the ring, there's no way B can do anything with
> > it, it's an opaque number. Xen solves this with an extra layer of
> > indirection (grant table handles) that cost extra hypercalls to map or
> > copy.
>
> It's not symmetric. B can see the desc and avail pages R/O, and the
> used page R/W. It needs to ask the something to copy in/out of
> descriptors, though, because they're an opaque number, and it doesn't
> have access. ie. the existence of the descriptor in the ring *implies*
> a grant.
>
> Perhaps this could be generalized further into a "connect these two
> rings", but I'm not sure. Descriptors with both read and write parts
> are tricky.

Okay, I was using the wrong mental model of how this works. B must be
aware of the translation from A's address space into B's. Both qemu and
the kernel can do this on their own, but if B is another guest, then it
cannot do this except by calling H (the hypervisor or other trusted party).

vhost-copy cannot work fully transparently, because you need some memory
to copy into. Maybe we can have a pci device with a large BAR that
contains buffers for copying, and also a translation from A addresses
into B addresses. It would work something like this:

- A prepares a request with both out and in buffers
- vhost-copy allocates memory in B's virtio-copy BAR, copies (using a
  DMA engine) the out buffers into it, and rewrites the out descriptors to
  contain B addresses
- B services the request, and updates the in addresses in the
  descriptors to point at B memory
- vhost-copy copies (using a DMA engine) the in buffers into A memory

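To make the four steps above concrete, here is a rough userspace sketch. Everything in it is an assumption made for illustration: the names (struct desc, bar_alloc, vhost_copy_out, b_service, vhost_copy_in, vhost_copy_demo) are hypothetical, two byte arrays stand in for A's memory and B's BAR, memcpy() stands in for the DMA engine, and B places its reply in the same BAR for simplicity. It is not an existing vhost interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define DESC_F_WRITE 0x2   /* "in" buffer, same bit value as VRING_DESC_F_WRITE */
#define MEM_SIZE     256

/* Simplified descriptor: an address (here an array offset), length, flags. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
};

static uint8_t a_mem[MEM_SIZE];   /* stands in for guest A's memory */
static uint8_t b_bar[MEM_SIZE];   /* stands in for B's virtio-copy BAR */
static uint64_t bar_next;         /* trivial bump allocator over the BAR */

static uint64_t bar_alloc(uint32_t len)
{
	uint64_t off = bar_next;

	assert(off + len <= MEM_SIZE);
	bar_next += len;
	return off;
}

/* Step 2: copy out buffers into B's BAR and rewrite them to B addresses. */
static void vhost_copy_out(const struct desc *a, struct desc *b, int n)
{
	for (int i = 0; i < n; i++) {
		b[i] = a[i];
		if (a[i].flags & DESC_F_WRITE)
			continue;   /* in buffer: B supplies the address later */
		b[i].addr = bar_alloc(a[i].len);
		memcpy(&b_bar[b[i].addr], &a_mem[a[i].addr], a[i].len);
	}
}

/* Step 3: B services the request (here: uppercases the payload) and
 * points the in descriptor at memory it owns. */
static void b_service(struct desc *b)
{
	b[1].addr = bar_alloc(b[1].len);
	for (uint32_t i = 0; i < b[1].len && i < b[0].len; i++)
		b_bar[b[1].addr + i] = (uint8_t)toupper(b_bar[b[0].addr + i]);
}

/* Step 4: copy in buffers from B's memory back to A's original addresses. */
static void vhost_copy_in(const struct desc *a, const struct desc *b, int n)
{
	for (int i = 0; i < n; i++)
		if (a[i].flags & DESC_F_WRITE)
			memcpy(&a_mem[a[i].addr], &b_bar[b[i].addr], a[i].len);
}

/* Runs the four steps once; returns 0 if A ends up with B's reply. */
int vhost_copy_demo(void)
{
	struct desc a[2] = {
		{ .addr = 0,  .len = 5, .flags = 0 },            /* out */
		{ .addr = 64, .len = 5, .flags = DESC_F_WRITE }, /* in  */
	};
	struct desc b[2];

	bar_next = 0;
	memcpy(&a_mem[a[0].addr], "hello", 5);   /* step 1: A prepares request */
	vhost_copy_out(a, b, 2);
	b_service(b);
	vhost_copy_in(a, b, 2);
	return memcmp(&a_mem[a[1].addr], "HELLO", 5) ? -1 : 0;
}
```

Note that the copier keeps A's original descriptors around: B only ever sees B addresses, and A's in addresses are needed again for the final copy back.
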
> > > I'm just not sure how the host would even know to hint.
> >
> > For JBOD storage, a good rule of thumb is (number of spindles) x 3.
> > With less, you might leave an idle spindle; with more, you're just
> > adding latency. This assumes you're using indirects so ring entry ==
> > request. The picture is muddier with massive battery-backed RAID
> > controllers or flash.
> >
> > For networking, you want (ring size) * min(expected packet size, page
> > size) / (link bandwidth) to be something that doesn't get the
> > bufferbloat people after your blood.
>
> OK, so while neither side knows, the host knows slightly more.
>
> Now I think about it, from a spec POV, saying it's a "hint" is useless,
> as it doesn't tell the driver what to do with it. I'll say it's a
> maximum, which keeps it simple.
>
Those rules of thumb always have exceptions, so I'd say it's a default
that the guest can override.
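
For illustration, the two rules of thumb quoted above and the "default that the guest can override" idea can be written down as plain arithmetic. This is only a sketch: the function names and the convention that an override of 0 means "no override" are made up here, not part of any spec or driver.

```c
#include <stdint.h>

/* JBOD rule of thumb from the thread: about three outstanding
 * requests per spindle (assumes indirects, so ring entry == request). */
unsigned int storage_queue_hint(unsigned int spindles)
{
	return spindles * 3;
}

/* Time, in microseconds, to drain a full ring at link speed:
 * (ring size) * min(expected packet size, page size) / (link bandwidth). */
uint64_t net_ring_drain_us(unsigned int ring_size, unsigned int pkt_bytes,
			   unsigned int page_bytes, uint64_t link_bits_per_sec)
{
	unsigned int per_entry = pkt_bytes < page_bytes ? pkt_bytes : page_bytes;
	uint64_t bits = (uint64_t)ring_size * per_entry * 8;

	return bits * 1000000 / link_bits_per_sec;
}

/* Host value treated as a default; a nonzero guest value overrides it.
 * (Hypothetical convention: 0 means the guest expressed no preference.) */
unsigned int effective_depth(unsigned int host_default,
			     unsigned int guest_override)
{
	return guest_override ? guest_override : host_default;
}
```

With these definitions, four spindles give a hint of 12, and a 256-entry ring of 1500-byte packets on a 1 Gbit/s link holds about 3072 microseconds (roughly 3 ms) of queued data.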
--
error compiling committee.c: too many arguments to function