From: Avi Kivity <avi@redhat.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself
Date: Thu, 20 May 2010 10:00:56 +0300
Message-ID: <4BF4DE28.2080103@redhat.com>
In-Reply-To: <201005201431.51142.rusty@rustcorp.com.au>
On 05/20/2010 08:01 AM, Rusty Russell wrote:
>
>> A device with out of order
>> completion (like virtio-blk) will quickly randomize the unused
>> descriptor indexes, so every descriptor fetch will require a bounce.
>>
>> In contrast, if the rings hold the descriptors themselves instead of
>> pointers, we bounce (sizeof(descriptor)/cache_line_size) cache lines for
>> every descriptor, amortized.
>>
> We already have indirect, this would be a logical next step. So let's
> think about it. The avail ring would contain 64 bit values, the used ring
> would contain indexes into the avail ring.
>
Have just one ring, no indexes. The producer places descriptors into
the ring and updates the head. The consumer copies out descriptors to
be processed and copies back in completed descriptors. Chaining is
always linear. The descriptors contain a tag that allows the producer
to identify the completion.
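
Roughly like this (just a sketch; the field names and the 16-byte
layout are illustrative, not from any existing patch):

    #include <stdint.h>

    #define RING_SIZE 256               /* illustrative */

    /* One 16-byte descriptor, stored directly in the ring. */
    struct inline_desc {
            uint64_t addr;              /* guest-physical buffer address */
            uint32_t len;               /* buffer length in bytes */
            uint16_t flags;             /* e.g. NEXT (linear chain), WRITE */
            uint16_t tag;               /* producer cookie to match completions */
    };

    struct inline_ring {
            uint16_t head;              /* producer-owned */
            uint16_t tail;              /* consumer-owned; in practice head and
                                           tail would sit in separate cache lines */
            struct inline_desc desc[RING_SIZE];
    };

The producer fills desc[head], desc[head+1], ... (chaining is just
"the next slot"), publishes by bumping head; the consumer copies the
chain out, later copies completed descriptors back in and bumps tail.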
Indirect only pays off when there are enough descriptors in the
indirect block to fill a couple of cache lines. Otherwise it's an
extra bounce.
We will always bounce here; that's what happens when transferring data.
The question is how many cache lines per descriptor. A pointer
adds 1 bounce, linear descriptors cost 1/4 bounce, chained descriptors
cost a bounce. So best is one ring of linearly chained descriptors.
Indirect works when you have large requests (like block).
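
Spelling out the arithmetic (assuming 64-byte lines and 16-byte
descriptors, so four descriptors per line):

    pointer in the ring          -> one extra line for the pointer fetch,
                                    on top of the descriptor itself
    descriptors inline, linear   -> four share a line, ~1/4 line each
    descriptors chained across
    random slots                 -> ~1 full line each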
> So client writes descriptor page and adds to avail ring, then writes to
> index.
> Server reads index, avail ring, descriptor page (3). Writes used
> entry (1). Updates last_used (1). Client reads used (1), derefs avail (1),
> updates last_used (1), cleans descriptor page (1).
>
> That's 9 cacheline transfers, worst case. Best case of a half-full ring
> in steady state, assuming 128-byte cache lines, the avail ring costs are
> 1/16, the used entry is 1/64. This drops it to 6 and 9/64 transfers.
>
Cache lines are 64 bytes these days.
With a single ring, client writes descriptors (ceil(N/4)), updates head
(1). Server reads head (1), copies out descriptors (ceil(N/4)), issues
requests, copies back completions (ceil(N/4)), updates tail (1).
Client reads back tail and descriptors (1 + ceil(N/4)).
Worst case: 4 + 4 * ceil(N/4). Best case I think this drops by half.
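
Plugging numbers into that formula (four 16-byte descriptors per
64-byte line), worst case:

    N = 1..4:  4 + 4 * 1 = 8 cache line transfers
    N = 5..8:  4 + 4 * 2 = 12 cache line transfers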
> (Note, the current scheme adds 2 more cacheline transfers, for the descriptor
> table, worst case.
2 bounces per descriptor due to random access.
> Assuming indirect, we get 2/8 xfer best case. Either way,
> it's not the main source of cacheline xfers).
>
Indirect adds a double bounce to get to the descriptor table, but any
descriptors there are accessed linearly. It's only good when you have
large chains.
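
To put rough numbers on it (64-byte lines, 16-byte descriptors): a
chain of 8 descriptors scattered across the table costs ~8 lines
fetched directly, while indirect costs 1 line for the ring entry plus
ceil(8 * 16 / 64) = 2 lines for the table, ~3 in total. For a chain of
2 it's ~2 lines either way, so the extra hop buys nothing.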
> Can we do better? The obvious idea is to try to get rid of last_used and
> used, and use the ring itself. We would use an invalid entry to mark the
> head of the ring.
>
Interesting! So a peer will read until it hits a wall. But how to
update the wall atomically?
Maybe we can have a flag in the descriptor to indicate headness or
tailness. Update looks ugly though: write descriptor with head flag,
write next descriptor with head flag, remove flag from previous descriptor.
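
In code it would look something like this (a sketch only; inline_desc
is the layout from above, DESC_F_HEAD is a made-up flag, and
__sync_synchronize() stands in for whatever barrier is really needed):

    #include <stdint.h>

    #define DESC_F_HEAD 0x1             /* marks the last valid entry (the wall) */

    struct inline_desc {
            uint64_t addr;
            uint32_t len;
            uint16_t flags;
            uint16_t tag;
    };

    /* Append a descriptor: 'old' is the slot currently carrying the
     * head flag, 'new' is the slot being filled. */
    static void ring_append(volatile struct inline_desc *desc,
                            unsigned old, unsigned new,
                            struct inline_desc d)
    {
            d.flags |= DESC_F_HEAD;           /* new entry is the new wall */
            desc[new] = d;
            __sync_synchronize();             /* publish it first... */
            desc[old].flags &= ~DESC_F_HEAD;  /* ...then open the old wall */
    }

The window where two entries carry the flag should be safe as long as
the consumer stops at the first flagged entry it sees; the ordering
requirement between the two flag writes is the ugly part.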
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.