* [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
@ 2011-11-29 9:33 Sasha Levin
2011-11-29 12:56 ` Michael S. Tsirkin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-11-29 9:33 UTC (permalink / raw)
To: linux-kernel; +Cc: virtualization, Sasha Levin, kvm, Michael S. Tsirkin
Currently if VIRTIO_RING_F_INDIRECT_DESC is enabled we will use indirect
descriptors even if we have plenty of space in the ring. This means that
we take a performance hit at all times due to the overhead of creating
indirect descriptors.
With this patch, we use indirect descriptors only when the number of
available descriptors drops below a threshold: the larger of 16 or
roughly 12% of the ring size.
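For reference, the switch-over points this gives for a few common ring
sizes (an illustration only; the helper below is not part of the patch):

#include <stdio.h>

/* Standalone illustration of the threshold used here: go indirect
 * when fewer than max(16, num / 8) descriptors are free. */
static unsigned int indirect_threshold(unsigned int num)
{
        unsigned int t = num >> 3;      /* 12.5% of the ring */
        return t > 16 ? t : 16;         /* but never below 16 */
}

int main(void)
{
        unsigned int sizes[] = { 64, 128, 256, 512 };
        unsigned int i;

        for (i = 0; i < 4; i++)
                printf("ring of %u: indirect when num_free < %u\n",
                       sizes[i], indirect_threshold(sizes[i]));
        return 0;
}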
I did a basic performance benchmark on virtio-net with vhost enabled.
Before:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    4563.92
After:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    5353.28
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
drivers/virtio/virtio_ring.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c7a2c20..5e0ce15 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -82,6 +82,7 @@ struct vring_virtqueue
/* Host supports indirect buffers */
bool indirect;
+ unsigned int indirect_threshold;
/* Host publishes avail event idx */
bool event;
@@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
BUG_ON(data == NULL);
/* If the host supports indirect descriptor tables, and we have multiple
- * buffers, then go indirect. FIXME: tune this threshold */
- if (vq->indirect && (out + in) > 1 && vq->num_free) {
+ * buffers, then go indirect. */
+ if (vq->indirect && (out + in) > 1 &&
+ (vq->num_free < vq->indirect_threshold)) {
head = vring_add_indirect(vq, sg, out, in, gfp);
if (likely(head >= 0))
goto add_head;
@@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
#endif
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+ /*
+ * Use indirect descriptors only when we have less than either 12%
+ * or 16 of the descriptors in the ring available.
+ */
+ if (vq->indirect)
+ vq->indirect_threshold = max(16U, num >> 3);
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
/* No callback? Tell other side not to bother us. */
--
1.7.8.rc3
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 9:33 [PATCH] virtio-ring: Use threshold for switching to indirect descriptors Sasha Levin
@ 2011-11-29 12:56 ` Michael S. Tsirkin
2011-11-29 13:34 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-11-29 12:56 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, kvm, virtualization
On Tue, Nov 29, 2011 at 11:33:16AM +0200, Sasha Levin wrote:
> Currently if VIRTIO_RING_F_INDIRECT_DESC is enabled we will use indirect
> descriptors even if we have plenty of space in the ring. This means that
> we take a performance hit at all times due to the overhead of creating
> indirect descriptors.
Is it the overhead of creating them or just allocating the pages?
The logic you propose is basically to add direct descriptors as long as
the ring is mostly empty. So if the problem is in allocations,
one simple optimization for this one workload is to add a small
cache of memory to use for indirect bufs. Of course building
a good API for this is where we got blocked in the past...
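One possible shape for such a cache, purely as a sketch of the idea
rather than existing virtio code (the cache name, the 16-descriptor cap
and the helper names are all assumptions):

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/virtio_ring.h>

/* Serve small indirect tables from a fixed-size kmem_cache and fall
 * back to kmalloc for oversized requests. */
#define INDIRECT_CACHE_DESCS 16

static struct kmem_cache *indirect_cache;

static int indirect_cache_init(void)
{
        indirect_cache = kmem_cache_create("vring_indirect",
                        INDIRECT_CACHE_DESCS * sizeof(struct vring_desc),
                        0, 0, NULL);
        return indirect_cache ? 0 : -ENOMEM;
}

static struct vring_desc *alloc_indirect(unsigned int nr_descs, gfp_t gfp)
{
        /* Common case: avoid the per-call allocator cost discussed here. */
        if (nr_descs <= INDIRECT_CACHE_DESCS)
                return kmem_cache_alloc(indirect_cache, gfp);
        return kmalloc(nr_descs * sizeof(struct vring_desc), gfp);
}

static void free_indirect(struct vring_desc *desc, unsigned int nr_descs)
{
        if (nr_descs <= INDIRECT_CACHE_DESCS)
                kmem_cache_free(indirect_cache, desc);
        else
                kfree(desc);
}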
> With this patch, we will use indirect descriptors only if we have less than
> either 16, or 12% of the total amount of descriptors available.
Note that this conflicts to some extent with patches that change
virtio-net not to drain the vq before add_buf, in that here we are
required to drain the vq to avoid going indirect.
Not necessarily a serious problem, but something to keep in mind:
a memory pool would not have this issue.
>
> I did basic performance benchmark on virtio-net with vhost enabled.
>
> Before:
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 87380 16384 16384 10.00 4563.92
>
> After:
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 87380 16384 16384 10.00 5353.28
Is this with the kvm tool? What kind of benchmark is this?
We need to verify the effect on block too, and do some more
benchmarks. In particular, we are making the ring in effect
smaller; how will this affect small-packet performance with
multiple streams?
A very simple test is to disable indirect buffers altogether;
qemu-kvm has a flag for this. Is this an equivalent test?
If yes, I'll try that.
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
> drivers/virtio/virtio_ring.c | 12 ++++++++++--
> 1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c7a2c20..5e0ce15 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -82,6 +82,7 @@ struct vring_virtqueue
>
> /* Host supports indirect buffers */
> bool indirect;
We can get rid of bool indirect now, just set indirect_threshold to 0,
right?
> + unsigned int indirect_threshold;
Please add a comment. It should be something like
'Min. number of free space in the ring to trigger direct descriptor use'
>
> /* Host publishes avail event idx */
> bool event;
> @@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> BUG_ON(data == NULL);
>
> /* If the host supports indirect descriptor tables, and we have multiple
> - * buffers, then go indirect. FIXME: tune this threshold */
> - if (vq->indirect && (out + in) > 1 && vq->num_free) {
> + * buffers, then go indirect. */
> + if (vq->indirect && (out + in) > 1 &&
> + (vq->num_free < vq->indirect_threshold)) {
If num_free is 0, this will still allocate the indirect buffer,
which is not a good idea.
I think there's a regression here: with a small vq, we could
previously put in an indirect descriptor, but with your patch
add_buf will fail. I think this is a real problem for block,
which was the original reason indirect bufs were introduced.
> head = vring_add_indirect(vq, sg, out, in, gfp);
> if (likely(head >= 0))
> goto add_head;
> @@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> #endif
>
> vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> + /*
> + * Use indirect descriptors only when we have less than either 12%
> + * or 16 of the descriptors in the ring available.
> + */
> + if (vq->indirect)
> + vq->indirect_threshold = max(16U, num >> 3);
Let's add some defines at the top of the file please, maybe even
a module parameter.
> vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>
> /* No callback? Tell other side not to bother us. */
> --
> 1.7.8.rc3
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 12:56 ` Michael S. Tsirkin
@ 2011-11-29 13:34 ` Sasha Levin
2011-11-29 13:54 ` Michael S. Tsirkin
[not found] ` <20111129135406.GB30966@redhat.com>
0 siblings, 2 replies; 41+ messages in thread
From: Sasha Levin @ 2011-11-29 13:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization
On Tue, 2011-11-29 at 14:56 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 29, 2011 at 11:33:16AM +0200, Sasha Levin wrote:
> > Currently if VIRTIO_RING_F_INDIRECT_DESC is enabled we will use indirect
> > descriptors even if we have plenty of space in the ring. This means that
> > we take a performance hit at all times due to the overhead of creating
> > indirect descriptors.
>
> Is it the overhead of creating them or just allocating the pages?
My guess here is that it's the allocation since creating them is very
similar to creating regular descriptors.
> The logic you propose is basically add direct as long as
> the ring is mostly empty. So if the problem is in allocations,
> one simple optimization for this one workload is add a small
> cache of memory to use for indirect bufs. Of course building
> a good API for this is where we got blocked in the past...
I thought the issue of using a single pool was that the sizes of
indirect descriptors are dynamic, so you can't use a single kmemcache
for all of them unless you're ok with having a bunch of wasted bytes.
>
> > With this patch, we will use indirect descriptors only if we have less than
> > either 16, or 12% of the total amount of descriptors available.
>
> One notes that this to some level conflicts with patches that change
> virtio net not to drain the vq before add buf, in that we are
> required here to drain the vq to avoid indirect.
You don't have to avoid indirects at all costs; if the vq is so full that it
has to resort to indirect buffers, we had better let it do that.
>
> Not necessarily a serious problem, but something to keep in mind:
> a memory pool would not have this issue.
>
> >
> > I did basic performance benchmark on virtio-net with vhost enabled.
> >
> > Before:
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 16384 10.00 4563.92
> >
> > After:
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 16384 10.00 5353.28
>
> Is this with the kvm tool? what kind of benchmark is this?
It's using the kvm tool and netperf: a simple TCP_STREAM test with
vhost enabled, using a regular TAP device to connect the guest
and the host.
>
> Need to verify the effect on block too, and do some more
> benchmarks. In particular we are making the ring
> in effect smaller, how will this affect small packet perf
> with multiple streams?
I couldn't get good block benchmarks on my hardware. They were all over
the place even when I was trying to get the baseline. I'm guessing my
disk is about to kick the bucket.
This threshold should be dynamic and based on the amount of available
descriptors over time, so that, for example, if the vring is 90% full over
time the threshold will go up, allowing for more indirect buffers.
Currently it's static, but it's a first step towards making it dynamic :)
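As a rough illustration of that idea, a sketch only (the EWMA weighting,
the fixed-point shift and all names here are assumptions, not part of
the patch):

struct occupancy {
        unsigned int avg_used_x16;      /* EWMA of in-flight descriptors, in 1/16ths */
};

/* Called on every add_buf: track how full the ring runs on average and
 * scale the indirect threshold between num/8 (ring mostly idle) and
 * num (ring full all the time, so always go indirect). */
static unsigned int dynamic_threshold(struct occupancy *o,
                                      unsigned int num, unsigned int num_free)
{
        unsigned int used = num - num_free;
        int delta = (int)(used << 4) - (int)o->avg_used_x16;
        unsigned int avg;

        o->avg_used_x16 += delta / 8;   /* new = 7/8 * old + 1/8 * sample */
        avg = o->avg_used_x16 >> 4;

        /* Interpolate the threshold with average occupancy. */
        return num / 8 + ((num - num / 8) * avg) / num;
}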
I'll do a benchmark with small packets.
> A very simple test is to disable indirect buffers altogether.
> qemu-kvm has a flag for this.
> Is this an equivalent test?
> If yes I'll try that.
Yes, it should be equivalent to qemu without that flag.
>
>
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: virtualization@lists.linux-foundation.org
> > Cc: kvm@vger.kernel.org
> > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > ---
> > drivers/virtio/virtio_ring.c | 12 ++++++++++--
> > 1 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c7a2c20..5e0ce15 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -82,6 +82,7 @@ struct vring_virtqueue
> >
> > /* Host supports indirect buffers */
> > bool indirect;
>
> We can get rid of bool indirect now, just set indirect_threshold to 0,
> right?
Yup.
>
> > + unsigned int indirect_threshold;
>
> Please add a comment. It should be something like
> 'Min. number of free space in the ring to trigger direct descriptor use'
Will do.
>
> >
> > /* Host publishes avail event idx */
> > bool event;
> > @@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> > BUG_ON(data == NULL);
> >
> > /* If the host supports indirect descriptor tables, and we have multiple
> > - * buffers, then go indirect. FIXME: tune this threshold */
> > - if (vq->indirect && (out + in) > 1 && vq->num_free) {
> > + * buffers, then go indirect. */
> > + if (vq->indirect && (out + in) > 1 &&
> > + (vq->num_free < vq->indirect_threshold)) {
>
> If num_free is 0, this will allocate the buffer which is
> not a good idea.
>
> I think there's a regression here: with a small vq, we could
> previously put in an indirect descriptor, with your patch
> add_buf will fail. I think this is a real problem for block
> which was the original reason indirect bufs were introduced.
I defined the threshold so at least 16 descriptors will be used as
indirect buffers, so if you have a small vq there's still a solid minimum
of indirect descriptors it could use.
>
>
> > head = vring_add_indirect(vq, sg, out, in, gfp);
> > if (likely(head >= 0))
> > goto add_head;
> > @@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > #endif
> >
> > vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> > + /*
> > + * Use indirect descriptors only when we have less than either 12%
> > + * or 16 of the descriptors in the ring available.
> > + */
> > + if (vq->indirect)
> > + vq->indirect_threshold = max(16U, num >> 3);
>
> Let's add some defines at top of the file please, maybe even
> a module parameter.
>
> > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> >
> > /* No callback? Tell other side not to bother us. */
> > --
> > 1.7.8.rc3
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 13:34 ` Sasha Levin
@ 2011-11-29 13:54 ` Michael S. Tsirkin
[not found] ` <20111129135406.GB30966@redhat.com>
1 sibling, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-11-29 13:54 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, kvm, virtualization
On Tue, Nov 29, 2011 at 03:34:48PM +0200, Sasha Levin wrote:
> On Tue, 2011-11-29 at 14:56 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 29, 2011 at 11:33:16AM +0200, Sasha Levin wrote:
> > > Currently if VIRTIO_RING_F_INDIRECT_DESC is enabled we will use indirect
> > > descriptors even if we have plenty of space in the ring. This means that
> > > we take a performance hit at all times due to the overhead of creating
> > > indirect descriptors.
> >
> > Is it the overhead of creating them or just allocating the pages?
>
> My guess here is that it's the allocation since creating them is very
> similar to creating regular descriptors.
Well, there is some formatting overhead ...
> > The logic you propose is basically add direct as long as
> > the ring is mostly empty. So if the problem is in allocations,
> > one simple optimization for this one workload is add a small
> > cache of memory to use for indirect bufs. Of course building
> > a good API for this is where we got blocked in the past...
>
> I thought the issue of using a single pool was that the sizes of
> indirect descriptors are dynamic, so you can't use a single kmemcache
> for all of them unless you're ok with having a bunch of wasted bytes.
If the pool size is limited, the waste is limited too, so maybe
we are OK with that...
> >
> > > With this patch, we will use indirect descriptors only if we have less than
> > > either 16, or 12% of the total amount of descriptors available.
> >
> > One notes that this to some level conflicts with patches that change
> > virtio net not to drain the vq before add buf, in that we are
> > required here to drain the vq to avoid indirect.
>
> You don't have to avoid indirects by all means, if the vq is so full it
> has to resort to indirect buffers we better let him do that.
With the limited polling patches, the vq stays full all of
the time; we only poll enough to create space for the new
descriptor.
It's not a must to make them work, as they are not upstream,
but it's worth considering.
> >
> > Not necessarily a serious problem, but something to keep in mind:
> > a memory pool would not have this issue.
> >
> > >
> > > I did basic performance benchmark on virtio-net with vhost enabled.
> > >
> > > Before:
> > > Recv Send Send
> > > Socket Socket Message Elapsed
> > > Size Size Size Time Throughput
> > > bytes bytes bytes secs. 10^6bits/sec
> > >
> > > 87380 16384 16384 10.00 4563.92
> > >
> > > After:
> > > Recv Send Send
> > > Socket Socket Message Elapsed
> > > Size Size Size Time Throughput
> > > bytes bytes bytes secs. 10^6bits/sec
> > >
> > > 87380 16384 16384 10.00 5353.28
> >
> > Is this with the kvm tool? what kind of benchmark is this?
>
> It's using the kvm tool and netperf. It's a simple TCP_STREAM test with
> vhost enabled and using a regular TAP device to connect between guest
> and host.
guest to host?
> >
> > Need to verify the effect on block too, and do some more
> > benchmarks. In particular we are making the ring
> > in effect smaller, how will this affect small packet perf
> > with multiple streams?
>
> I couldn't get good block benchmarks on my hardware. They were all over
> the place even when I was trying to get the baseline. I'm guessing my
> disk is about to kick the bucket.
Try using memory as a backing store.
> This threshold should be dynamic and be based on the amount of avail
> descriptors over time, so for example, if the vring is 90% full over
> time the threshold will go up allowing for more indirect buffers.
> Currently it's static, but it's a first step to making it dynamic :)
>
> I'll do a benchmark with small packets.
>
> > A very simple test is to disable indirect buffers altogether.
> > qemu-kvm has a flag for this.
> > Is this an equivalent test?
> > If yes I'll try that.
>
> Yes, it should be equivalent to qemu without that flag.
>
> >
> >
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: virtualization@lists.linux-foundation.org
> > > Cc: kvm@vger.kernel.org
> > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > > ---
> > > drivers/virtio/virtio_ring.c | 12 ++++++++++--
> > > 1 files changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index c7a2c20..5e0ce15 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -82,6 +82,7 @@ struct vring_virtqueue
> > >
> > > /* Host supports indirect buffers */
> > > bool indirect;
> >
> > We can get rid of bool indirect now, just set indirect_threshold to 0,
> > right?
>
> Yup.
>
> >
> > > + unsigned int indirect_threshold;
> >
> > Please add a comment. It should be something like
> > 'Min. number of free space in the ring to trigger direct descriptor use'
>
> Will do.
>
> >
> > >
> > > /* Host publishes avail event idx */
> > > bool event;
> > > @@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> > > BUG_ON(data == NULL);
> > >
> > > /* If the host supports indirect descriptor tables, and we have multiple
> > > - * buffers, then go indirect. FIXME: tune this threshold */
> > > - if (vq->indirect && (out + in) > 1 && vq->num_free) {
> > > + * buffers, then go indirect. */
> > > + if (vq->indirect && (out + in) > 1 &&
> > > + (vq->num_free < vq->indirect_threshold)) {
> >
> > If num_free is 0, this will allocate the buffer which is
> > not a good idea.
> >
> > I think there's a regression here: with a small vq, we could
> > previously put in an indirect descriptor, with your patch
> > add_buf will fail. I think this is a real problem for block
> > which was the original reason indirect bufs were introduced.
>
> I defined the threshold so at least 16 descriptors will be used as
> indirect buffers, so if you have a small vq theres still a solid minimum
> of indirect descriptors it could use.
Yes but request size might be > 16.
> >
> >
> > > head = vring_add_indirect(vq, sg, out, in, gfp);
> > > if (likely(head >= 0))
> > > goto add_head;
> > > @@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > > #endif
> > >
> > > vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> > > + /*
> > > + * Use indirect descriptors only when we have less than either 12%
> > > + * or 16 of the descriptors in the ring available.
> > > + */
> > > + if (vq->indirect)
> > > + vq->indirect_threshold = max(16U, num >> 3);
> >
> > Let's add some defines at top of the file please, maybe even
> > a module parameter.
> >
> > > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > >
> > > /* No callback? Tell other side not to bother us. */
> > > --
> > > 1.7.8.rc3
>
> --
>
> Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
[not found] ` <20111129135406.GB30966@redhat.com>
@ 2011-11-29 14:21 ` Sasha Levin
2011-11-29 14:54 ` Michael S. Tsirkin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-11-29 14:21 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-kernel, kvm, virtualization
On Tue, 2011-11-29 at 15:54 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 29, 2011 at 03:34:48PM +0200, Sasha Levin wrote:
> > On Tue, 2011-11-29 at 14:56 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Nov 29, 2011 at 11:33:16AM +0200, Sasha Levin wrote:
> > > > Currently if VIRTIO_RING_F_INDIRECT_DESC is enabled we will use indirect
> > > > descriptors even if we have plenty of space in the ring. This means that
> > > > we take a performance hit at all times due to the overhead of creating
> > > > indirect descriptors.
> > >
> > > Is it the overhead of creating them or just allocating the pages?
> >
> > My guess here is that it's the allocation since creating them is very
> > similar to creating regular descriptors.
>
> Well, there is some formatting overhead ...
Very little. The formatting code is very similar to regular buffers.
>
> > > The logic you propose is basically add direct as long as
> > > the ring is mostly empty. So if the problem is in allocations,
> > > one simple optimization for this one workload is add a small
> > > cache of memory to use for indirect bufs. Of course building
> > > a good API for this is where we got blocked in the past...
> >
> > I thought the issue of using a single pool was that the sizes of
> > indirect descriptors are dynamic, so you can't use a single kmemcache
> > for all of them unless you're ok with having a bunch of wasted bytes.
>
> If the pool size is limited, the waste is limited too, so maybe
> we are OK with that...
What would you say are the best numbers for indirect descriptor sizes
and the amount of those in a kmemcache?
> > >
> > > > With this patch, we will use indirect descriptors only if we have less than
> > > > either 16, or 12% of the total amount of descriptors available.
> > >
> > > One notes that this to some level conflicts with patches that change
> > > virtio net not to drain the vq before add buf, in that we are
> > > required here to drain the vq to avoid indirect.
> >
> > You don't have to avoid indirects by all means, if the vq is so full it
> > has to resort to indirect buffers we better let him do that.
>
> With the limited polling patches, the vq stays full all of
> the time, we only poll enough to create space for the new
> descriptor.
> It's not a must to make them work as they are not upstream,
> but worth considering.
>
> > >
> > > Not necessarily a serious problem, but something to keep in mind:
> > > a memory pool would not have this issue.
> > >
> > > >
> > > > I did basic performance benchmark on virtio-net with vhost enabled.
> > > >
> > > > Before:
> > > > Recv Send Send
> > > > Socket Socket Message Elapsed
> > > > Size Size Size Time Throughput
> > > > bytes bytes bytes secs. 10^6bits/sec
> > > >
> > > > 87380 16384 16384 10.00 4563.92
> > > >
> > > > After:
> > > > Recv Send Send
> > > > Socket Socket Message Elapsed
> > > > Size Size Size Time Throughput
> > > > bytes bytes bytes secs. 10^6bits/sec
> > > >
> > > > 87380 16384 16384 10.00 5353.28
> > >
> > > Is this with the kvm tool? what kind of benchmark is this?
> >
> > It's using the kvm tool and netperf. It's a simple TCP_STREAM test with
> > vhost enabled and using a regular TAP device to connect between guest
> > and host.
>
> guest to host?
guest is running as server.
>
> > >
> > > Need to verify the effect on block too, and do some more
> > > benchmarks. In particular we are making the ring
> > > in effect smaller, how will this affect small packet perf
> > > with multiple streams?
> >
> > I couldn't get good block benchmarks on my hardware. They were all over
> > the place even when I was trying to get the baseline. I'm guessing my
> > disk is about to kick the bucket.
>
> Try using memory as a backing store.
Here are the results from fio doing random reads:
With indirect buffers:
Run status group 0 (all jobs):
READ: io=2419.7MB, aggrb=126001KB/s, minb=12887KB/s, maxb=13684KB/s, mint=18461msec, maxt=19664msec
Disk stats (read/write):
vda: ios=612107/0, merge=0/0, ticks=37559/0, in_queue=32723, util=76.70%
Indirect buffers disabled in the host:
Run status group 0 (all jobs):
READ: io=2419.7MB, aggrb=127106KB/s, minb=12811KB/s, maxb=14557KB/s, mint=17486msec, maxt=19493msec
Disk stats (read/write):
vda: ios=617315/0, merge=1/0, ticks=166751/0, in_queue=162807, util=88.19%
Which is actually strange; weren't indirect buffers introduced to make
the performance *better*? From what I see, it's pretty much the
same or worse for virtio-blk.
Here's my fio test file:
[random-read]
rw=randread
size=256m
filename=/dev/vda
ioengine=libaio
iodepth=8
direct=1
invalidate=1
numjobs=10
>
> > This threshold should be dynamic and be based on the amount of avail
> > descriptors over time, so for example, if the vring is 90% full over
> > time the threshold will go up allowing for more indirect buffers.
> > Currently it's static, but it's a first step to making it dynamic :)
> >
> > I'll do a benchmark with small packets.
> >
> > > A very simple test is to disable indirect buffers altogether.
> > > qemu-kvm has a flag for this.
> > > Is this an equivalent test?
> > > If yes I'll try that.
> >
> > Yes, it should be equivalent to qemu without that flag.
> >
> > >
> > >
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > Cc: virtualization@lists.linux-foundation.org
> > > > Cc: kvm@vger.kernel.org
> > > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > > > ---
> > > > drivers/virtio/virtio_ring.c | 12 ++++++++++--
> > > > 1 files changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index c7a2c20..5e0ce15 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -82,6 +82,7 @@ struct vring_virtqueue
> > > >
> > > > /* Host supports indirect buffers */
> > > > bool indirect;
> > >
> > > We can get rid of bool indirect now, just set indirect_threshold to 0,
> > > right?
> >
> > Yup.
> >
> > >
> > > > + unsigned int indirect_threshold;
> > >
> > > Please add a comment. It should be something like
> > > 'Min. number of free space in the ring to trigger direct descriptor use'
> >
> > Will do.
> >
> > >
> > > >
> > > > /* Host publishes avail event idx */
> > > > bool event;
> > > > @@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> > > > BUG_ON(data == NULL);
> > > >
> > > > /* If the host supports indirect descriptor tables, and we have multiple
> > > > - * buffers, then go indirect. FIXME: tune this threshold */
> > > > - if (vq->indirect && (out + in) > 1 && vq->num_free) {
> > > > + * buffers, then go indirect. */
> > > > + if (vq->indirect && (out + in) > 1 &&
> > > > + (vq->num_free < vq->indirect_threshold)) {
> > >
> > > If num_free is 0, this will allocate the buffer which is
> > > not a good idea.
> > >
> > > I think there's a regression here: with a small vq, we could
> > > previously put in an indirect descriptor, with your patch
> > > add_buf will fail. I think this is a real problem for block
> > > which was the original reason indirect bufs were introduced.
> >
> > I defined the threshold so at least 16 descriptors will be used as
> > indirect buffers, so if you have a small vq theres still a solid minimum
> > of indirect descriptors it could use.
>
> Yes but request size might be > 16.
>
> > >
> > >
> > > > head = vring_add_indirect(vq, sg, out, in, gfp);
> > > > if (likely(head >= 0))
> > > > goto add_head;
> > > > @@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > > > #endif
> > > >
> > > > vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> > > > + /*
> > > > + * Use indirect descriptors only when we have less than either 12%
> > > > + * or 16 of the descriptors in the ring available.
> > > > + */
> > > > + if (vq->indirect)
> > > > + vq->indirect_threshold = max(16U, num >> 3);
> > >
> > > Let's add some defines at top of the file please, maybe even
> > > a module parameter.
> > >
> > > > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > > >
> > > > /* No callback? Tell other side not to bother us. */
> > > > --
> > > > 1.7.8.rc3
> >
> > --
> >
> > Sasha.
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 14:21 ` Sasha Levin
@ 2011-11-29 14:54 ` Michael S. Tsirkin
2011-11-29 14:58 ` Avi Kivity
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-11-29 14:54 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, linux-kernel, kvm, virtualization
On Tue, Nov 29, 2011 at 04:21:04PM +0200, Sasha Levin wrote:
> > > > Need to verify the effect on block too, and do some more
> > > > benchmarks. In particular we are making the ring
> > > > in effect smaller, how will this affect small packet perf
> > > > with multiple streams?
> > >
> > > I couldn't get good block benchmarks on my hardware. They were all over
> > > the place even when I was trying to get the baseline. I'm guessing my
> > > disk is about to kick the bucket.
> >
> > Try using memory as a backing store.
>
> Here are the results from fio doing random reads:
>
> With indirect buffers:
> Run status group 0 (all jobs):
> READ: io=2419.7MB, aggrb=126001KB/s, minb=12887KB/s, maxb=13684KB/s, mint=18461msec, maxt=19664msec
>
> Disk stats (read/write):
> vda: ios=612107/0, merge=0/0, ticks=37559/0, in_queue=32723, util=76.70%
>
> Indirect buffers disabled in the host:
> Run status group 0 (all jobs):
> READ: io=2419.7MB, aggrb=127106KB/s, minb=12811KB/s, maxb=14557KB/s, mint=17486msec, maxt=19493msec
>
> Disk stats (read/write):
> vda: ios=617315/0, merge=1/0, ticks=166751/0, in_queue=162807, util=88.19%
I don't know much about this; the only difference I see is that
in_queue is way higher.
>
> Which is actually strange, weren't indirect buffers introduced to make
> the performance *better*? From what I see it's pretty much the
> same/worse for virtio-blk.
I know they were introduced to allow adding very large bufs;
see 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd.
Mark, you wrote the patch; could you tell us which workloads
benefit the most from indirect bufs?
> Here's my fio test file:
> [random-read]
> rw=randread
> size=256m
> filename=/dev/vda
> ioengine=libaio
> iodepth=8
> direct=1
> invalidate=1
> numjobs=10
> >
> > > This threshold should be dynamic and be based on the amount of avail
> > > descriptors over time, so for example, if the vring is 90% full over
> > > time the threshold will go up allowing for more indirect buffers.
> > > Currently it's static, but it's a first step to making it dynamic :)
> > >
> > > I'll do a benchmark with small packets.
> > >
> > > > A very simple test is to disable indirect buffers altogether.
> > > > qemu-kvm has a flag for this.
> > > > Is this an equivalent test?
> > > > If yes I'll try that.
> > >
> > > Yes, it should be equivalent to qemu without that flag.
> > >
> > > >
> > > >
> > > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > Cc: virtualization@lists.linux-foundation.org
> > > > > Cc: kvm@vger.kernel.org
> > > > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > > > > ---
> > > > > drivers/virtio/virtio_ring.c | 12 ++++++++++--
> > > > > 1 files changed, 10 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index c7a2c20..5e0ce15 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -82,6 +82,7 @@ struct vring_virtqueue
> > > > >
> > > > > /* Host supports indirect buffers */
> > > > > bool indirect;
> > > >
> > > > We can get rid of bool indirect now, just set indirect_threshold to 0,
> > > > right?
> > >
> > > Yup.
> > >
> > > >
> > > > > + unsigned int indirect_threshold;
> > > >
> > > > Please add a comment. It should be something like
> > > > 'Min. number of free space in the ring to trigger direct descriptor use'
> > >
> > > Will do.
> > >
> > > >
> > > > >
> > > > > /* Host publishes avail event idx */
> > > > > bool event;
> > > > > @@ -176,8 +177,9 @@ int virtqueue_add_buf_gfp(struct virtqueue *_vq,
> > > > > BUG_ON(data == NULL);
> > > > >
> > > > > /* If the host supports indirect descriptor tables, and we have multiple
> > > > > - * buffers, then go indirect. FIXME: tune this threshold */
> > > > > - if (vq->indirect && (out + in) > 1 && vq->num_free) {
> > > > > + * buffers, then go indirect. */
> > > > > + if (vq->indirect && (out + in) > 1 &&
> > > > > + (vq->num_free < vq->indirect_threshold)) {
> > > >
> > > > If num_free is 0, this will allocate the buffer which is
> > > > not a good idea.
> > > >
> > > > I think there's a regression here: with a small vq, we could
> > > > previously put in an indirect descriptor, with your patch
> > > > add_buf will fail. I think this is a real problem for block
> > > > which was the original reason indirect bufs were introduced.
> > >
> > > I defined the threshold so at least 16 descriptors will be used as
> > > indirect buffers, so if you have a small vq theres still a solid minimum
> > > of indirect descriptors it could use.
> >
> > Yes but request size might be > 16.
> >
> > > >
> > > >
> > > > > head = vring_add_indirect(vq, sg, out, in, gfp);
> > > > > if (likely(head >= 0))
> > > > > goto add_head;
> > > > > @@ -485,6 +487,12 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > > > > #endif
> > > > >
> > > > > vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
> > > > > + /*
> > > > > + * Use indirect descriptors only when we have less than either 12%
> > > > > + * or 16 of the descriptors in the ring available.
> > > > > + */
> > > > > + if (vq->indirect)
> > > > > + vq->indirect_threshold = max(16U, num >> 3);
> > > >
> > > > Let's add some defines at top of the file please, maybe even
> > > > a module parameter.
> > > >
> > > > > vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > > > >
> > > > > /* No callback? Tell other side not to bother us. */
> > > > > --
> > > > > 1.7.8.rc3
> > >
> > > --
> > >
> > > Sasha.
>
> --
>
> Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 14:54 ` Michael S. Tsirkin
@ 2011-11-29 14:58 ` Avi Kivity
2011-11-30 16:11 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2011-11-29 14:58 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> >
> > Which is actually strange, weren't indirect buffers introduced to make
> > the performance *better*? From what I see it's pretty much the
> > same/worse for virtio-blk.
>
> I know they were introduced to allow adding very large bufs.
> See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> Mark, you wrote the patch, could you tell us which workloads
> benefit the most from indirect bufs?
>
Indirects are really for block devices with many spindles, since there
the limiting factor is the number of requests in flight. Network
interfaces are limited by bandwidth; it's better to increase the ring
size and use direct buffers there (so the ring size more or less
corresponds to the buffer size).
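A back-of-the-envelope illustration of that point (the ring size and
segment count below are assumed, not measured):

#include <stdio.h>

/* With direct descriptors a request occupies one ring slot per segment;
 * with indirect it occupies a single slot, so a many-segment block
 * workload fits far more requests in flight into the same ring. */
int main(void)
{
        unsigned int ring = 128;        /* assumed virtio-blk ring size */
        unsigned int segs = 10 + 2;     /* data segments + header + status */

        printf("direct:   at most %u requests in flight\n", ring / segs);
        printf("indirect: at most %u requests in flight\n", ring);
        return 0;
}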
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-29 14:58 ` Avi Kivity
@ 2011-11-30 16:11 ` Sasha Levin
2011-11-30 16:17 ` Sasha Levin
2011-12-01 2:42 ` Rusty Russell
0 siblings, 2 replies; 41+ messages in thread
From: Sasha Levin @ 2011-11-30 16:11 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > >
> > > Which is actually strange, weren't indirect buffers introduced to make
> > > the performance *better*? From what I see it's pretty much the
> > > same/worse for virtio-blk.
> >
> > I know they were introduced to allow adding very large bufs.
> > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > Mark, you wrote the patch, could you tell us which workloads
> > benefit the most from indirect bufs?
> >
>
> Indirects are really for block devices with many spindles, since there
> the limiting factor is the number of requests in flight. Network
> interfaces are limited by bandwidth, it's better to increase the ring
> size and use direct buffers there (so the ring size more or less
> corresponds to the buffer size).
>
I did some testing of indirect descriptors under different workloads.
All tests were on a 2 vcpu guest with vhost on. Simple TCP_STREAM using
netperf.
Indirect desc off:
guest -> host, 1 stream: ~4600mb/s
host -> guest, 1 stream: ~5900mb/s
guest -> host, 8 streams: ~620mb/s (on average)
host -> guest, 8 stream: ~600mb/s (on average)
Indirect desc on:
guest -> host, 1 stream: ~4900mb/s
host -> guest, 1 stream: ~5400mb/s
guest -> host, 8 streams: ~620mb/s (on average)
host -> guest, 8 stream: ~600mb/s (on average)
Which means that for one stream, guest to host gets faster while host to
guest gets slower when indirect descriptors are on.
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-30 16:11 ` Sasha Levin
@ 2011-11-30 16:17 ` Sasha Levin
2011-12-01 2:42 ` Rusty Russell
1 sibling, 0 replies; 41+ messages in thread
From: Sasha Levin @ 2011-11-30 16:17 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
Sorry, I forgot to copy-paste one of the results :)
On Wed, 2011-11-30 at 18:11 +0200, Sasha Levin wrote:
> I did some testing of indirect descriptors under different workloads.
>
> All tests were on a 2 vcpu guest with vhost on. Simple TCP_STREAM using
> netperf.
>
> Indirect desc off:
> guest -> host, 1 stream: ~4600mb/s
> host -> guest, 1 stream: ~5900mb/s
> guest -> host, 8 streams: ~620mb/s (on average)
> host -> guest, 8 stream: ~600mb/s (on average)
>
> Indirect desc on:
> guest -> host, 1 stream: ~4900mb/s
> host -> guest, 1 stream: ~5400mb/s
> guest -> host, 8 streams: ~620mb/s (on average)
> host -> guest, 8 stream: ~600mb/s (on average)
Should be:
host -> guest, 8 stream: ~515mb/s (on average)
>
> Which means that for one stream, guest to host gets faster while host to
> guest gets slower when indirect descriptors are on.
>
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-11-30 16:11 ` Sasha Levin
2011-11-30 16:17 ` Sasha Levin
@ 2011-12-01 2:42 ` Rusty Russell
2011-12-01 7:58 ` Michael S. Tsirkin
1 sibling, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2011-12-01 2:42 UTC (permalink / raw)
To: Sasha Levin, Avi Kivity
Cc: markmc, virtualization, linux-kernel, kvm, Michael S. Tsirkin
On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > >
> > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > the performance *better*? From what I see it's pretty much the
> > > > same/worse for virtio-blk.
> > >
> > > I know they were introduced to allow adding very large bufs.
> > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > Mark, you wrote the patch, could you tell us which workloads
> > > benefit the most from indirect bufs?
> > >
> >
> > Indirects are really for block devices with many spindles, since there
> > the limiting factor is the number of requests in flight. Network
> > interfaces are limited by bandwidth, it's better to increase the ring
> > size and use direct buffers there (so the ring size more or less
> > corresponds to the buffer size).
> >
>
> I did some testing of indirect descriptors under different workloads.
MST and I discussed getting clever with dynamic limits ages ago, but it
was down low on the TODO list. Thanks for diving into this...
AFAICT, if the ring never fills, direct is optimal. When the ring
fills, indirect is optimal (we're better to queue now than later).
Why not something simple, like a threshold which drops every time we
fill the ring?
struct vring_virtqueue
{
        ...
        int indirect_thresh;
        ...
}

virtqueue_add_buf_gfp()
{
        ...

        if (vq->indirect &&
            (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
                return indirect()
        ...

        if (vq->num_free < out + in) {
                if (vq->indirect && vq->indirect_thresh > 0)
                        vq->indirect_thresh--;

        ...
        }

Too dumb?
Cheers,
Rusty.
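Filled in slightly for illustration (the struct, the helper name and the
return convention are placeholders rather than the real virtqueue code),
the sketch amounts to:

/* Placeholder state mirroring the fields used in the sketch above. */
struct vq_state {
        unsigned int num;               /* ring size (vq->vring.num) */
        unsigned int num_free;          /* currently free descriptors */
        int indirect;                   /* host supports indirect descriptors */
        int indirect_thresh;            /* current switch-over point */
};

/* Returns nonzero when this add_buf should use an indirect table.
 * Implements the "drop the threshold whenever the ring fills" rule. */
static int want_indirect(struct vq_state *vq, unsigned int out, unsigned int in)
{
        unsigned int in_use = (vq->num - vq->num_free) + out + in;

        if (vq->indirect && in_use > (unsigned int)vq->indirect_thresh)
                return 1;

        /* A direct add is about to fail for lack of slots: remember that
         * by lowering the threshold, so we go indirect earlier next time. */
        if (vq->num_free < out + in &&
            vq->indirect && vq->indirect_thresh > 0)
                vq->indirect_thresh--;

        return 0;
}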
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-01 2:42 ` Rusty Russell
@ 2011-12-01 7:58 ` Michael S. Tsirkin
2011-12-01 8:09 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-01 7:58 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin,
Avi Kivity
On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > >
> > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > the performance *better*? From what I see it's pretty much the
> > > > > same/worse for virtio-blk.
> > > >
> > > > I know they were introduced to allow adding very large bufs.
> > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > Mark, you wrote the patch, could you tell us which workloads
> > > > benefit the most from indirect bufs?
> > > >
> > >
> > > Indirects are really for block devices with many spindles, since there
> > > the limiting factor is the number of requests in flight. Network
> > > interfaces are limited by bandwidth, it's better to increase the ring
> > > size and use direct buffers there (so the ring size more or less
> > > corresponds to the buffer size).
> > >
> >
> > I did some testing of indirect descriptors under different workloads.
>
> MST and I discussed getting clever with dynamic limits ages ago, but it
> was down low on the TODO list. Thanks for diving into this...
>
> AFAICT, if the ring never fills, direct is optimal. When the ring
> fills, indirect is optimal (we're better to queue now than later).
>
> Why not something simple, like a threshold which drops every time we
> fill the ring?
>
> struct vring_virtqueue
> {
> ...
> int indirect_thresh;
> ...
> }
>
> virtqueue_add_buf_gfp()
> {
> ...
>
> if (vq->indirect &&
> (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> return indirect()
> ...
>
> if (vq->num_free < out + in) {
> if (vq->indirect && vq->indirect_thresh > 0)
> vq->indirect_thresh--;
>
> ...
> }
>
> Too dumb?
>
> Cheers,
> Rusty.
We'll presumably need some logic to increment it back,
to account for random workload changes.
Something like slow start?
--
MST
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-01 7:58 ` Michael S. Tsirkin
@ 2011-12-01 8:09 ` Sasha Levin
2011-12-01 10:26 ` Michael S. Tsirkin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-01 8:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > >
> > > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > same/worse for virtio-blk.
> > > > >
> > > > > I know they were introduced to allow adding very large bufs.
> > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > benefit the most from indirect bufs?
> > > > >
> > > >
> > > > Indirects are really for block devices with many spindles, since there
> > > > the limiting factor is the number of requests in flight. Network
> > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > size and use direct buffers there (so the ring size more or less
> > > > corresponds to the buffer size).
> > > >
> > >
> > > I did some testing of indirect descriptors under different workloads.
> >
> > MST and I discussed getting clever with dynamic limits ages ago, but it
> > was down low on the TODO list. Thanks for diving into this...
> >
> > AFAICT, if the ring never fills, direct is optimal. When the ring
> > fills, indirect is optimal (we're better to queue now than later).
> >
> > Why not something simple, like a threshold which drops every time we
> > fill the ring?
> >
> > struct vring_virtqueue
> > {
> > ...
> > int indirect_thresh;
> > ...
> > }
> >
> > virtqueue_add_buf_gfp()
> > {
> > ...
> >
> > if (vq->indirect &&
> > (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> > return indirect()
> > ...
> >
> > if (vq->num_free < out + in) {
> > if (vq->indirect && vq->indirect_thresh > 0)
> > vq->indirect_thresh--;
> >
> > ...
> > }
> >
> > Too dumb?
> >
> > Cheers,
> > Rusty.
>
> We'll presumably need some logic to increment is back,
> to account for random workload changes.
> Something like slow start?
We can increment it each time the queue was less than 10% full; it
should act like slow start, no?
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-01 8:09 ` Sasha Levin
@ 2011-12-01 10:26 ` Michael S. Tsirkin
2011-12-02 0:46 ` Rusty Russell
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-01 10:26 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > Which is actually strange, weren't indirect buffers introduced to make
> > > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > > same/worse for virtio-blk.
> > > > > >
> > > > > > I know they were introduced to allow adding very large bufs.
> > > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > > benefit the most from indirect bufs?
> > > > > >
> > > > >
> > > > > Indirects are really for block devices with many spindles, since there
> > > > > the limiting factor is the number of requests in flight. Network
> > > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > > size and use direct buffers there (so the ring size more or less
> > > > > corresponds to the buffer size).
> > > > >
> > > >
> > > > I did some testing of indirect descriptors under different workloads.
> > >
> > > MST and I discussed getting clever with dynamic limits ages ago, but it
> > > was down low on the TODO list. Thanks for diving into this...
> > >
> > > AFAICT, if the ring never fills, direct is optimal. When the ring
> > > fills, indirect is optimal (we're better to queue now than later).
> > >
> > > Why not something simple, like a threshold which drops every time we
> > > fill the ring?
> > >
> > > struct vring_virtqueue
> > > {
> > > ...
> > > int indirect_thresh;
> > > ...
> > > }
> > >
> > > virtqueue_add_buf_gfp()
> > > {
> > > ...
> > >
> > > if (vq->indirect &&
> > > (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> > > return indirect()
> > > ...
> > >
> > > if (vq->num_free < out + in) {
> > > if (vq->indirect && vq->indirect_thresh > 0)
> > > vq->indirect_thresh--;
> > >
> > > ...
> > > }
> > >
> > > Too dumb?
> > >
> > > Cheers,
> > > Rusty.
> >
> > We'll presumably need some logic to increment is back,
> > to account for random workload changes.
> > Something like slow start?
>
> We can increment it each time the queue was less than 10% full, it
> should act like slow start, no?
No, we really shouldn't get an empty ring as long as things behave
well. What I meant is something like:
#define VIRTIO_DECREMENT 2
#define VIRTIO_INCREMENT 1

        if (vq->num_free < out + in) {
                if (vq->indirect && vq->indirect_thresh > VIRTIO_DECREMENT)
                        vq->indirect_thresh /= VIRTIO_DECREMENT;
        } else {
                if (vq->indirect_thresh < vq->num)
                        vq->indirect_thresh += VIRTIO_INCREMENT;
        }
So we try to avoid indirect, but the moment there's no space we decrease
the threshold drastically. If you make the increment and decrement module
parameters, it's easy to try different values.
> --
>
> Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-01 10:26 ` Michael S. Tsirkin
@ 2011-12-02 0:46 ` Rusty Russell
2011-12-03 11:50 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2011-12-02 0:46 UTC (permalink / raw)
To: Michael S. Tsirkin, Sasha Levin
Cc: markmc, virtualization, Avi Kivity, kvm, linux-kernel
On Thu, 1 Dec 2011 12:26:42 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> > On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > > We'll presumably need some logic to increment is back,
> > > to account for random workload changes.
> > > Something like slow start?
> >
> > We can increment it each time the queue was less than 10% full, it
> > should act like slow start, no?
>
> No, we really shouldn't get an empty ring as long as things behave
> well. What I meant is something like:
I was thinking of the network output case, but you're right. We need to
distinguish between usually full (eg. virtio-net input) and usually
empty (eg. virtio-net output).
The signal for "we need to pack more into the ring" is different. We could
use some hacky heuristic like "out == 0", but I'd rather make it explicit
when we set up the virtqueue.
Our other alternative, moving the logic to the driver, is worse.
As to fading the effect over time, that's harder. We have to deplete
the ring quite a few times before it turns into always-indirect. We
could back off every time the ring is totally idle, but that may hurt
bursty traffic. Let's try simple first?
Thanks,
Rusty.
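One way such a setup-time hint could look, as a sketch under the
assumption of a per-virtqueue "usually_full" flag (nothing like this
exists in the virtio API):

#include <stdbool.h>

/* Hypothetical decision helper: a queue the driver keeps full (e.g. the
 * virtio-net receive ring) wants indirect aggressively so more buffers
 * fit, while a usually-empty queue (e.g. transmit) stays direct until
 * the ring starts filling up and the threshold logic kicks in. */
static bool use_indirect(bool usually_full, unsigned int num_free,
                         unsigned int thresh, unsigned int out_plus_in)
{
        if (out_plus_in <= 1)
                return false;           /* single descriptor: always direct */
        if (usually_full)
                return true;            /* pack as much into the ring as we can */
        return num_free < thresh;       /* otherwise fall back to the threshold */
}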
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-02 0:46 ` Rusty Russell
@ 2011-12-03 11:50 ` Sasha Levin
2011-12-04 11:06 ` Michael S. Tsirkin
2011-12-04 11:52 ` Avi Kivity
0 siblings, 2 replies; 41+ messages in thread
From: Sasha Levin @ 2011-12-03 11:50 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Avi Kivity
On Fri, 2011-12-02 at 11:16 +1030, Rusty Russell wrote:
> On Thu, 1 Dec 2011 12:26:42 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> > > On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > > > We'll presumably need some logic to increment is back,
> > > > to account for random workload changes.
> > > > Something like slow start?
> > >
> > > We can increment it each time the queue was less than 10% full, it
> > > should act like slow start, no?
> >
> > No, we really shouldn't get an empty ring as long as things behave
> > well. What I meant is something like:
>
> I was thinking of the network output case, but you're right. We need to
> distinguish between usually full (eg. virtio-net input) and usually
> empty (eg. virtio-net output).
>
> The signal for "we to pack more into the ring" is different. We could
> use some hacky heuristic like "out == 0" but I'd rather make it explicit
> when we set up the virtqueue.
>
> Our other alternative, moving the logic to the driver, is worse.
>
> As to fading the effect over time, that's harder. We have to deplete
> the ring quite a few times before it turns into always-indirect. We
> could back off every time the ring is totally idle, but that may hurt
> bursty traffic. Let's try simple first?
I took a different approach and tried putting the indirect
descriptors in a kmem_cache, as Michael suggested. The benchmarks showed
that this way virtio-net actually worked faster with indirect on, even in
a single stream.
Maybe we can do that instead of playing with the threshold for now.
The question here is how much wasted space we can afford: since indirect
descriptor tables would all have to be the same size, we'd have a bunch of
wasted bytes in the cache. Of course we can make that configurable, but how
much is OK by default?
--
Sasha.
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-03 11:50 ` Sasha Levin
@ 2011-12-04 11:06 ` Michael S. Tsirkin
2011-12-04 15:15 ` Michael S. Tsirkin
2011-12-04 11:52 ` Avi Kivity
1 sibling, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 11:06 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Sat, Dec 03, 2011 at 01:50:28PM +0200, Sasha Levin wrote:
> On Fri, 2011-12-02 at 11:16 +1030, Rusty Russell wrote:
> > On Thu, 1 Dec 2011 12:26:42 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> > > > On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > > > > We'll presumably need some logic to increment is back,
> > > > > to account for random workload changes.
> > > > > Something like slow start?
> > > >
> > > > We can increment it each time the queue was less than 10% full, it
> > > > should act like slow start, no?
> > >
> > > No, we really shouldn't get an empty ring as long as things behave
> > > well. What I meant is something like:
> >
> > I was thinking of the network output case, but you're right. We need to
> > distinguish between usually full (eg. virtio-net input) and usually
> > empty (eg. virtio-net output).
> >
> > The signal for "we to pack more into the ring" is different. We could
> > use some hacky heuristic like "out == 0" but I'd rather make it explicit
> > when we set up the virtqueue.
> >
> > Our other alternative, moving the logic to the driver, is worse.
> >
> > As to fading the effect over time, that's harder. We have to deplete
> > the ring quite a few times before it turns into always-indirect. We
> > could back off every time the ring is totally idle, but that may hurt
> > bursty traffic. Let's try simple first?
>
> I tried to take a different approach, and tried putting the indirect
> descriptors in a kmem_cache as Michael suggested. The benchmarks showed
> that this way virtio-net actually worked faster with indirect on even in
> a single stream.
>
> Maybe we can do that instead of playing with threshold for now.
>
> The question here is how much wasted space we can afford: since the cached
> indirect blocks would all have to be the same size, we'd have a bunch of
> space wasted in the cache. Of course we can make that configurable, but how
> much is ok by default?
I think it's a good idea to make that per-device.
For network at least, each skb already has overhead of
around 1/2 K, so using up to 1/2K more seems acceptable.
But even if we went up to MAX_SKB_FRAGS+2, it would be
only 1K per ring entry, so for a ring of 256 entries, we end up with
256K max waste. That's not that terrible.
But I'd say let's do some benchmarking to figure out
the point where the gains are becoming very small.
> --
>
> Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-03 11:50 ` Sasha Levin
2011-12-04 11:06 ` Michael S. Tsirkin
@ 2011-12-04 11:52 ` Avi Kivity
2011-12-04 12:01 ` Michael S. Tsirkin
2011-12-04 12:13 ` Sasha Levin
1 sibling, 2 replies; 41+ messages in thread
From: Avi Kivity @ 2011-12-04 11:52 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On 12/03/2011 01:50 PM, Sasha Levin wrote:
> [...]
>
> I tried to take a different approach, and tried putting the indirect
> descriptors in a kmem_cache as Michael suggested. The benchmarks showed
> that this way virtio-net actually worked faster with indirect on even in
> a single stream.
How much better?
I think that if indirects benefit networking, then we're doing something
wrong. What's going on? Does the ring get filled too early? If so we
should expand it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 11:52 ` Avi Kivity
@ 2011-12-04 12:01 ` Michael S. Tsirkin
2011-12-04 12:06 ` Avi Kivity
2011-12-04 12:13 ` Sasha Levin
1 sibling, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 12:01 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On Sun, Dec 04, 2011 at 01:52:16PM +0200, Avi Kivity wrote:
> On 12/03/2011 01:50 PM, Sasha Levin wrote:
> > [...]
> >
> > I tried to take a different approach, and tried putting the indirect
> > descriptors in a kmem_cache as Michael suggested. The benchmarks showed
> > that this way virtio-net actually worked faster with indirect on even in
> > a single stream.
>
> How much better?
>
> I think that if indirects benefit networking, then we're doing something
> wrong. What's going on? Does the ring get filled too early? If so we
> should expand it.
The ring is physically contiguous.
With 256 entries and 64 bytes each, that's already 16K.
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 12:01 ` Michael S. Tsirkin
@ 2011-12-04 12:06 ` Avi Kivity
2011-12-04 15:11 ` Michael S. Tsirkin
0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2011-12-04 12:06 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On 12/04/2011 02:01 PM, Michael S. Tsirkin wrote:
> >
> > How much better?
> >
> > I think that if indirects benefit networking, then we're doing something
> > wrong. What's going on? Does the ring get filled too early? If so we
> > should expand it.
>
> > The ring is physically contiguous.
> With 256 entries and 64 bytes each, that's already 16K.
A descriptor is just 16 bytes. There's also the used ring, but that's a
mistake if you have out of order completion. We should have used copying.
16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
just 1.5 ms. In any case I think it's sufficient.
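Back-of-the-envelope, in case anyone wants to redo the numbers (a rough
sketch only; 10GbE rounded down to 1e9 bytes/s):

#include <stdio.h>

int main(void)
{
	const double entries = 16384.0 / 16;	/* 16kB of 16-byte descriptors */
	const double link_Bps = 1e9;		/* 10GbE, rounded down */
	const double bufs[] = { 4096, 1500 };	/* buffer sizes in bytes */
	int i;

	for (i = 0; i < 2; i++) {
		double bytes = entries * bufs[i];
		printf("%4.0f-byte buffers: ~%.1f MB queued, ~%.1f ms to drain\n",
		       bufs[i], bytes / 1e6, bytes / link_Bps * 1e3);
	}
	return 0;
}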
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 11:52 ` Avi Kivity
2011-12-04 12:01 ` Michael S. Tsirkin
@ 2011-12-04 12:13 ` Sasha Levin
2011-12-04 16:22 ` Michael S. Tsirkin
1 sibling, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-04 12:13 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On Sun, 2011-12-04 at 13:52 +0200, Avi Kivity wrote:
> On 12/03/2011 01:50 PM, Sasha Levin wrote:
> > [...]
> >
> > I tried to take a different approach, and tried putting the indirect
> > descriptors in a kmem_cache as Michael suggested. The benchmarks showed
> > that this way virtio-net actually worked faster with indirect on even in
> > a single stream.
>
> How much better?
host->guest was the same with both indirect on and off, and guest->host went
up by 5% with indirect on.
This was just a simple 1 TCP stream test.
>
> I think that if indirects benefit networking, then we're doing something
> wrong. What's going on? Does the ring get filled too early? If so we
> should expand it.
>
--
Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 12:06 ` Avi Kivity
@ 2011-12-04 15:11 ` Michael S. Tsirkin
2011-12-04 15:16 ` Avi Kivity
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 15:11 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On Sun, Dec 04, 2011 at 02:06:34PM +0200, Avi Kivity wrote:
> On 12/04/2011 02:01 PM, Michael S. Tsirkin wrote:
> > >
> > > How much better?
> > >
> > > I think that if indirects benefit networking, then we're doing something
> > > wrong. What's going on? Does the ring get filled too early? If so we
> > > should expand it.
> >
> > The ring is physically contiguous.
> > With 256 entries and 64 bytes each, that's already 16K.
>
> A descriptor is just 16 bytes.
Right. Not sure where I got 64.
> There's also the used ring, but that's a
> mistake if you have out of order completion. We should have used copying.
Seems unrelated... unless you want used to be written into
descriptor ring itself?
But I don't really know why the virtio ring insists on
making the 3 buffers (avail/used/descriptor)
physically contiguous. Rusty?
> 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> just 1.5 ms. In any case I think it's sufficient.
Right. So I think that without indirect, we waste about 3 entries
per packet for virtio header and transport etc headers.
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 11:06 ` Michael S. Tsirkin
@ 2011-12-04 15:15 ` Michael S. Tsirkin
0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 15:15 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Sun, Dec 04, 2011 at 01:06:37PM +0200, Michael S. Tsirkin wrote:
> [...]
>
> I think it's a good idea to make that per-device.
> For network at least, each skb already has overhead of
> around 1/2 K, so using up to 1/2K more seems acceptable.
> But even if we went up to MAX_SKB_FRAGS+2, it would be
> only 1K per ring entry,
I got this wrong - descriptor is 16 bytes, so MAX_SKB_FRAGS+2
descriptors would be less than 300 bytes overhead per packet.
That's not a lot.
> so for a ring of 256 entries, we end up with
> 256K max waste. That's not that terrible.
>
> But I'd say let's do some benchmarking to figure out
> the point where the gains are becoming very small.
> > --
> >
> > Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 15:11 ` Michael S. Tsirkin
@ 2011-12-04 15:16 ` Avi Kivity
2011-12-04 16:00 ` Michael S. Tsirkin
2011-12-05 0:10 ` Rusty Russell
0 siblings, 2 replies; 41+ messages in thread
From: Avi Kivity @ 2011-12-04 15:16 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > There's also the used ring, but that's a
> > mistake if you have out of order completion. We should have used copying.
>
> Seems unrelated... unless you want used to be written into
> descriptor ring itself?
The avail/used rings are in addition to the regular ring, no? If you
copy descriptors, then it goes away.
> > But I don't really know why the virtio ring insists on
> > making the 3 buffers (avail/used/descriptor)
> > physically contiguous. Rusty?
Let's drop them instead.
>
> > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > just 1.5 ms. In any case I think it's sufficient.
>
> Right. So I think that without indirect, we waste about 3 entries
> per packet for virtio header and transport etc headers.
That does suck. Are there issues in increasing the ring size? Or
making it discontiguous?
Can you take a peek at how Xen manages its rings? They have the same
problems we do.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 15:16 ` Avi Kivity
@ 2011-12-04 16:00 ` Michael S. Tsirkin
2011-12-04 16:33 ` Avi Kivity
2011-12-05 0:10 ` Rusty Russell
1 sibling, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 16:00 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On Sun, Dec 04, 2011 at 05:16:59PM +0200, Avi Kivity wrote:
> On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > There's also the used ring, but that's a
> > > mistake if you have out of order completion. We should have used copying.
> >
> > Seems unrelated... unless you want used to be written into
> > descriptor ring itself?
>
> The avail/used rings are in addition to the regular ring, no?
Yes. A couple of extra pages; if we reduce alignment we could pack this
in a single extra page.
> If you
> copy descriptors, then it goes away.
The avail ring could go away. The used ring could too if we make descriptors
writeable. IIUC it was made RO in the hope that it would make it
easier for Xen to adopt. Still relevant?
> > But I don't really know why the virtio ring insists on
> > making the 3 buffers (avail/used/descriptor)
> > physically contiguous. Rusty?
>
> Let's drop them instead.
>
> >
> > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > > just 1.5 ms. In any case I think it's sufficient.
> >
> > Right. So I think that without indirect, we waste about 3 entries
> > per packet for virtio header and transport etc headers.
>
> That does suck. Are there issues in increasing the ring size? Or
> making it discontiguous?
discontiguous ring is what indirect is, basically.
> Can you take a peek at how Xen manages its rings? They have the same
> problems we do.
>
> --
> error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 12:13 ` Sasha Levin
@ 2011-12-04 16:22 ` Michael S. Tsirkin
2011-12-04 17:34 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-04 16:22 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Sun, Dec 04, 2011 at 02:13:51PM +0200, Sasha Levin wrote:
> On Sun, 2011-12-04 at 13:52 +0200, Avi Kivity wrote:
> > [...]
> >
> > How much better?
>
> host->guest was the same with both indirect on and off, and guest->host went
> up by 5% with indirect on.
>
> This was just a simple 1 TCP stream test.
I'm confused. didn't you see a bigger benefit for guest->host by
switching indirect off?
> >
> > I think that if indirects benefit networking, then we're doing something
> > wrong. What's going on? Does the ring get filled too early? If so we
> > should expand it.
> >
>
> --
>
> Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 16:00 ` Michael S. Tsirkin
@ 2011-12-04 16:33 ` Avi Kivity
0 siblings, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2011-12-04 16:33 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Sasha Levin
On 12/04/2011 06:00 PM, Michael S. Tsirkin wrote:
> > If you
> > copy descriptors, then it goes away.
>
> The avail ring could go away. The used ring could too if we make descriptors
> writeable. IIUC it was made RO in the hope that it would make it
> easier for Xen to adopt. Still relevant?
You mean RO from the consumer side? Why can't Xen do that?
> > That does suck. Are there issues in increasing the ring size? Or
> > making it discontiguous?
>
> discontiguous ring is what indirect is, basically.
No, discontiguous is more cache and prefetch friendly. With vmap(), the
code doesn't even change.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 16:22 ` Michael S. Tsirkin
@ 2011-12-04 17:34 ` Sasha Levin
2011-12-04 17:37 ` Avi Kivity
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-04 17:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Sun, 2011-12-04 at 18:22 +0200, Michael S. Tsirkin wrote:
> On Sun, Dec 04, 2011 at 02:13:51PM +0200, Sasha Levin wrote:
> > [...]
> >
> > host->guest was the same with both indirect on and off, and guest->host went
> > up by 5% with indirect on.
> >
> > This was just a simple 1 TCP stream test.
>
> I'm confused. didn't you see a bigger benefit for guest->host by
> switching indirect off?
The 5% improvement is over the 'regular' indirect on, not over indirect
off. Sorry for the confusion there.
I suggested this change regardless of the outcome of indirect descriptor
threshold discussion, since it would help anyways.
--
Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 17:34 ` Sasha Levin
@ 2011-12-04 17:37 ` Avi Kivity
2011-12-04 17:39 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2011-12-04 17:37 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On 12/04/2011 07:34 PM, Sasha Levin wrote:
> >
> > I'm confused. didn't you see a bigger benefit for guest->host by
> > switching indirect off?
>
> The 5% improvement is over the 'regular' indirect on, not over indirect
> off. Sorry for the confusion there.
>
> I suggested this change regardless of the outcome of indirect descriptor
> threshold discussion, since it would help anyways.
For net, this makes sense. For block, it reduces the effective queue
depth, so it's not a trivial change. It probably makes sense there too,
though.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 17:37 ` Avi Kivity
@ 2011-12-04 17:39 ` Sasha Levin
2011-12-04 18:23 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-04 17:39 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On Sun, 2011-12-04 at 19:37 +0200, Avi Kivity wrote:
> On 12/04/2011 07:34 PM, Sasha Levin wrote:
> > >
> > > I'm confused. didn't you see a bigger benefit for guest->host by
> > > switching indirect off?
> >
> > The 5% improvement is over the 'regular' indirect on, not over indirect
> > off. Sorry for the confusion there.
> >
> > I suggested this change regardless of the outcome of indirect descriptor
> > threshold discussion, since it would help anyways.
>
> For net, this makes sense. For block, it reduces the effective queue
> depth, so it's not a trivial change. It probably makes sense there too,
> though.
It doesn't have to be limited to that number; anything above that can go
through the regular kmalloc() path.
--
Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 17:39 ` Sasha Levin
@ 2011-12-04 18:23 ` Sasha Levin
2011-12-07 14:02 ` Sasha Levin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-04 18:23 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On Sun, 2011-12-04 at 19:39 +0200, Sasha Levin wrote:
> On Sun, 2011-12-04 at 19:37 +0200, Avi Kivity wrote:
> > On 12/04/2011 07:34 PM, Sasha Levin wrote:
> > > >
> > > > I'm confused. didn't you see a bigger benefit for guest->host by
> > > > switching indirect off?
> > >
> > > The 5% improvement is over the 'regular' indirect on, not over indirect
> > > off. Sorry for the confusion there.
> > >
> > > I suggested this change regardless of the outcome of indirect descriptor
> > > threshold discussion, since it would help anyways.
> >
> > For net, this makes sense. For block, it reduces the effective queue
> > depth, so it's not a trivial change. It probably makes sense there too,
> > though.
>
> It doesn't have to be limited to that number; anything above that can go
> through the regular kmalloc() path.
Something like the following patch:
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c7a2c20..3166ca0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -82,6 +82,7 @@ struct vring_virtqueue
/* Host supports indirect buffers */
bool indirect;
+ struct kmem_cache *indirect_cache;
/* Host publishes avail event idx */
bool event;
@@ -110,6 +111,9 @@ struct vring_virtqueue
#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+static unsigned int ind_alloc_thresh = 0;
+module_param(ind_alloc_thresh, uint, S_IRUGO);
+
/* Set up an indirect table of descriptors and add it to the queue. */
static int vring_add_indirect(struct vring_virtqueue *vq,
struct scatterlist sg[],
@@ -121,7 +125,10 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
unsigned head;
int i;
- desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
+ if ((out + in) <= ind_alloc_thresh)
+ desc = kmem_cache_alloc(vq->indirect_cache, gfp);
+ else
+ desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
if (!desc)
return -ENOMEM;
@@ -479,6 +486,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
vq->broken = false;
vq->last_used_idx = 0;
vq->num_added = 0;
+ if (ind_alloc_thresh)
+ vq->indirect_cache = KMEM_CACHE(vring_desc[ind_alloc_thresh], 0);
list_add_tail(&vq->vq.list, &vdev->vqs);
#ifdef DEBUG
vq->in_use = false;
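The free side would need a matching change too -- assuming detach_buf()
still frees the indirect table with kfree(phys_to_virt(...)), anything taken
from the kmem_cache has to go back through kmem_cache_free(). A sketch
(untested) of what that could look like:

	/* In detach_buf(): return the indirect table to whichever allocator
	 * it came from; the table size can be recovered from the head
	 * descriptor's len field. */
	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT) {
		struct vring_desc *table = phys_to_virt(vq->vring.desc[i].addr);
		unsigned int entries = vq->vring.desc[i].len /
				       sizeof(struct vring_desc);

		if (vq->indirect_cache && entries <= ind_alloc_thresh)
			kmem_cache_free(vq->indirect_cache, table);
		else
			kfree(table);
	}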
--
Sasha.
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 15:16 ` Avi Kivity
2011-12-04 16:00 ` Michael S. Tsirkin
@ 2011-12-05 0:10 ` Rusty Russell
2011-12-05 9:52 ` Avi Kivity
1 sibling, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2011-12-05 0:10 UTC (permalink / raw)
To: Avi Kivity, Michael S. Tsirkin
Cc: markmc, virtualization, Sasha Levin, kvm, linux-kernel
On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity <avi@redhat.com> wrote:
> On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > There's also the used ring, but that's a
> > > mistake if you have out of order completion. We should have used copying.
> >
> > Seems unrelated... unless you want used to be written into
> > descriptor ring itself?
>
> The avail/used rings are in addition to the regular ring, no? If you
> copy descriptors, then it goes away.
There were two ideas which drove the current design:
1) The Van-Jacobson style "no two writers to same cacheline makes rings
fast" idea. Empirically, this doesn't show any winnage.
2) Allowing a generic inter-guest copy mechanism, so we could have
genuinely untrusted driver domains. Yet noone ever did this so it's
hardly a killer feature :(
So if we're going to revisit and drop those requirements, I'd say:
1) Shared device/driver rings like Xen. Xen uses device-specific ring
contents, I'd be tempted to stick to our pre-headers, and a 'u64
addr; u64 len_and_flags; u64 cookie;' generic style. Then use
the same ring for responses. That's a slight space-win, since
we're 24 bytes vs 26 bytes now (see the accounting sketch after this list).
2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
Makes the indexing harder, but that -1 lets us stash the indices in
the first entry and makes the ring a nice 2^n size.
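To show where the 24-vs-26 in (1) comes from (rough accounting only; the
struct and field names below are placeholders, the sizes are what matter):

/* Proposed shared-ring entry: three u64s = 24 bytes per request. */
struct generic_ring_entry {
	__u64 addr;
	__u64 len_and_flags;
	__u64 cookie;
};

/*
 * Today's split layout, per one-descriptor request:
 *   struct vring_desc (u64 addr, u32 len, u16 flags, u16 next)  16 bytes
 *   avail ring slot (u16 head index)                             2 bytes
 *   used ring slot (struct vring_used_elem: u32 id, u32 len)     8 bytes
 *                                                               --------
 *                                                               26 bytes
 */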
> > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > > just 1.5 ms. In any case I think it's sufficient.
> >
> > Right. So I think that without indirect, we waste about 3 entries
> > per packet for virtio header and transport etc headers.
>
> That does suck. Are there issues in increasing the ring size? Or
> making it discontiguous?
Because the qemu implementation is broken. We can often put the virtio
header at the head of the packet. In practice, the qemu implementation
insists the header be a single descriptor.
(At least, it used to, perhaps it has now been fixed. We need a
VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).
We currently use small rings: the guest can't negotiate so qemu has to
offer a lowest-common-denominator value. The new virtio-pci layout
fixes this, and lets the guest set the ring size.
> Can you take a peek at how Xen manages its rings? They have the same
> problems we do.
Yes, I made some mistakes, but I did steal from them in the first
place...
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-05 0:10 ` Rusty Russell
@ 2011-12-05 9:52 ` Avi Kivity
2011-12-06 5:07 ` Rusty Russell
0 siblings, 1 reply; 41+ messages in thread
From: Avi Kivity @ 2011-12-05 9:52 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin
On 12/05/2011 02:10 AM, Rusty Russell wrote:
> On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity <avi@redhat.com> wrote:
> > On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > > There's also the used ring, but that's a
> > > > mistake if you have out of order completion. We should have used copying.
> > >
> > > Seems unrelated... unless you want used to be written into
> > > descriptor ring itself?
> >
> > The avail/used rings are in addition to the regular ring, no? If you
> > copy descriptors, then it goes away.
>
> There were two ideas which drove the current design:
>
> 1) The Van-Jacobson style "no two writers to same cacheline makes rings
> fast" idea. Empirically, this doesn't show any winnage.
Write/write is the same as write/read or read/write. Both cases have to
send a probe and wait for the result. What we really need is to
minimize cache line ping ponging, and the descriptor pool fails that
with ooo completion. I doubt it's measurable though except with the
very fastest storage providers.
> 2) Allowing a generic inter-guest copy mechanism, so we could have
> genuinely untrusted driver domains. Yet noone ever did this so it's
> hardly a killer feature :(
It's still a goal, though not an important one. But we have to
translate rings anyway, don't we, since buffers are in guest physical
addresses, and we're moving into an address space that doesn't map those.
I thought of having a vhost-copy driver that could do ring translation,
using a dma engine for the copy.
> So if we're going to revisit and drop those requirements, I'd say:
>
> 1) Shared device/driver rings like Xen. Xen uses device-specific ring
> contents, I'd be tempted to stick to our pre-headers, and a 'u64
> addr; u64 len_and_flags; u64 cookie;' generic style. Then use
> the same ring for responses. That's a slight space-win, since
> we're 24 bytes vs 26 bytes now.
Let's cheat and have inline contents. Take three bits from
len_and_flags to specify additional descriptors as inline data. Also,
stuff the cookie into len_and_flags as well.
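A strawman split of len_and_flags, just to show it fits (the bit positions
are invented for illustration):

/* bits  0..31  buffer length
 * bits 32..34  count of following entries carrying inline payload (0-7)
 * bits 35..47  flags
 * bits 48..63  small cookie, or an index into a driver-side table
 */
#define ENTRY_LEN(lf)		((lf) & 0xffffffffULL)
#define ENTRY_INLINE(lf)	(((lf) >> 32) & 0x7ULL)
#define ENTRY_FLAGS(lf)		(((lf) >> 35) & 0x1fffULL)
#define ENTRY_COOKIE(lf)	((lf) >> 48)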
> 2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
> Makes the indexing harder, but that -1 lets us stash the indices in
> the first entry and makes the ring a nice 2^n size.
Allocate at least a cache line for those. The 2^n size is not really
material, a division is never necessary.
> > > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > > > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > > > just 1.5 ms. In any case I think it's sufficient.
> > >
> > > Right. So I think that without indirect, we waste about 3 entries
> > > per packet for virtio header and transport etc headers.
> >
> > That does suck. Are there issues in increasing the ring size? Or
> > making it discontiguous?
>
> Because the qemu implementation is broken.
I was talking about something else, but this is more important. Every
time we make a simplifying assumption, it turns around and bites us, and
the code becomes twice as complicated as it would have been in the first
place, and the test matrix explodes.
> We can often put the virtio
> header at the head of the packet. In practice, the qemu implementation
> insists the header be a single descriptor.
>
> (At least, it used to, perhaps it has now been fixed. We need a
> VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).
We'll run out of bits in no time.
> We currently use small rings: the guest can't negotiate so qemu has to
> offer a lowest-common-denominator value. The new virtio-pci layout
> fixes this, and lets the guest set the ring size.
Ok good. Note that figuring out the best ring size needs some info from
the host, but that can be had from other channels.
> > Can you take a peek at how Xen manages its rings? They have the same
> > problems we do.
>
> Yes, I made some mistakes, but I did steal from them in the first
> place...
There was a bit of second system syndrome there. And I don't understand
how the ring/pool issue didn't surface during review, it seems so
obvious now but completely eluded me then.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-05 9:52 ` Avi Kivity
@ 2011-12-06 5:07 ` Rusty Russell
2011-12-06 9:58 ` Avi Kivity
0 siblings, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2011-12-06 5:07 UTC (permalink / raw)
To: Avi Kivity
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin
On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity <avi@redhat.com> wrote:
> On 12/05/2011 02:10 AM, Rusty Russell wrote:
> > On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity <avi@redhat.com> wrote:
> > > On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > > > There's also the used ring, but that's a
> > > > > mistake if you have out of order completion. We should have used copying.
> > > >
> > > > Seems unrelated... unless you want used to be written into
> > > > descriptor ring itself?
> > >
> > > The avail/used rings are in addition to the regular ring, no? If you
> > > copy descriptors, then it goes away.
> >
> > There were two ideas which drove the current design:
> >
> > 1) The Van-Jacobson style "no two writers to same cacheline makes rings
> > fast" idea. Empirically, this doesn't show any winnage.
>
> Write/write is the same as write/read or read/write. Both cases have to
> send a probe and wait for the result. What we really need is to
> minimize cache line ping ponging, and the descriptor pool fails that
> with ooo completion. I doubt it's measurable though except with the
> very fastest storage providers.
The claim was that going exclusive->shared->exclusive was cheaper than
exclusive->invalid->exclusive. When VJ said it, it seemed convincing :)
> > 2) Allowing a generic inter-guest copy mechanism, so we could have
> > genuinely untrusted driver domains. Yet noone ever did this so it's
> > hardly a killer feature :(
>
> It's still a goal, though not an important one. But we have to
> > translate rings anyway, don't we, since buffers are in guest physical
> addresses, and we're moving into an address space that doesn't map those.
Yes, but the hypervisor/trusted party would simply have to do the copy;
the rings themselves would be shared. A would say "copy this to/from B's
ring entry N" and you know that A can't have changed B's entry.
> I thought of having a vhost-copy driver that could do ring translation,
> using a dma engine for the copy.
As long as we get the length of data written from the vhost-copy driver
(ie. not just the network header). Otherwise a malicious other guest
can send short packets, and a local process can read uninitialized
memory. And pre-zeroing the buffers for this corner case sucks.
> > So if we're going to revisit and drop those requirements, I'd say:
> >
> > 1) Shared device/driver rings like Xen. Xen uses device-specific ring
> > contents, I'd be tempted to stick to our pre-headers, and a 'u64
> > addr; u64 len_and_flags; u64 cookie;' generic style. Then use
> > the same ring for responses. That's a slight space-win, since
> > we're 24 bytes vs 26 bytes now.
>
> Let's cheat and have inline contents. Take three bits from
> len_and_flags to specify additional descriptors as inline data.
Nice, I like this optimization.
> Also, stuff the cookie into len_and_flags as well.
Every driver really wants to put a pointer in there. We have an array
to map desc. numbers to cookies inside the virtio core.
We really want 64 bits.
> > 2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
> > Makes the indexing harder, but that -1 lets us stash the indices in
> > the first entry and makes the ring a nice 2^n size.
>
> > Allocate at least a cache line for those. The 2^n size is not really
> material, a division is never necessary.
We free-run our indices, so we *do* a division (by truncation). If we
limit indices to ringsize, then we have to handle empty/full confusion.
It's nice for simple OSes if things pack nicely into pages, but it's not
a killer feature IMHO.
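To spell out the empty/full point (illustrative helpers only, not the
actual virtio code):

/* Free-running 16-bit indices: producer and consumer keep counting and are
 * reduced modulo the ring size only when indexing the array (the "division
 * by truncation" above).  Empty vs full stays unambiguous: equal means
 * empty, a full ring's worth apart means full.  The reduction is only free
 * when the ring size divides 65536, i.e. a power of 2 -- which is what
 * makes a (2^n)-1 ring harder to index. */
static inline int ring_empty(__u16 prod, __u16 cons)
{
	return prod == cons;
}

static inline int ring_full(__u16 prod, __u16 cons, __u16 size)
{
	return (__u16)(prod - cons) == size;
}

static inline unsigned int ring_slot(__u16 idx, __u16 size)
{
	return idx % size;
}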
> > > > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > > > > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > > > > just 1.5 ms. In any case I think it's sufficient.
> > > >
> > > > Right. So I think that without indirect, we waste about 3 entries
> > > > per packet for virtio header and transport etc headers.
> > >
> > > That does suck. Are there issues in increasing the ring size? Or
> > > making it discontiguous?
> >
> > Because the qemu implementation is broken.
>
> I was talking about something else, but this is more important. Every
> time we make a simplifying assumption, it turns around and bites us, and
> the code becomes twice as complicated as it would have been in the first
> place, and the test matrix explodes.
True, though we seem to be improving. But this is why I don't want
optional features in the spec; I want us always to exercise all of it.
> > We can often put the virtio
> > header at the head of the packet. In practice, the qemu implementation
> > insists the header be a single descriptor.
> >
> > (At least, it used to, perhaps it has now been fixed. We need a
> > VIRTIO_NET_F_I_NOW_CONFORM_TO_THE_DAMN_SPEC_SORRY_I_SUCK bit).
>
> We'll run out of bits in no time.
We had one already: VIRTIO_F_BAD_FEATURE. We haven't used it in a long
time (if ever), and I just removed it from the latest version of the
spec.
But we can cheat: we can add this as a requirement to The New Ring
Layout. And document the old behaviour as a footnote.
> > We currently use small rings: the guest can't negotiate so qemu has to
> > offer a lowest-common-denominator value. The new virtio-pci layout
> > fixes this, and lets the guest set the ring size.
>
> > Ok good. Note that figuring out the best ring size needs some info from
> the host, but that can be had from other channels.
We have a ringsize field; should the host initialize it as a hint? Or
as a maximum allowable?
I'm just not sure how the host would even know to hint.
> > > Can you take a peek at how Xen manages its rings? They have the same
> > > problems we do.
> >
> > Yes, I made some mistakes, but I did steal from them in the first
> > place...
>
> There was a bit of second system syndrome there.
But also some third syndrome: Xen had gone through their great device
rewrite, and things like feature bits were a direct response against
that complexity. We did well there.
> And I don't understand how the ring/pool issue didn't surface during
> review, it seems so obvious now but completely eluded me then.
The idea was we'd just enlarge the rings. Only later did that prove
naive, and many things came in to fix that. The cacheline issue is
speculative at this point.
But we are learning, and making fewer mistakes; we could have learned some
things earlier (I was remiss in not seeking the Xen devs' advice on the
ring itself, for example).
I honestly don't think it gets much better than this in Real Life.
I'm working on the virtio-pci capability layout patch now, then I'll
look at a ring rewrite.
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-06 5:07 ` Rusty Russell
@ 2011-12-06 9:58 ` Avi Kivity
2011-12-06 12:03 ` Rusty Russell
[not found] ` <87pqg1kiuu.fsf@rustcorp.com.au>
0 siblings, 2 replies; 41+ messages in thread
From: Avi Kivity @ 2011-12-06 9:58 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin
On 12/06/2011 07:07 AM, Rusty Russell wrote:
> On Mon, 05 Dec 2011 11:52:54 +0200, Avi Kivity <avi@redhat.com> wrote:
> > On 12/05/2011 02:10 AM, Rusty Russell wrote:
> > > On Sun, 04 Dec 2011 17:16:59 +0200, Avi Kivity <avi@redhat.com> wrote:
> > > > On 12/04/2011 05:11 PM, Michael S. Tsirkin wrote:
> > > > > > There's also the used ring, but that's a
> > > > > > mistake if you have out of order completion. We should have used copying.
> > > > >
> > > > > Seems unrelated... unless you want used to be written into
> > > > > descriptor ring itself?
> > > >
> > > > The avail/used rings are in addition to the regular ring, no? If you
> > > > copy descriptors, then it goes away.
> > >
> > > There were two ideas which drove the current design:
> > >
> > > 1) The Van-Jacobson style "no two writers to same cacheline makes rings
> > > fast" idea. Empirically, this doesn't show any winnage.
> >
> > Write/write is the same as write/read or read/write. Both cases have to
> > send a probe and wait for the result. What we really need is to
> > minimize cache line ping ponging, and the descriptor pool fails that
> > with ooo completion. I doubt it's measurable though except with the
> > very fastest storage providers.
>
> The claim was that going exclusive->shared->exclusive was cheaper than
> exclusive->invalid->exclusive. When VJ said it, it seemed convincing :)
IIRC at some point in time certain transitions were forced to go through
main memory; perhaps that was behind this. I think these days
everything goes through the coherency protocol; not sure though.
> > > 2) Allowing a generic inter-guest copy mechanism, so we could have
> > > genuinely untrusted driver domains. Yet noone ever did this so it's
> > > hardly a killer feature :(
> >
> > It's still a goal, though not an important one. But we have to
> > > translate rings anyway, don't we, since buffers are in guest physical
> > addresses, and we're moving into an address space that doesn't map those.
>
> Yes, but the hypervisor/trusted party would simply have to do the copy;
> the rings themselves would be shared. A would say "copy this to/from B's
> ring entry N" and you know that A can't have changed B's entry.
Sorry, I don't follow. How can the rings be shared? If A puts a gpa in
A's address space into the ring, there's no way B can do anything with
it, it's an opaque number. Xen solves this with an extra layer of
indirection (grant table handles) that cost extra hypercalls to map or copy.
> > I thought of having a vhost-copy driver that could do ring translation,
> > using a dma engine for the copy.
>
> As long as we get the length of data written from the vhost-copy driver
> (ie. not just the network header). Otherwise a malicious other guest
> can send short packets, and a local process can read uninitialized
> memory. And pre-zeroing the buffers for this corner case sucks.
Right. It's important to get the metadata visible at the right layer.
> > Also, stuff the cookie into len_and_flags as well.
>
> Every driver really wants to put a pointer in there. We have an array
> to map desc. numbers to cookies inside the virtio core.
>
> We really want 64 bits.
With multiqueue, it may be cheaper to do the extra translation locally
than to ship the cookie across cores (or, more likely, it will make no
difference).
However, moving pointers only works if you trust the other side. That
doesn't work if we manage to share a ring.
> > > 2) Stick with physically-contiguous rings, but use them of size (2^n)-1.
> > > Makes the indexing harder, but that -1 lets us stash the indices in
> > > the first entry and makes the ring a nice 2^n size.
> >
> > Allocate at least a cache line for those. The 2^n size is not really
> > material, a division is never necessary.
>
> We free-run our indices, so we *do* a division (by truncation). If we
> limit indices to ringsize, then we have to handle empty/full confusion.
>
> It's nice for simple OSes if things pack nicely into pages, but it's not
> a killer feature IMHO.
Agree.
> > > > > > 16kB worth of descriptors is 1024 entries. With 4kB buffers, that's 4MB
> > > > > > worth of data, or 4 ms at 10GbE line speed. With 1500 byte buffers it's
> > > > > > just 1.5 ms. In any case I think it's sufficient.
> > > > >
> > > > > Right. So I think that without indirect, we waste about 3 entries
> > > > > per packet for virtio header and transport etc headers.
> > > >
> > > > That does suck. Are there issues in increasing the ring size? Or
> > > > making it discontiguous?
> > >
> > > Because the qemu implementation is broken.
> >
> > I was talking about something else, but this is more important. Every
> > time we make a simplifying assumption, it turns around and bites us, and
> > the code becomes twice as complicated as it would have been in the first
> > place, and the test matrix explodes.
>
> True, though we seem to be improving. But this is why I don't want
> optional features in the spec; I want us always to exercise all of it.
Yes, and I think the main reason we're improving is that we actually
have a spec these days. Agree about optional features, though of course
we accumulate them by not thinking of everything ahead of time; I don't
see how we can fix that.
> > > We currently use small rings: the guest can't negotiate so qemu has to
> > > offer a lowest-common-denominator value. The new virtio-pci layout
> > > fixes this, and lets the guest set the ring size.
> >
> > Ok good. Note that figuring out the best ring size needs some info from
> > the host, but that can be had from other channels.
>
> We have a ringsize field; should the host initialize it as a hint? Or
> as a maximum allowable?
>
> I'm just not sure how the host would even know to hint.
For JBOD storage, a good rule of thumb is (number of spindles) x 3.
With less, you might leave an idle spindle; with more, you're just
adding latency. This assumes you're using indirects so ring entry ==
request. The picture is muddier with massive battery-backed RAID
controllers or flash.
For networking, you want (ring size) * min(expected packet size, page
size) / (link bandwidth) to be something that doesn't get the
bufferbloat people after your blood.
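Plugging in some typical numbers (illustrative only):

#include <stdio.h>

int main(void)
{
	const double ring = 256;	/* ring entries */
	const double pkt = 1500;	/* min(expected packet, page) in bytes */
	const double link_bps = 10e9;	/* 10GbE */

	printf("worst-case queue delay: %.2f ms\n",
	       ring * pkt * 8 / link_bps * 1e3);	/* ~0.31 ms */
	return 0;
}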
> > > > Can you take a peek at how Xen manages its rings? They have the same
> > > > problems we do.
> > >
> > > Yes, I made some mistakes, but I did steal from them in the first
> > > place...
> >
> > There was a bit of second system syndrome there.
>
> But also some third syndrome: Xen had gone through their great device
> rewrite, and things like feature bits were a direct response against
> that complexity. We did well there.
Yes, feature bits really proved themselves.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-06 9:58 ` Avi Kivity
@ 2011-12-06 12:03 ` Rusty Russell
[not found] ` <87pqg1kiuu.fsf@rustcorp.com.au>
1 sibling, 0 replies; 41+ messages in thread
From: Rusty Russell @ 2011-12-06 12:03 UTC (permalink / raw)
To: Avi Kivity
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin
On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@redhat.com> wrote:
> On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > the rings themselves would be shared. A would say "copy this to/from B's
> > ring entry N" and you know that A can't have changed B's entry.
>
> Sorry, I don't follow. How can the rings be shared? If A puts a gpa in
> A's address space into the ring, there's no way B can do anything with
> it, it's an opaque number. Xen solves this with an extra layer of
> indirection (grant table handles) that cost extra hypercalls to map or
> copy.
It's not symmetric. B can see the desc and avail pages R/O, and the
used page R/W. It needs to ask something to copy in/out of
descriptors, though, because they're an opaque number, and it doesn't
have access. ie. the existence of the descriptor in the ring *implies*
a grant.
Perhaps this could be generalized further into a "connect these two
rings", but I'm not sure. Descriptors with both read and write parts
are tricky.
> > Every driver really wants to put a pointer in there. We have an array
> > to map desc. numbers to cookies inside the virtio core.
> >
> > We really want 64 bits.
>
> With multiqueue, it may be cheaper to do the extra translation locally
> than to ship the cookie across cores (or, more likely, it will make no
> difference).
Indeed.
> However, moving pointers only works if you trust the other side. That
> doesn't work if we manage to share a ring.
Yes, that part needs to be trusted too.
> > I'm just not sure how the host would even know to hint.
>
> For JBOD storage, a good rule of thumb is (number of spindles) x 3.
> With less, you might leave an idle spindle; with more, you're just
> adding latency. This assumes you're using indirects so ring entry ==
> request. The picture is muddier with massive battery-backed RAID
> controllers or flash.
>
> For networking, you want (ring size) * min(expected packet size, page
> size) / (link bandwidth) to be something that doesn't get the
> bufferbloat people after your blood.
OK, so while neither side knows, the host knows slightly more.
Now I think about it, from a spec POV, saying it's a "hint" is useless,
as it doesn't tell the driver what to do with it. I'll say it's a
maximum, which keeps it simple.
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
[not found] ` <87pqg1kiuu.fsf@rustcorp.com.au>
@ 2011-12-07 13:37 ` Avi Kivity
0 siblings, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2011-12-07 13:37 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin
On 12/06/2011 02:03 PM, Rusty Russell wrote:
> On Tue, 06 Dec 2011 11:58:21 +0200, Avi Kivity <avi@redhat.com> wrote:
> > On 12/06/2011 07:07 AM, Rusty Russell wrote:
> > > Yes, but the hypervisor/trusted party would simply have to do the copy;
> > > the rings themselves would be shared. A would say "copy this to/from B's
> > > ring entry N" and you know that A can't have changed B's entry.
> >
> > Sorry, I don't follow. How can the rings be shared? If A puts a gpa in
> > A's address space into the ring, there's no way B can do anything with
> > it, it's an opaque number. Xen solves this with an extra layer of
> > indirection (grant table handles) that cost extra hypercalls to map or
> > copy.
>
> It's not symmetric. B can see the desc and avail pages R/O, and the
> used page R/W. It needs to ask something to copy in/out of
> descriptors, though, because they're an opaque number, and it doesn't
> have access. ie. the existence of the descriptor in the ring *implies*
> a grant.
>
> Perhaps this could be generalized further into a "connect these two
> rings", but I'm not sure. Descriptors with both read and write parts
> are tricky.
Okay, I was using a wrong mental model of how this works. B must be
aware of the translation from A's address space into B. Both qemu and
the kernel can do this on their own, but if B is another guest, then it
cannot do this except by calling H.
vhost-copy cannot work fully transparently, because you need some memory
to copy into. Maybe we can have a pci device with a large BAR that
contains buffers for copying, and also a translation from A addresses
into B addresses. It would work something like this:
- A prepares a request with both out and in buffers
- vhost-copy allocates memory in B's virtio-copy BAR, copies (using a
  DMA engine) the out buffers into it, and rewrites the out descriptors to
  contain B addresses
- B services the request, and updates the in addresses in the
  descriptors to point at B memory
- vhost-copy copies (using a DMA engine) the in buffers into A memory
> > > I'm just not sure how the host would even know to hint.
> >
> > For JBOD storage, a good rule of thumb is (number of spindles) x 3.
> > With less, you might leave an idle spindle; with more, you're just
> > adding latency. This assumes you're using indirects so ring entry ==
> > request. The picture is muddier with massive battery-backed RAID
> > controllers or flash.
> >
> > For networking, you want (ring size) * min(expected packet size, page
> > size) / (link bandwidth) to be something that doesn't get the
> > bufferbloat people after your blood.
>
> OK, so while neither side knows, the host knows slightly more.
>
> Now I think about it, from a spec POV, saying it's a "hint" is useless,
> as it doesn't tell the driver what to do with it. I'll say it's a
> maximum, which keeps it simple.
>
Those rules of thumb always have exceptions; I'd say it should be a
default that the guest can override.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 41+ messages in thread
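For what it's worth, the difference between the two readings of the hint (Rusty's "maximum" versus a default the guest can override) can be stated in a couple of lines; the function and parameter names here are invented, not from any spec.

/* Illustrative only: two possible semantics for a host-provided
 * ring-depth hint. */
static unsigned int hint_as_maximum(unsigned int guest_choice,
				    unsigned int host_hint)
{
	/* hint is a hard cap: never queue more than the host suggests */
	return guest_choice < host_hint ? guest_choice : host_hint;
}

static unsigned int hint_as_default(unsigned int guest_choice,
				    unsigned int host_hint)
{
	/* hint is only a starting value: any explicit guest choice wins */
	return guest_choice ? guest_choice : host_hint;
}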
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-04 18:23 ` Sasha Levin
@ 2011-12-07 14:02 ` Sasha Levin
2011-12-07 15:48 ` Michael S. Tsirkin
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-07 14:02 UTC (permalink / raw)
To: Avi Kivity; +Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization
On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
[snip]
Rusty, Michael, does the below look like a reasonable optimization to you?
should I send it as a patch?
> Something like the following patch:
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c7a2c20..3166ca0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -82,6 +82,7 @@ struct vring_virtqueue
>
> /* Host supports indirect buffers */
> bool indirect;
> + struct kmem_cache *indirect_cache;
>
> /* Host publishes avail event idx */
> bool event;
> @@ -110,6 +111,9 @@ struct vring_virtqueue
>
> #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>
> +static unsigned int ind_alloc_thresh = 0;
> +module_param(ind_alloc_thresh, uint, S_IRUGO);
> +
> /* Set up an indirect table of descriptors and add it to the queue. */
> static int vring_add_indirect(struct vring_virtqueue *vq,
> struct scatterlist sg[],
> @@ -121,7 +125,10 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
> unsigned head;
> int i;
>
> - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> + if ((out + in) <= ind_alloc_thresh)
> + desc = kmem_cache_alloc(vq->indirect_cache, gfp);
> + else
> + desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> if (!desc)
> return -ENOMEM;
>
> @@ -479,6 +486,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> vq->broken = false;
> vq->last_used_idx = 0;
> vq->num_added = 0;
> + if (ind_alloc_thresh)
> + vq->indirect_cache = KMEM_CACHE(vring_desc[ind_alloc_thresh], 0);
> list_add_tail(&vq->vq.list, &vdev->vqs);
> #ifdef DEBUG
> vq->in_use = false;
>
--
Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-07 14:02 ` Sasha Levin
@ 2011-12-07 15:48 ` Michael S. Tsirkin
2011-12-08 9:44 ` Rusty Russell
[not found] ` <87r50fgzyj.fsf@rustcorp.com.au>
0 siblings, 2 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2011-12-07 15:48 UTC (permalink / raw)
To: Sasha Levin; +Cc: markmc, kvm, linux-kernel, virtualization, Avi Kivity
On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
>
> [snip]
>
> Rusty, Michael, does the below look like a reasonable optimization to you?
OK overall but a bit hard to say for sure as it looks pretty incomplete ...
> should I send it as a patch?
What's the performance gain?
> > Something like the following patch:
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c7a2c20..3166ca0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -82,6 +82,7 @@ struct vring_virtqueue
> >
> > /* Host supports indirect buffers */
> > bool indirect;
> > + struct kmem_cache *indirect_cache;
> >
> > /* Host publishes avail event idx */
> > bool event;
> > @@ -110,6 +111,9 @@ struct vring_virtqueue
> >
> > #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
> >
> > +static unsigned int ind_alloc_thresh = 0;
> > +module_param(ind_alloc_thresh, uint, S_IRUGO);
0 will have no effect?
> > +
> > /* Set up an indirect table of descriptors and add it to the queue. */
> > static int vring_add_indirect(struct vring_virtqueue *vq,
> > struct scatterlist sg[],
A global parameter is OK for testing but likely not what we want
in real life. This needs to be different per device.
> > @@ -121,7 +125,10 @@ static int vring_add_indirect(struct vring_virtqueue *vq,
> > unsigned head;
> > int i;
> >
> > - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> > + if ((out + in) <= ind_alloc_thresh)
> > + desc = kmem_cache_alloc(vq->indirect_cache, gfp);
> > + else
> > + desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return -ENOMEM;
> >
free unaffected?
> > @@ -479,6 +486,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
> > vq->broken = false;
> > vq->last_used_idx = 0;
> > vq->num_added = 0;
> > + if (ind_alloc_thresh)
> > + vq->indirect_cache = KMEM_CACHE(vring_desc[ind_alloc_thresh], 0);
and need to cleanup too?
> > list_add_tail(&vq->vq.list, &vdev->vqs);
> > #ifdef DEBUG
> > vq->in_use = false;
> >
>
> --
>
> Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
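To make the review points above concrete, here is a rough sketch (not a patch from this thread) of how the allocation, free, and teardown paths could stay paired if both the threshold and the cache lived in the virtqueue instead of behind a global module parameter. The struct, function, and field names below are placeholders.

/* Rough sketch only; not intended to apply to virtio_ring.c as-is. */
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/virtio_ring.h>

struct vring_vq_sketch {
	struct kmem_cache *indirect_cache;	/* NULL if caching disabled */
	unsigned int ind_cache_thresh;		/* per device, not a module param */
};

static int desc_cache_init(struct vring_vq_sketch *vq, unsigned int thresh)
{
	vq->ind_cache_thresh = thresh;
	if (!thresh)
		return 0;
	vq->indirect_cache = kmem_cache_create("vring_ind_desc",
				thresh * sizeof(struct vring_desc), 0, 0, NULL);
	return vq->indirect_cache ? 0 : -ENOMEM;
}

static struct vring_desc *desc_alloc(struct vring_vq_sketch *vq,
				     unsigned int total, gfp_t gfp)
{
	if (vq->indirect_cache && total <= vq->ind_cache_thresh)
		return kmem_cache_alloc(vq->indirect_cache, gfp);
	return kmalloc(total * sizeof(struct vring_desc), gfp);
}

/* The free path mirrors the allocation decision, addressing the
 * "free unaffected?" review note. */
static void desc_free(struct vring_vq_sketch *vq,
		      struct vring_desc *desc, unsigned int total)
{
	if (vq->indirect_cache && total <= vq->ind_cache_thresh)
		kmem_cache_free(vq->indirect_cache, desc);
	else
		kfree(desc);
}

/* Queue teardown destroys the cache, addressing the cleanup review note. */
static void desc_cache_destroy(struct vring_vq_sketch *vq)
{
	if (vq->indirect_cache)
		kmem_cache_destroy(vq->indirect_cache);
}

One wrinkle the sketch glosses over: the existing free path does not track how many entries an indirect table had, so it would have to recover that count (for example from the indirect descriptor's len field) in order to pick the matching allocator.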
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-07 15:48 ` Michael S. Tsirkin
@ 2011-12-08 9:44 ` Rusty Russell
[not found] ` <87r50fgzyj.fsf@rustcorp.com.au>
1 sibling, 0 replies; 41+ messages in thread
From: Rusty Russell @ 2011-12-08 9:44 UTC (permalink / raw)
To: Michael S. Tsirkin, Sasha Levin
Cc: markmc, virtualization, Avi Kivity, kvm, linux-kernel
On Wed, 7 Dec 2011 17:48:17 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> > On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
> >
> > [snip]
> >
> > Rusty, Michael, does the below look like a reasonable optimization to you?
>
> OK overall but a bit hard to say for sure as it looks pretty incomplete ...
A static threshold is very hackish; we need to either initialize it to
a proven-good value (since no one will ever change it) or be cleverer.
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
[not found] ` <87r50fgzyj.fsf@rustcorp.com.au>
@ 2011-12-08 10:37 ` Sasha Levin
2011-12-09 5:33 ` Rusty Russell
0 siblings, 1 reply; 41+ messages in thread
From: Sasha Levin @ 2011-12-08 10:37 UTC (permalink / raw)
To: Rusty Russell
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Avi Kivity
On Thu, 2011-12-08 at 20:14 +1030, Rusty Russell wrote:
> On Wed, 7 Dec 2011 17:48:17 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> > > On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
> > >
> > > [snip]
> > >
> > > Rusty, Michael, does the below look like a reasonable optimization to you?
> >
> > OK overall but a bit hard to say for sure as it looks pretty incomplete ...
>
> A static threshold is very hackish; we need to either initialize it to
> a proven-good value (since no one will ever change it) or be cleverer.
I'd better wait to see how the threshold issue is resolved, and
possibly do it as a dynamic value which depends on the threshold.
I doubt there's one magic value which would work for all.
--
Sasha.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors
2011-12-08 10:37 ` Sasha Levin
@ 2011-12-09 5:33 ` Rusty Russell
0 siblings, 0 replies; 41+ messages in thread
From: Rusty Russell @ 2011-12-09 5:33 UTC (permalink / raw)
To: Sasha Levin
Cc: markmc, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Avi Kivity
On Thu, 08 Dec 2011 12:37:48 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Thu, 2011-12-08 at 20:14 +1030, Rusty Russell wrote:
> > On Wed, 7 Dec 2011 17:48:17 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Dec 07, 2011 at 04:02:45PM +0200, Sasha Levin wrote:
> > > > On Sun, 2011-12-04 at 20:23 +0200, Sasha Levin wrote:
> > > >
> > > > [snip]
> > > >
> > > > Rusty, Michael, does the below look like a reasonable optimization to you?
> > >
> > > OK overall but a bit hard to say for sure as it looks pretty incomplete ...
> >
> > A static threshold is very hackish; we need to either initialize it to
> > a proven-good value (since no one will ever change it) or be cleverer.
>
> I'd better wait to see how the threshold issue is resolved, and
> possibly do it as a dynamic value which depends on the threshold.
>
> I doubt there's one magic value which would work for all.
Sure, but if it's generally better than the current value, I'll apply it.
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads: [~2011-12-09 5:33 UTC | newest]
Thread overview: 41+ messages -- links below jump to the message on this page --
2011-11-29 9:33 [PATCH] virtio-ring: Use threshold for switching to indirect descriptors Sasha Levin
2011-11-29 12:56 ` Michael S. Tsirkin
2011-11-29 13:34 ` Sasha Levin
2011-11-29 13:54 ` Michael S. Tsirkin
[not found] ` <20111129135406.GB30966@redhat.com>
2011-11-29 14:21 ` Sasha Levin
2011-11-29 14:54 ` Michael S. Tsirkin
2011-11-29 14:58 ` Avi Kivity
2011-11-30 16:11 ` Sasha Levin
2011-11-30 16:17 ` Sasha Levin
2011-12-01 2:42 ` Rusty Russell
2011-12-01 7:58 ` Michael S. Tsirkin
2011-12-01 8:09 ` Sasha Levin
2011-12-01 10:26 ` Michael S. Tsirkin
2011-12-02 0:46 ` Rusty Russell
2011-12-03 11:50 ` Sasha Levin
2011-12-04 11:06 ` Michael S. Tsirkin
2011-12-04 15:15 ` Michael S. Tsirkin
2011-12-04 11:52 ` Avi Kivity
2011-12-04 12:01 ` Michael S. Tsirkin
2011-12-04 12:06 ` Avi Kivity
2011-12-04 15:11 ` Michael S. Tsirkin
2011-12-04 15:16 ` Avi Kivity
2011-12-04 16:00 ` Michael S. Tsirkin
2011-12-04 16:33 ` Avi Kivity
2011-12-05 0:10 ` Rusty Russell
2011-12-05 9:52 ` Avi Kivity
2011-12-06 5:07 ` Rusty Russell
2011-12-06 9:58 ` Avi Kivity
2011-12-06 12:03 ` Rusty Russell
[not found] ` <87pqg1kiuu.fsf@rustcorp.com.au>
2011-12-07 13:37 ` Avi Kivity
2011-12-04 12:13 ` Sasha Levin
2011-12-04 16:22 ` Michael S. Tsirkin
2011-12-04 17:34 ` Sasha Levin
2011-12-04 17:37 ` Avi Kivity
2011-12-04 17:39 ` Sasha Levin
2011-12-04 18:23 ` Sasha Levin
2011-12-07 14:02 ` Sasha Levin
2011-12-07 15:48 ` Michael S. Tsirkin
2011-12-08 9:44 ` Rusty Russell
[not found] ` <87r50fgzyj.fsf@rustcorp.com.au>
2011-12-08 10:37 ` Sasha Levin
2011-12-09 5:33 ` Rusty Russell