qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
@ 2014-09-30  8:36 Zhangjie (HZ)
  2014-09-30  9:33 ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Zhangjie (HZ) @ 2014-09-30  8:36 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin, Jason Wang, akong, liuyongan,
	qinchuanyu

Hi,
There is packet loss when we do packet forwarding in a VM,
especially when we use DPDK to do the forwarding. Enlarging the vring
can alleviate the problem. But the vring size is currently limited to 1024, as follows:
VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                            void (*handle_output)(VirtIODevice *, VirtQueue *))
{
    ...
    if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
        abort();
}
PS: #define VIRTQUEUE_MAX_SIZE 1024
I deleted the check and set the vring size to 2048;
the VM starts successfully, and the network works too.
So, why is the vring size limited to 1024, and what is the impact?

Thanks!
-- 
Best Wishes!
Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-09-30  8:36 [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024? Zhangjie (HZ)
@ 2014-09-30  9:33 ` Michael S. Tsirkin
  2014-10-08  7:17   ` Zhangjie (HZ)
  2014-10-08  7:43   ` Avi Kivity
  0 siblings, 2 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-09-30  9:33 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel

On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
> Hi,
> There exits packets loss when we do packet forwarding in VM,
> especially when we use dpdk to do the forwarding. By enlarging vring
> can alleviate the problem.

I think this has to do with the fact that dpdk disables
checksum offloading, this has the side effect of disabling
segmentation offloading.

Please fix dpdk to support checksum offloading, and
I think the problem will go away.


> But now vring size is limited to 1024 as follows:
> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>                             void (*handle_output)(VirtIODevice *, VirtQueue *))
> {
> 	...
> 	if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
>         abort();
> }
> ps:#define VIRTQUEUE_MAX_SIZE 1024
> I delete the judgement code, and set vring size to 2048,
> VM can be successfully started, and the network is ok too.
> So, Why vring size is limited to 1024 and what is the influence?
> 
> Thanks!

There are several reasons for this limit.
First, the guest has to allocate the descriptor buffer, which is
16 bytes * vring size. With a 1K ring that is already 16K, which might
be tricky to allocate contiguously if memory is fragmented when the
device is added by hotplug.
The second issue is that we want to be able to implement the device on
top of the Linux kernel, and a single descriptor chain might use all of
the virtqueue. In that case we won't be able to pass the chain directly
to Linux as a single iov, since that is limited to 1K entries.
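
For reference, here is a minimal C sketch (not QEMU or kernel code) of the two numbers above: the 16-byte split-ring descriptor and the 1024-entry iovec cap (UIO_MAXIOV on Linux).

#include <stdio.h>
#include <stdint.h>

/* One split-ring descriptor: 16 bytes, as laid out in the virtio spec. */
struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

int main(void)
{
    unsigned sizes[] = { 256, 1024, 2048 };

    for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        /* The descriptor table alone must be physically contiguous. */
        printf("ring %4u: descriptor table = %5zu bytes\n",
               sizes[i], sizes[i] * sizeof(struct vring_desc));
    }
    /* A chain covering the whole ring turns into one iovec per descriptor;
     * Linux caps a single iovec array at UIO_MAXIOV (1024) entries, which
     * is the second limit mentioned above. */
    return 0;
}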

> -- 
> Best Wishes!
> Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-09-30  9:33 ` Michael S. Tsirkin
@ 2014-10-08  7:17   ` Zhangjie (HZ)
  2014-10-08  7:37     ` Michael S. Tsirkin
  2014-10-08  7:43   ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Zhangjie (HZ) @ 2014-10-08  7:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel

Thanks for your patient answer! :-)

On 2014/9/30 17:33, Michael S. Tsirkin wrote:
> On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
>> Hi,
>> There exits packets loss when we do packet forwarding in VM,
>> especially when we use dpdk to do the forwarding. By enlarging vring
>> can alleviate the problem.
> 
> I think this has to do with the fact that dpdk disables
> checksum offloading, this has the side effect of disabling
> segmentation offloading.
> 
> Please fix dpdk to support checksum offloading, and
> I think the problem will go away.
In some application scenarios, loss of UDP packets is not allowed,
and the UDP packets are always shorter than the MTU.
So we need to support high-pps forwarding (e.g. 0.3M packets/s), and
offloading cannot fix that.
> 
> 
>> But now vring size is limited to 1024 as follows:
>> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>                             void (*handle_output)(VirtIODevice *, VirtQueue *))
>> {
>> 	...
>> 	if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
>>         abort();
>> }
>> ps:#define VIRTQUEUE_MAX_SIZE 1024
>> I delete the judgement code, and set vring size to 2048,
>> VM can be successfully started, and the network is ok too.
>> So, Why vring size is limited to 1024 and what is the influence?
>>
>> Thanks!
> 
> There are several reason for this limit.
> First guest has to allocate descriptor buffer which is 16 * vring size.
> With 1K size that is already 16K which might be tricky to
> allocate contigiously if memory is fragmented when device is
> added by hotplug.
That is very
> The second issue is that we want to be able to implement
> the device on top of linux kernel, and
> a single descriptor might use all of
> the virtqueue. In this case we wont to be able to pass the
> descriptor directly to linux as a single iov, since
> that is limited to 1K entries.
For the second issue, I wonder if it is OK to set the vring size of virtio-net to larger than 1024;
for networking, an skb uses at most 18 pages, so it will not exceed the iov limit.
> 
>> -- 
>> Best Wishes!
>> Zhang Jie
> .
> 

-- 
Best Wishes!
Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  7:17   ` Zhangjie (HZ)
@ 2014-10-08  7:37     ` Michael S. Tsirkin
  2014-10-08  8:07       ` Zhangjie (HZ)
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08  7:37 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel

On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
> Thanks for your patient answer! :-)
> 
> On 2014/9/30 17:33, Michael S. Tsirkin wrote:
> > On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
> >> Hi,
> >> There exits packets loss when we do packet forwarding in VM,
> >> especially when we use dpdk to do the forwarding. By enlarging vring
> >> can alleviate the problem.
> > 
> > I think this has to do with the fact that dpdk disables
> > checksum offloading, this has the side effect of disabling
> > segmentation offloading.
> > 
> > Please fix dpdk to support checksum offloading, and
> > I think the problem will go away.
> In some application scene, loss of udp packets are not allowed,
>  and udp packets are always short than mtu.
> So, we need to support high pps(eg.0.3M Packets/s) forwarding, and
> offloading cannot fix it.

That's the point. With UFO you get larger-than-MTU UDP packets:
http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo

Additionally, checksum offloading reduces CPU utilization
and reduces the number of data copies, allowing higher pps
with smaller buffers.

It might look like queue depth helps performance for netperf, but in
real-life workloads the latency under load will suffer; with more
protocols implementing tunnelling on top of UDP, such extreme
bufferbloat will not be tolerated.

> > 
> > 
> >> But now vring size is limited to 1024 as follows:
> >> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>                             void (*handle_output)(VirtIODevice *, VirtQueue *))
> >> {
> >> 	...
> >> 	if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
> >>         abort();
> >> }
> >> ps:#define VIRTQUEUE_MAX_SIZE 1024
> >> I delete the judgement code, and set vring size to 2048,
> >> VM can be successfully started, and the network is ok too.
> >> So, Why vring size is limited to 1024 and what is the influence?
> >>
> >> Thanks!
> > 
> > There are several reason for this limit.
> > First guest has to allocate descriptor buffer which is 16 * vring size.
> > With 1K size that is already 16K which might be tricky to
> > allocate contigiously if memory is fragmented when device is
> > added by hotplug.
> That is very
> > The second issue is that we want to be able to implement
> > the device on top of linux kernel, and
> > a single descriptor might use all of
> > the virtqueue. In this case we wont to be able to pass the
> > descriptor directly to linux as a single iov, since
> > that is limited to 1K entries.
> For the second issue, I wonder if it is ok to set vring size of virtio-net to large than 1024,
> as for net work, there is at most 18 pages for a skb, it will not exceed iov.
> > 
> >> -- 
> >> Best Wishes!
> >> Zhang Jie
> > .
> > 
> 
> -- 
> Best Wishes!
> Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-09-30  9:33 ` Michael S. Tsirkin
  2014-10-08  7:17   ` Zhangjie (HZ)
@ 2014-10-08  7:43   ` Avi Kivity
  2014-10-08  8:26     ` Zhangjie (HZ)
  2014-10-08  9:15     ` Michael S. Tsirkin
  1 sibling, 2 replies; 18+ messages in thread
From: Avi Kivity @ 2014-10-08  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhangjie (HZ)
  Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel


On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> a single descriptor might use all of
> the virtqueue. In this case we wont to be able to pass the
> descriptor directly to linux as a single iov, since
>

You could separate maximum request scatter/gather list size from the 
virtqueue size.  They are totally unrelated - even now you can have a 
larger request by using indirect descriptors.
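
As a concrete illustration of the indirect-descriptor point, here is a sketch (assuming the standard split-ring layout and the spec's VRING_DESC_F_INDIRECT flag, not code from this thread) in which one main-ring slot points at a separate table, so a request's scatter/gather length is decoupled from the ring size:

#include <stdint.h>

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4    /* values from the virtio spec */

struct vring_desc {
    uint64_t addr;     /* guest-physical address */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Make one main-ring slot describe a request with 'nsegs' segments by
 * pointing it at an indirect table living in ordinary guest memory. */
static void post_indirect(struct vring_desc *slot,
                          uint64_t table_gpa, uint32_t nsegs)
{
    slot->addr  = table_gpa;                          /* address of the table */
    slot->len   = nsegs * sizeof(struct vring_desc);  /* table size in bytes  */
    slot->flags = VRING_DESC_F_INDIRECT;              /* no in-ring chaining  */
    slot->next  = 0;
}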

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  7:37     ` Michael S. Tsirkin
@ 2014-10-08  8:07       ` Zhangjie (HZ)
  2014-10-08  9:13         ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Zhangjie (HZ) @ 2014-10-08  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel

MST, Thanks very much, I get it.

On 2014/10/8 15:37, Michael S. Tsirkin wrote:
> On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
>> Thanks for your patient answer! :-)
>>
>> On 2014/9/30 17:33, Michael S. Tsirkin wrote:
>>> On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
>>>> Hi,
>>>> There exits packets loss when we do packet forwarding in VM,
>>>> especially when we use dpdk to do the forwarding. By enlarging vring
>>>> can alleviate the problem.
>>>
>>> I think this has to do with the fact that dpdk disables
>>> checksum offloading, this has the side effect of disabling
>>> segmentation offloading.
>>>
>>> Please fix dpdk to support checksum offloading, and
>>> I think the problem will go away.
>> In some application scene, loss of udp packets are not allowed,
>>  and udp packets are always short than mtu.
>> So, we need to support high pps(eg.0.3M Packets/s) forwarding, and
>> offloading cannot fix it.
> 
> That's the point. With UFO you get larger than MTU UDP packets:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo
But here the VM only does forwarding and does not create new packets itself.
As we cannot GRO normal UDP packets, UFO cannot work when UDP packets come from the host's NIC.
> 
> Additionally, checksum offloading reduces CPU utilization
> and reduces the number of data copies, allowing higher pps
> with smaller buffers.
> 
> It might look like queue depth helps performance for netperf, but in
> real-life workloads the latency under load will suffer, with more
> protocols implementing tunnelling on top of UDP such extreme bufferbloat
> will not be tolerated.
> 
>>>
>>>
>>>> But now vring size is limited to 1024 as follows:
>>>> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>>>                             void (*handle_output)(VirtIODevice *, VirtQueue *))
>>>> {
>>>> 	...
>>>> 	if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
>>>>         abort();
>>>> }
>>>> ps:#define VIRTQUEUE_MAX_SIZE 1024
>>>> I delete the judgement code, and set vring size to 2048,
>>>> VM can be successfully started, and the network is ok too.
>>>> So, Why vring size is limited to 1024 and what is the influence?
>>>>
>>>> Thanks!
>>>
>>> There are several reason for this limit.
>>> First guest has to allocate descriptor buffer which is 16 * vring size.
>>> With 1K size that is already 16K which might be tricky to
>>> allocate contigiously if memory is fragmented when device is
>>> added by hotplug.
>> That is very
>>> The second issue is that we want to be able to implement
>>> the device on top of linux kernel, and
>>> a single descriptor might use all of
>>> the virtqueue. In this case we wont to be able to pass the
>>> descriptor directly to linux as a single iov, since
>>> that is limited to 1K entries.
>> For the second issue, I wonder if it is ok to set vring size of virtio-net to large than 1024,
>> as for net work, there is at most 18 pages for a skb, it will not exceed iov.
>>>
>>>> -- 
>>>> Best Wishes!
>>>> Zhang Jie
>>> .
>>>
>>
>> -- 
>> Best Wishes!
>> Zhang Jie
> .
> 

-- 
Best Wishes!
Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  7:43   ` Avi Kivity
@ 2014-10-08  8:26     ` Zhangjie (HZ)
  2014-10-08  9:15     ` Michael S. Tsirkin
  1 sibling, 0 replies; 18+ messages in thread
From: Zhangjie (HZ) @ 2014-10-08  8:26 UTC (permalink / raw)
  To: Avi Kivity, Michael S. Tsirkin
  Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel



On 2014/10/8 15:43, Avi Kivity wrote:
>>
> 
> You could separate maximum request scatter/gather list size from the virtqueue size.  They are totally unrelated - even now you can have a larger request by using indirect descriptors.
Yes, from the code there is no strong correlation between the virtqueue size and the iov limit.
-- 
Best Wishes!
Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  8:07       ` Zhangjie (HZ)
@ 2014-10-08  9:13         ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08  9:13 UTC (permalink / raw)
  To: Zhangjie (HZ); +Cc: liuyongan, qinchuanyu, Jason Wang, akong, qemu-devel

On Wed, Oct 08, 2014 at 04:07:47PM +0800, Zhangjie (HZ) wrote:
> MST, Thanks very much, I get it.
> 
> On 2014/10/8 15:37, Michael S. Tsirkin wrote:
> > On Wed, Oct 08, 2014 at 03:17:56PM +0800, Zhangjie (HZ) wrote:
> >> Thanks for your patient answer! :-)
> >>
> >> On 2014/9/30 17:33, Michael S. Tsirkin wrote:
> >>> On Tue, Sep 30, 2014 at 04:36:00PM +0800, Zhangjie (HZ) wrote:
> >>>> Hi,
> >>>> There exits packets loss when we do packet forwarding in VM,
> >>>> especially when we use dpdk to do the forwarding. By enlarging vring
> >>>> can alleviate the problem.
> >>>
> >>> I think this has to do with the fact that dpdk disables
> >>> checksum offloading, this has the side effect of disabling
> >>> segmentation offloading.
> >>>
> >>> Please fix dpdk to support checksum offloading, and
> >>> I think the problem will go away.
> >> In some application scene, loss of udp packets are not allowed,
> >>  and udp packets are always short than mtu.
> >> So, we need to support high pps(eg.0.3M Packets/s) forwarding, and
> >> offloading cannot fix it.
> > 
> > That's the point. With UFO you get larger than MTU UDP packets:
> > http://www.linuxfoundation.org/collaborate/workgroups/networking/ufo
> Then vm only do forwarding, and not create new packets itself.
> As we can not gro normal udp packets, when udp packets come from the nic of host, ufo cannot work.

This is something I've been thinking about for a while now.
We really should add a GRO-like path for UDP; it isn't
too different from the TCP case.

LRO can often work with UDP too, but Linux discards too much
info on LRO; if you are doing drivers in userspace
you might be able to support this.

> > 
> > Additionally, checksum offloading reduces CPU utilization
> > and reduces the number of data copies, allowing higher pps
> > with smaller buffers.
> > 
> > It might look like queue depth helps performance for netperf, but in
> > real-life workloads the latency under load will suffer, with more
> > protocols implementing tunnelling on top of UDP such extreme bufferbloat
> > will not be tolerated.
> > 
> >>>
> >>>
> >>>> But now vring size is limited to 1024 as follows:
> >>>> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>>>                             void (*handle_output)(VirtIODevice *, VirtQueue *))
> >>>> {
> >>>> 	...
> >>>> 	if (i == VIRTIO_PCI_QUEUE_MAX || queue_size > VIRTQUEUE_MAX_SIZE)
> >>>>         abort();
> >>>> }
> >>>> ps:#define VIRTQUEUE_MAX_SIZE 1024
> >>>> I delete the judgement code, and set vring size to 2048,
> >>>> VM can be successfully started, and the network is ok too.
> >>>> So, Why vring size is limited to 1024 and what is the influence?
> >>>>
> >>>> Thanks!
> >>>
> >>> There are several reason for this limit.
> >>> First guest has to allocate descriptor buffer which is 16 * vring size.
> >>> With 1K size that is already 16K which might be tricky to
> >>> allocate contigiously if memory is fragmented when device is
> >>> added by hotplug.
> >> That is very
> >>> The second issue is that we want to be able to implement
> >>> the device on top of linux kernel, and
> >>> a single descriptor might use all of
> >>> the virtqueue. In this case we wont to be able to pass the
> >>> descriptor directly to linux as a single iov, since
> >>> that is limited to 1K entries.
> >> For the second issue, I wonder if it is ok to set vring size of virtio-net to large than 1024,
> >> as for net work, there is at most 18 pages for a skb, it will not exceed iov.
> >>>
> >>>> -- 
> >>>> Best Wishes!
> >>>> Zhang Jie
> >>> .
> >>>
> >>
> >> -- 
> >> Best Wishes!
> >> Zhang Jie
> > .
> > 
> 
> -- 
> Best Wishes!
> Zhang Jie

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  7:43   ` Avi Kivity
  2014-10-08  8:26     ` Zhangjie (HZ)
@ 2014-10-08  9:15     ` Michael S. Tsirkin
  2014-10-08  9:51       ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08  9:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong

On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
> 
> On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> >a single descriptor might use all of
> >the virtqueue. In this case we wont to be able to pass the
> >descriptor directly to linux as a single iov, since
> >
> 
> You could separate maximum request scatter/gather list size from the
> virtqueue size.  They are totally unrelated - even now you can have a larger
> request by using indirect descriptors.

We could add a feature to have a smaller or larger S/G length limit.
Is this something useful?

-- 
MST

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  9:15     ` Michael S. Tsirkin
@ 2014-10-08  9:51       ` Avi Kivity
  2014-10-08 10:14         ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2014-10-08  9:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
>> On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
>>> a single descriptor might use all of
>>> the virtqueue. In this case we wont to be able to pass the
>>> descriptor directly to linux as a single iov, since
>>>
>> You could separate maximum request scatter/gather list size from the
>> virtqueue size.  They are totally unrelated - even now you can have a larger
>> request by using indirect descriptors.
> We could add a feature to have a smaller or larger S/G length limit.
> Is this something useful?
>

Having a larger ring size is useful, especially with zero-copy transmit,
and you would need the sglist length limit so as not to require
linearization on Linux hosts.  So the limit is not useful in itself, 
only indirectly.

Google Compute Engine exposes virtio ring sizes of 16384.

Even more useful is getting rid of the desc array and instead passing 
descs inline in avail and used.

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08  9:51       ` Avi Kivity
@ 2014-10-08 10:14         ` Michael S. Tsirkin
  2014-10-08 10:37           ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08 10:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong

On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
> 
> On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
> >On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
> >>On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> >>>a single descriptor might use all of
> >>>the virtqueue. In this case we wont to be able to pass the
> >>>descriptor directly to linux as a single iov, since
> >>>
> >>You could separate maximum request scatter/gather list size from the
> >>virtqueue size.  They are totally unrelated - even now you can have a larger
> >>request by using indirect descriptors.
> >We could add a feature to have a smaller or larger S/G length limit.
> >Is this something useful?
> >
> 
> Having a larger ring size is useful, esp. with zero-copy transmit, and you
> would need the sglist length limit in order to not require linearization on
> linux hosts.  So the limit is not useful in itself, only indirectly.
> 
> Google cloud engine exposes virtio ring sizes of 16384.

OK this sounds useful, I'll queue this up for consideration.
Thanks!

> Even more useful is getting rid of the desc array and instead passing descs
> inline in avail and used.

You expect this to improve performance?
Quite possibly but this will have to be demonstrated.

-- 
MST

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 10:14         ` Michael S. Tsirkin
@ 2014-10-08 10:37           ` Avi Kivity
  2014-10-08 10:55             ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2014-10-08 10:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
>> On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
>>>> On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
>>>>> a single descriptor might use all of
>>>>> the virtqueue. In this case we wont to be able to pass the
>>>>> descriptor directly to linux as a single iov, since
>>>>>
>>>> You could separate maximum request scatter/gather list size from the
>>>> virtqueue size.  They are totally unrelated - even now you can have a larger
>>>> request by using indirect descriptors.
>>> We could add a feature to have a smaller or larger S/G length limit.
>>> Is this something useful?
>>>
>> Having a larger ring size is useful, esp. with zero-copy transmit, and you
>> would need the sglist length limit in order to not require linearization on
>> linux hosts.  So the limit is not useful in itself, only indirectly.
>>
>> Google cloud engine exposes virtio ring sizes of 16384.
> OK this sounds useful, I'll queue this up for consideration.
> Thanks!

Thanks.

>> Even more useful is getting rid of the desc array and instead passing descs
>> inline in avail and used.
> You expect this to improve performance?
> Quite possibly but this will have to be demonstrated.
>

The top vhost function in small packet workloads is vhost_get_vq_desc, 
and the top instruction within that (50%) is the one that reads the 
first 8 bytes of desc.  It's a guaranteed cache line miss (and again on 
the guest side when it's time to reuse).

Inline descriptors will amortize the cache miss over 4 descriptors, and 
will allow the hardware to prefetch, since the descriptors are linear in 
memory.
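
For reference, the amortization factor above is just cache-line arithmetic, assuming 64-byte cache lines and 16-byte split-ring descriptors:

    64 bytes per line / 16 bytes per descriptor = 4 descriptors per line

so the miss on the first descriptor pulls in the next three for free, and a linear layout lets the prefetcher run ahead of the consumer.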

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 10:37           ` Avi Kivity
@ 2014-10-08 10:55             ` Michael S. Tsirkin
  2014-10-08 10:59               ` Avi Kivity
  2014-10-08 11:00               ` Avi Kivity
  0 siblings, 2 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08 10:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong

On Wed, Oct 08, 2014 at 01:37:25PM +0300, Avi Kivity wrote:
> 
> On 10/08/2014 01:14 PM, Michael S. Tsirkin wrote:
> >On Wed, Oct 08, 2014 at 12:51:21PM +0300, Avi Kivity wrote:
> >>On 10/08/2014 12:15 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Oct 08, 2014 at 10:43:07AM +0300, Avi Kivity wrote:
> >>>>On 09/30/2014 12:33 PM, Michael S. Tsirkin wrote:
> >>>>>a single descriptor might use all of
> >>>>>the virtqueue. In this case we wont to be able to pass the
> >>>>>descriptor directly to linux as a single iov, since
> >>>>>
> >>>>You could separate maximum request scatter/gather list size from the
> >>>>virtqueue size.  They are totally unrelated - even now you can have a larger
> >>>>request by using indirect descriptors.
> >>>We could add a feature to have a smaller or larger S/G length limit.
> >>>Is this something useful?
> >>>
> >>Having a larger ring size is useful, esp. with zero-copy transmit, and you
> >>would need the sglist length limit in order to not require linearization on
> >>linux hosts.  So the limit is not useful in itself, only indirectly.
> >>
> >>Google cloud engine exposes virtio ring sizes of 16384.
> >OK this sounds useful, I'll queue this up for consideration.
> >Thanks!
> 
> Thanks.
> 
> >>Even more useful is getting rid of the desc array and instead passing descs
> >>inline in avail and used.
> >You expect this to improve performance?
> >Quite possibly but this will have to be demonstrated.
> >
> 
> The top vhost function in small packet workloads is vhost_get_vq_desc, and
> the top instruction within that (50%) is the one that reads the first 8
> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
> side when it's time to reuse).

OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both used and available ring.

Sounds good in theory.
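
A rough sketch of that idea (purely an illustration under the stated assumption of in-order processing, not an existing ring format): a single ring of descriptors with a VALID flag, set by the guest when posting and cleared by the host once the entry is consumed, so the separate avail and used rings disappear.

#include <stdint.h>

#define DESC_F_VALID 0x8000    /* hypothetical flag bit, not in the spec */

struct inorder_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;            /* VALID plus the usual NEXT/WRITE bits */
    uint16_t next;
};

/* Guest: fill in the payload fields, then publish the entry with one store. */
static inline void guest_post(struct inorder_desc *d,
                              uint64_t addr, uint32_t len)
{
    d->addr = addr;
    d->len  = len;
    __atomic_store_n(&d->flags, DESC_F_VALID, __ATOMIC_RELEASE);
}

/* Host: consume strictly in order; clearing VALID hands the slot back. */
static inline int host_consume(struct inorder_desc *d)
{
    if (!(__atomic_load_n(&d->flags, __ATOMIC_ACQUIRE) & DESC_F_VALID))
        return 0;              /* nothing new at this slot yet */
    /* ... process d->addr / d->len here ... */
    __atomic_store_n(&d->flags, 0, __ATOMIC_RELEASE);
    return 1;
}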

> Inline descriptors will amortize the cache miss over 4 descriptors, and will
> allow the hardware to prefetch, since the descriptors are linear in memory.

If descriptors are used in order (as they are with current qemu)
then aren't they amortized already?

-- 
MST

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 10:55             ` Michael S. Tsirkin
@ 2014-10-08 10:59               ` Avi Kivity
  2014-10-08 12:22                 ` Michael S. Tsirkin
  2014-10-08 11:00               ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2014-10-08 10:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>>>> Even more useful is getting rid of the desc array and instead passing descs
>>>> inline in avail and used.
>>> You expect this to improve performance?
>>> Quite possibly but this will have to be demonstrated.
>>>
>> The top vhost function in small packet workloads is vhost_get_vq_desc, and
>> the top instruction within that (50%) is the one that reads the first 8
>> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
>> side when it's time to reuse).
> OK so basically what you are pointing out is that we get 5 accesses:
> read of available head, read of available ring, read of descriptor,
> write of used ring, write of used ring head.

Right.  And only read of descriptor is not amortized.

> If processing is in-order, we could build a much simpler design, with a
> valid bit in the descriptor, cleared by host as descriptors are
> consumed.
>
> Basically get rid of both used and available ring.

That only works if you don't allow reordering, which is never the case 
for block, and not the case for zero-copy net.  It also has writers on 
both sides of the ring.

The right design is to keep avail and used, but instead of making them 
rings of pointers to descs, make them rings of descs.

The host reads descs from avail, processes them, then writes them back 
on used (possibly out-of-order).  The guest writes descs to avail and 
reads them back from used.

You'll probably have to add a 64-bit cookie to desc so you can complete 
without an additional lookup.
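
A sketch of that layout (again an assumption for illustration only): avail and used become rings of whole descriptors rather than rings of 16-bit indexes into desc[], and the 64-bit cookie travels with each entry so completion needs no extra table lookup.

#include <stdint.h>

/* One entry in either the avail or the used ring. */
struct ring_desc {
    uint64_t addr;      /* buffer guest-physical address */
    uint32_t len;
    uint16_t flags;
    uint16_t pad;
    uint64_t cookie;    /* e.g. the owning skbuff/bio pointer, echoed back */
};

struct desc_ring {
    uint16_t idx;               /* free-running producer index */
    struct ring_desc ring[];    /* descriptors laid out linearly in memory */
};

/* The guest writes whole entries into avail; the host writes completed
 * entries (possibly out of order) into used, copying the cookie through. */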

>
> Sounds good in theory.
>
>> Inline descriptors will amortize the cache miss over 4 descriptors, and will
>> allow the hardware to prefetch, since the descriptors are linear in memory.
> If descriptors are used in order (as they are with current qemu)
> then aren't they amortized already?
>

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 10:55             ` Michael S. Tsirkin
  2014-10-08 10:59               ` Avi Kivity
@ 2014-10-08 11:00               ` Avi Kivity
  1 sibling, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2014-10-08 11:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>
>> Inline descriptors will amortize the cache miss over 4 descriptors, and will
>> allow the hardware to prefetch, since the descriptors are linear in memory.
> If descriptors are used in order (as they are with current qemu)
> then aren't they amortized already?
>

The descriptors are only in-order for non-zero-copy net.  They are out 
of order for block and zero-copy net.

(also, the guest has to be careful in how it allocates descriptors).

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 10:59               ` Avi Kivity
@ 2014-10-08 12:22                 ` Michael S. Tsirkin
  2014-10-08 12:28                   ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2014-10-08 12:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong

On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
> 
> On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
> >>>>Even more useful is getting rid of the desc array and instead passing descs
> >>>>inline in avail and used.
> >>>You expect this to improve performance?
> >>>Quite possibly but this will have to be demonstrated.
> >>>
> >>The top vhost function in small packet workloads is vhost_get_vq_desc, and
> >>the top instruction within that (50%) is the one that reads the first 8
> >>bytes of desc.  It's a guaranteed cache line miss (and again on the guest
> >>side when it's time to reuse).
> >OK so basically what you are pointing out is that we get 5 accesses:
> >read of available head, read of available ring, read of descriptor,
> >write of used ring, write of used ring head.
> 
> Right.  And only read of descriptor is not amortized.
> 
> >If processing is in-order, we could build a much simpler design, with a
> >valid bit in the descriptor, cleared by host as descriptors are
> >consumed.
> >
> >Basically get rid of both used and available ring.
> 
> That only works if you don't allow reordering, which is never the case for
> block, and not the case for zero-copy net.  It also has writers on both side
> of the ring.
> 
> The right design is to keep avail and used, but instead of making them rings
> of pointers to descs, make them rings of descs.
> 
> The host reads descs from avail, processes them, then writes them back on
> used (possibly out-of-order).  The guest writes descs to avail and reads
> them back from used.
> 
> You'll probably have to add a 64-bit cookie to desc so you can complete
> without an additional lookup.

My old presentation from 2012 or so suggested something like this.
We don't need a 64-bit cookie, I think - a small 16-bit one
should be enough.

> >
> >Sounds good in theory.
> >
> >>Inline descriptors will amortize the cache miss over 4 descriptors, and will
> >>allow the hardware to prefetch, since the descriptors are linear in memory.
> >If descriptors are used in order (as they are with current qemu)
> >then aren't they amortized already?
> >

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 12:22                 ` Michael S. Tsirkin
@ 2014-10-08 12:28                   ` Avi Kivity
  2014-10-08 12:36                     ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2014-10-08 12:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 03:22 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
>> On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
>>>>>> Even more useful is getting rid of the desc array and instead passing descs
>>>>>> inline in avail and used.
>>>>> You expect this to improve performance?
>>>>> Quite possibly but this will have to be demonstrated.
>>>>>
>>>> The top vhost function in small packet workloads is vhost_get_vq_desc, and
>>>> the top instruction within that (50%) is the one that reads the first 8
>>>> bytes of desc.  It's a guaranteed cache line miss (and again on the guest
>>>> side when it's time to reuse).
>>> OK so basically what you are pointing out is that we get 5 accesses:
>>> read of available head, read of available ring, read of descriptor,
>>> write of used ring, write of used ring head.
>> Right.  And only read of descriptor is not amortized.
>>
>>> If processing is in-order, we could build a much simpler design, with a
>>> valid bit in the descriptor, cleared by host as descriptors are
>>> consumed.
>>>
>>> Basically get rid of both used and available ring.
>> That only works if you don't allow reordering, which is never the case for
>> block, and not the case for zero-copy net.  It also has writers on both side
>> of the ring.
>>
>> The right design is to keep avail and used, but instead of making them rings
>> of pointers to descs, make them rings of descs.
>>
>> The host reads descs from avail, processes them, then writes them back on
>> used (possibly out-of-order).  The guest writes descs to avail and reads
>> them back from used.
>>
>> You'll probably have to add a 64-bit cookie to desc so you can complete
>> without an additional lookup.
> My old presentation from 2012 or so suggested something like this.
> We don't need a 64 bit cookie I think - a small 16 bit one
> should be enough.
>

A 16-bit cookie means you need an extra table to hold the real request 
pointers.

With a 64-bit cookie you can store a pointer to the skbuff or bio in the 
ring itself, and avoid the extra lookup.

The extra lookup isn't the end of the world, since it doesn't cross core 
boundaries, but it's worth avoiding.

* Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
  2014-10-08 12:28                   ` Avi Kivity
@ 2014-10-08 12:36                     ` Avi Kivity
  0 siblings, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2014-10-08 12:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, qemu-devel, liuyongan, qinchuanyu, Zhangjie (HZ),
	akong


On 10/08/2014 03:28 PM, Avi Kivity wrote:
> My old presentation from 2012 or so suggested something like this.
>> We don't need a 64 bit cookie I think - a small 16 bit one
>> should be enough.
>>
>
> A 16 bit cookie means you need an extra table to hold the real request 
> pointers.
>
> With a 64-bit cookie you can store a pointer to the skbuff or bio in 
> the ring itself, and avoid the extra lookup.
>
> The extra lookup isn't the end of the world, since doesn't cross core 
> boundaries, but it's worth avoiding.
>

What you can do is have two types of descriptors, head and fragment:

union desc {
     struct head {
          u16 nfrags;
          u16 flags;
          u64 cookie;    /* opaque value echoed back on completion */
     } head;
     struct frag {
          u64 paddr;
          u16 flen;
          u16 flags;
     } frag;
};                       /* 12 bytes per entry, assuming a packed layout */

So now a request's length is 12*(nfrags+1) bytes.
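
For concreteness, a hypothetical two-fragment request in this format would occupy three consecutive 12-byte entries, 12*(2+1) = 36 bytes, with the cookie in the head echoed back on completion:

    entry 0: head { nfrags = 2, flags, cookie = pointer to the owning skbuff }
    entry 1: frag { paddr = page 0, flen = 4096, flags }
    entry 2: frag { paddr = page 1, flen = 1500, flags }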

You can be evil and steal some bits from paddr/cookie, and make each 
descriptor 8 bytes long.

By the way, I also recommend storing things like vnet_hdr in the ring itself, 
instead of out-of-line in memory.  Maybe the ring should just transport 
bytes and let the upper layer decide how it's formatted.
