* [PATCH v3] vhost: batch used descs chains write-back with packed ring
From: Maxime Coquelin @ 2018-12-20 10:00 UTC
  To: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman, mst; +Cc: Maxime Coquelin

Instead of writing back descriptor chains in order, write the
flags of the first chain last in order to improve batching.

With the kernel's pktgen benchmark, a performance gain of about 3% is measured.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8c657a101..66ccd3c35 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 {
 	int i;
 	uint16_t used_idx = vq->last_used_idx;
+	uint16_t head_idx = vq->last_used_idx;
+	uint16_t head_flags = 0;
 
 	/* Split loop in two to save memory barriers */
 	for (i = 0; i < vq->shadow_used_idx; i++) {
@@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 			flags &= ~VRING_DESC_F_AVAIL;
 		}
 
-		vq->desc_packed[vq->last_used_idx].flags = flags;
+		if (i > 0) {
+			vq->desc_packed[vq->last_used_idx].flags = flags;
 
-		vhost_log_cache_used_vring(dev, vq,
+			vhost_log_cache_used_vring(dev, vq,
 					vq->last_used_idx *
 					sizeof(struct vring_packed_desc),
 					sizeof(struct vring_packed_desc));
+		} else {
+			head_idx = vq->last_used_idx;
+			head_flags = flags;
+		}
 
 		vq->last_used_idx += vq->shadow_used_packed[i].count;
 		if (vq->last_used_idx >= vq->size) {
@@ -140,7 +147,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
 		}
 	}
 
+	vq->desc_packed[head_idx].flags = head_flags;
+
 	rte_smp_wmb();
+
+	vhost_log_cache_used_vring(dev, vq,
+				head_idx *
+				sizeof(struct vring_packed_desc),
+				sizeof(struct vring_packed_desc));
+
 	vq->shadow_used_idx = 0;
 	vhost_log_cache_sync(dev, vq);
 }
-- 
2.17.2
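
The ordering idea in the commit message can be illustrated outside of DPDK with a minimal sketch. Everything below is an assumption made for illustration: `struct desc`, `struct used_elem`, `struct ring` and `flush_used()` are hypothetical, simplified stand-ins rather than the vhost library's types, C11 atomics replace `rte_smp_wmb()`, and dirty-page logging is left out. It only shows that holding back the head chain's flags and publishing them with a single release store makes the whole batch visible at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Flag bit positions follow the virtio 1.1 packed ring layout. */
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical, simplified stand-ins for the vhost structures. */
struct desc {
	uint16_t id;
	_Atomic uint16_t flags;
};

struct used_elem {
	uint16_t id;
	uint16_t count;	/* number of descriptors in the chain */
};

struct ring {
	struct desc *desc;
	uint16_t size;
	uint16_t last_used_idx;
	int used_wrap_counter;
};

/* Write back n used chains; the first chain's flags are stored last. */
static void flush_used(struct ring *r, const struct used_elem *shadow, int n)
{
	uint16_t head_idx = r->last_used_idx;
	uint16_t head_flags = 0;

	assert(n > 0);

	for (int i = 0; i < n; i++) {
		uint16_t idx = r->last_used_idx;
		uint16_t flags = r->used_wrap_counter ?
			(uint16_t)(DESC_F_AVAIL | DESC_F_USED) : 0;

		r->desc[idx].id = shadow[i].id;

		if (i > 0) {
			/* Later chains: the reader cannot reach them yet. */
			atomic_store_explicit(&r->desc[idx].flags, flags,
					      memory_order_relaxed);
		} else {
			/* Head chain: remember it, publish after the loop. */
			head_idx = idx;
			head_flags = flags;
		}

		r->last_used_idx += shadow[i].count;
		if (r->last_used_idx >= r->size) {
			r->used_wrap_counter ^= 1;
			r->last_used_idx -= r->size;
		}
	}

	/* One release store makes every write above visible to the reader. */
	atomic_store_explicit(&r->desc[head_idx].flags, head_flags,
			      memory_order_release);
}
```

A reader that polls the head descriptor's flags with an acquire load would, under these assumptions, see either none of the batch or all of it, which is the batching effect the patch is after.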

* Re: [PATCH v3] vhost: batch used descs chains write-back with packed ring
From: Tiwei Bie @ 2018-12-20 10:07 UTC
  To: Maxime Coquelin; +Cc: dev, i.maximets, zhihong.wang, jfreiman, mst
On Thu, Dec 20, 2018 at 11:00:22AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
> 
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>

* Re: [PATCH v3] vhost: batch used descs chains write-back with packed ring
From: Michael S. Tsirkin @ 2018-12-20 14:30 UTC
  To: Maxime Coquelin; +Cc: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman
On Thu, Dec 20, 2018 at 11:00:22AM +0100, Maxime Coquelin wrote:
> Instead of writing back descriptors chains in order, let's
> write the first chain flags last in order to improve batching.
> 
> With Kernel's pktgen benchmark, ~3% performance gain is measured.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8c657a101..66ccd3c35 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  {
>  	int i;
>  	uint16_t used_idx = vq->last_used_idx;
> +	uint16_t head_idx = vq->last_used_idx;
> +	uint16_t head_flags = 0;
>  
>  	/* Split loop in two to save memory barriers */
>  	for (i = 0; i < vq->shadow_used_idx; i++) {
> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  			flags &= ~VRING_DESC_F_AVAIL;
>  		}
>  
> -		vq->desc_packed[vq->last_used_idx].flags = flags;
> +		if (i > 0) {
> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>  
> -		vhost_log_cache_used_vring(dev, vq,
> +			vhost_log_cache_used_vring(dev, vq,
>  					vq->last_used_idx *
>  					sizeof(struct vring_packed_desc),
>  					sizeof(struct vring_packed_desc));
> +		} else {
> +			head_idx = vq->last_used_idx;
> +			head_flags = flags;
> +		}
>  
>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>  		if (vq->last_used_idx >= vq->size) {
> @@ -140,7 +147,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>  		}
>  	}
>  
> +	vq->desc_packed[head_idx].flags = head_flags;
> +
>  	rte_smp_wmb();
> +
> +	vhost_log_cache_used_vring(dev, vq,
> +				head_idx *
> +				sizeof(struct vring_packed_desc),
> +				sizeof(struct vring_packed_desc));
> +
>  	vq->shadow_used_idx = 0;
>  	vhost_log_cache_sync(dev, vq);

How about moving rte_smp_wmb() into the logging functions?
That way it costs nothing when logging is disabled, even on Arm...

>  }
> -- 
> 2.17.2

* Re: [PATCH v3] vhost: batch used descs chains write-back with packed ring
From: Maxime Coquelin @ 2018-12-20 15:32 UTC
  To: Michael S. Tsirkin; +Cc: dev, i.maximets, tiwei.bie, zhihong.wang, jfreiman
On 12/20/18 3:30 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 11:00:22AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/virtio_net.c | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..66ccd3c35 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   {
>>   	int i;
>>   	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_idx = vq->last_used_idx;
>> +	uint16_t head_flags = 0;
>>   
>>   	/* Split loop in two to save memory barriers */
>>   	for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   			flags &= ~VRING_DESC_F_AVAIL;
>>   		}
>>   
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>   
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>   					vq->last_used_idx *
>>   					sizeof(struct vring_packed_desc),
>>   					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>   
>>   		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>   		if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>   		}
>>   	}
>>   
>> +	vq->desc_packed[head_idx].flags = head_flags;
>> +
>>   	rte_smp_wmb();
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				head_idx *
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>   	vq->shadow_used_idx = 0;
>>   	vhost_log_cache_sync(dev, vq);
> 
> How about moving rte_smp_wmb into logging functions?
> This way it's free with log disabled even on arm...

That's what I initially suggested in my reply to v2.

The problem is that in the split ring case we already have a barrier
before the cache sync, and we need it even if logging is disabled.

But I think you are right: it might be better to have the barrier twice
in the split ring case when logging is enabled, and none for the packed
ring when logging is disabled.

I'll post a v4.

Thanks,
Maxime
>>   }
>> -- 
>> 2.17.2
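
The trade-off discussed in this reply can be sketched in isolation. The code below is not the vhost library's logging code: `struct log_dev`, `log_cache_page()` and `log_cache_sync()` are hypothetical names, a C11 fence stands in for `rte_smp_wmb()`, and the `barriers` counter exists only to make the effect observable. It shows a barrier placed inside the sync function, so a caller with logging disabled returns before reaching it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state; not the real struct virtio_net. */
struct log_dev {
	bool log_enabled;	/* stands in for the dirty-logging check */
	uint64_t cached_dirty;	/* dirty-page bits collected while flushing */
	uint64_t dirty_bitmap;	/* bitmap a migration thread would read */
	unsigned int barriers;	/* counts fences, for illustration only */
};

/* Record a dirty page in the local cache; no ordering is needed yet. */
static void log_cache_page(struct log_dev *dev, unsigned int page)
{
	assert(page < 64);
	if (dev->log_enabled)
		dev->cached_dirty |= UINT64_C(1) << page;
}

/* Publish the cached bits. The fence lives here, not in the caller. */
static void log_cache_sync(struct log_dev *dev)
{
	if (!dev->log_enabled)
		return;	/* logging off: no barrier is executed */

	/* Order the caller's ring writes before the bitmap update. */
	atomic_thread_fence(memory_order_release);
	dev->barriers++;

	dev->dirty_bitmap |= dev->cached_dirty;
	dev->cached_dirty = 0;
}
```

A caller that needs its own barrier regardless of logging, as described above for the split ring, would keep that barrier and so execute two when logging is enabled.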