From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maxime Coquelin
Subject: Re: [PATCH v2] vhost: batch used descs chains write-back with packed ring
Date: Thu, 20 Dec 2018 09:49:55 +0100
References: <20181219092952.25728-1-maxime.coquelin@redhat.com>
 <20181220044446.GB21484@dpdk-tbie.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: dev@dpdk.org, i.maximets@samsung.com, zhihong.wang@intel.com,
 jfreiman@redhat.com, mst@redhat.com
To: Tiwei Bie
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id 23D651B9A7
 for ; Thu, 20 Dec 2018 09:50:03 +0100 (CET)
In-Reply-To: <20181220044446.GB21484@dpdk-tbie.sh.intel.com>
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 12/20/18 5:44 AM, Tiwei Bie wrote:
> On Wed, Dec 19, 2018 at 10:29:52AM +0100, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin
>> ---
>>
>> V2:
>> Revert back to initial implementation to have a write
>> barrier before every descs flags store, but still
>> store first desc flags last. (Missing barrier reported
>> by Ilya)
>>
>>  lib/librte_vhost/virtio_net.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8c657a101..de436af79 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -97,6 +97,8 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  {
>>  	int i;
>>  	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_idx = vq->last_used_idx;
>> +	uint16_t head_flags = 0;
>>
>>  	/* Split loop in two to save memory barriers */
>>  	for (i = 0; i < vq->shadow_used_idx; i++) {
>> @@ -126,12 +128,17 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  			flags &= ~VRING_DESC_F_AVAIL;
>>  		}
>>
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>  					vq->last_used_idx *
>>  					sizeof(struct vring_packed_desc),
>>  					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>
>>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>  		if (vq->last_used_idx >= vq->size) {
>> @@ -140,7 +147,13 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  		}
>>  	}
>>
>> -	rte_smp_wmb();
>> +	vq->desc_packed[head_idx].flags = head_flags;
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				vq->last_used_idx *
>
> Should be head_idx.

Oh yes, thanks for spotting this.

>
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>  	vq->shadow_used_idx = 0;
>
> A wmb() is needed before log_cache_sync?

I think you're right. I was wrong, but thought we had a barrier in
the cache sync function.

That's not very important for x86, but I think it would be preferable
to do it in vhost_log_cache_sync(), if logging is enabled.

What do you think?

>>  	vhost_log_cache_sync(dev, vq);
>>  }
>> --
>> 2.17.2
>>