From: Maxime Coquelin
To: Ilya Maximets, dev@dpdk.org, tiwei.bie@intel.com, zhihong.wang@intel.com, jfreimann@redhat.com
Cc: "Michael S. Tsirkin"
Subject: Re: vhost: batch used descriptors chains write-back with packed ring
Date: Thu, 6 Dec 2018 18:10:58 +0100
References: <20181128094700.14598-1-maxime.coquelin@redhat.com> <7fbcfcea-3c81-d5d1-86bf-8fe8e63d4468@samsung.com>
In-Reply-To: <7fbcfcea-3c81-d5d1-86bf-8fe8e63d4468@samsung.com>

On 12/5/18 5:01 PM, Ilya Maximets wrote:
> On 28.11.2018 12:47, Maxime Coquelin wrote:
>> Instead of writing back descriptors chains in order, let's
>> write the first chain flags last in order to improve batching.
>
> I'm not sure if this fully compliant with virtio spec.
> It says that 'each side (driver and device) are only required to poll
> (or test) a single location in memory', but it does not forbid to
> test other descriptors. So, if the driver will try to check not only
> 'the next device descriptor after the one they processed previously,
> in circular order' but a few descriptors ahead, it could read an
> inconsistent memory because there are no more write barriers between
> updates for flags and id/len for them.
>
> What do you think ?

Yes, that makes sense.
It should have no cost on x86 moreover.

I'll fix it in v2.

Thanks,
Maxime

>>
>> With Kernel's pktgen benchmark, ~3% performance gain is measured.
>>
>> Signed-off-by: Maxime Coquelin
>> Tested-by: Jens Freimann
>> Reviewed-by: Jens Freimann
>> ---
>>  lib/librte_vhost/virtio_net.c | 37 +++++++++++++++++++++++--------------
>>  1 file changed, 23 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 5e1a1a727..f54642c2d 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -135,19 +135,10 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  			struct vhost_virtqueue *vq)
>>  {
>>  	int i;
>> -	uint16_t used_idx = vq->last_used_idx;
>> +	uint16_t head_flags, head_idx = vq->last_used_idx;
>>
>> -	/* Split loop in two to save memory barriers */
>> -	for (i = 0; i < vq->shadow_used_idx; i++) {
>> -		vq->desc_packed[used_idx].id = vq->shadow_used_packed[i].id;
>> -		vq->desc_packed[used_idx].len = vq->shadow_used_packed[i].len;
>> -
>> -		used_idx += vq->shadow_used_packed[i].count;
>> -		if (used_idx >= vq->size)
>> -			used_idx -= vq->size;
>> -	}
>> -
>> -	rte_smp_wmb();
>> +	if (unlikely(vq->shadow_used_idx == 0))
>> +		return;
>>
>>  	for (i = 0; i < vq->shadow_used_idx; i++) {
>>  		uint16_t flags;
>> @@ -165,12 +156,22 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  			flags &= ~VRING_DESC_F_AVAIL;
>>  		}
>>
>> -		vq->desc_packed[vq->last_used_idx].flags = flags;
>> +		vq->desc_packed[vq->last_used_idx].id =
>> +			vq->shadow_used_packed[i].id;
>> +		vq->desc_packed[vq->last_used_idx].len =
>> +			vq->shadow_used_packed[i].len;
>> +
>> +		if (i > 0) {
>> +			vq->desc_packed[vq->last_used_idx].flags = flags;
>>
>> -		vhost_log_cache_used_vring(dev, vq,
>> +			vhost_log_cache_used_vring(dev, vq,
>>  					vq->last_used_idx *
>>  					sizeof(struct vring_packed_desc),
>>  					sizeof(struct vring_packed_desc));
>> +		} else {
>> +			head_idx = vq->last_used_idx;
>> +			head_flags = flags;
>> +		}
>>
>>  		vq->last_used_idx += vq->shadow_used_packed[i].count;
>>  		if (vq->last_used_idx >= vq->size) {
>> @@ -180,7 +181,15 @@ flush_shadow_used_ring_packed(struct virtio_net *dev,
>>  	}
>>
>>  	rte_smp_wmb();
>> +
>> +	vq->desc_packed[head_idx].flags = head_flags;
>>  	vq->shadow_used_idx = 0;
>> +
>> +	vhost_log_cache_used_vring(dev, vq,
>> +				head_idx *
>> +				sizeof(struct vring_packed_desc),
>> +				sizeof(struct vring_packed_desc));
>> +
>>  	vhost_log_cache_sync(dev, vq);
>>  }
>>
>>
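
For reference, the ordering concern raised in the review can be illustrated with a small standalone model. This is only a sketch of one possible way to address it, not the v2 patch: all id/len fields are written first, a write barrier follows, then the flags of every chain except the head are written, and the head flags go last after a second barrier. The types `struct pdesc`, `struct shadow_elem`, `struct vq_model` and the helpers `wmb_model()` and `flush_shadow_model()` are hypothetical stand-ins for the vhost structures, and `atomic_thread_fence(memory_order_release)` is assumed here as a substitute for `rte_smp_wmb()`. Dirty-page logging is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Flag bits as used by the packed ring layout. */
#define DESC_F_WRITE (1u << 1)
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical, simplified stand-ins for the vhost structures. */
struct pdesc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

struct shadow_elem {
	uint16_t id;
	uint32_t len;
	uint16_t count; /* number of descriptors in the chain */
};

struct vq_model {
	struct pdesc *desc;
	uint16_t size;
	uint16_t last_used_idx;
	int used_wrap_counter;
	struct shadow_elem *shadow;
	uint16_t shadow_used_idx;
};

/* Assumption: a release fence plays the role of rte_smp_wmb(). */
static inline void wmb_model(void)
{
	atomic_thread_fence(memory_order_release);
}

static void flush_shadow_model(struct vq_model *vq)
{
	uint16_t used_idx = vq->last_used_idx;
	uint16_t head_idx = vq->last_used_idx;
	uint16_t head_flags = 0;
	uint16_t i;

	if (vq->shadow_used_idx == 0)
		return;

	/* Pass 1: write id/len of every chain, flags untouched. */
	for (i = 0; i < vq->shadow_used_idx; i++) {
		vq->desc[used_idx].id = vq->shadow[i].id;
		vq->desc[used_idx].len = vq->shadow[i].len;

		used_idx += vq->shadow[i].count;
		if (used_idx >= vq->size)
			used_idx -= vq->size;
	}

	/* id/len must be visible before any flags update. */
	wmb_model();

	/* Pass 2: write flags of every chain except the head. */
	for (i = 0; i < vq->shadow_used_idx; i++) {
		uint16_t flags = vq->shadow[i].len ? DESC_F_WRITE : 0;

		if (vq->used_wrap_counter)
			flags |= DESC_F_USED | DESC_F_AVAIL;

		if (i > 0) {
			vq->desc[vq->last_used_idx].flags = flags;
		} else {
			head_idx = vq->last_used_idx;
			head_flags = flags;
		}

		vq->last_used_idx += vq->shadow[i].count;
		if (vq->last_used_idx >= vq->size) {
			vq->used_wrap_counter ^= 1;
			vq->last_used_idx -= vq->size;
		}
	}

	/* Head flags go last, so the whole batch is published at once. */
	wmb_model();
	vq->desc[head_idx].flags = head_flags;
	vq->shadow_used_idx = 0;
}
```

With this ordering, a driver that looks a few descriptors ahead of the head should never see updated flags paired with stale id/len, while a driver polling only the head still sees the batch appear in one step. The price is the first barrier that the original patch removed.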