From: Tiwei Bie
Subject: Re: [PATCH] vhost: adaptively batch small guest memory copies
Date: Fri, 8 Sep 2017 08:48:50 +0800
Message-ID: <20170908004849.GA18498@debian-ZGViaWFuCg>
References: <20170824021939.21306-1-tiwei.bie@intel.com>
To: Maxime Coquelin
Cc: dev@dpdk.org, yliu@fridaylinux.org, Zhihong Wang, Zhiyong Yang

Hi Maxime,

On Thu, Sep 07, 2017 at 07:47:57PM +0200, Maxime Coquelin wrote:
> Hi Tiwei,
> 
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts in small-packet PVP tests.
> > 
> > This patch improves the performance for small packets, and it
> > distinguishes packets by size. So although the performance for
> > big packets doesn't change, it also becomes relatively easy to
> > do some special optimizations for big packets later.
> > 
> > Signed-off-by: Tiwei Bie
> > Signed-off-by: Zhihong Wang
> > Signed-off-by: Zhiyong Yang
> > ---
> > This optimization depends on the CPU's internal pipeline design.
> > So further tests (e.g. on ARM) from the community are appreciated.
> > 
> >  lib/librte_vhost/vhost.c      |   2 +-
> >  lib/librte_vhost/vhost.h      |  13 +++
> >  lib/librte_vhost/vhost_user.c |  12 +++
> >  lib/librte_vhost/virtio_net.c | 240 ++++++++++++++++++++++++++++++++----------
> >  4 files changed, 209 insertions(+), 58 deletions(-)
> 
> I did some PVP benchmarking with your patch.
> First I tried my standard PVP setup, with io forwarding on the host and
> macswap on the guest in bidirectional mode.
> 
> With this, I noticed no improvement (18.8Mpps), but I think that is
> because the guest is the bottleneck here.
> So I changed my setup to do csum forwarding on the host side, so that
> the host's PMD threads are more loaded.
> 
> In this case, I noticed a great improvement: I get 18.8Mpps with your
> patch instead of 14.8Mpps without! Great work!
> 
> Reviewed-by: Maxime Coquelin
> 

Thank you very much for taking the time to review and test this patch! :-)

Best regards,
Tiwei Bie
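
[Editor's note: the sketch below is only a minimal illustration of the adaptive
small-copy batching idea described in the patch cover letter above. All names,
the size threshold, and the batch depth are hypothetical and are not taken from
the actual DPDK vhost code, which uses its own per-virtqueue batch structures.]

    /*
     * Illustrative sketch: defer small copies into a batch and flush them
     * back-to-back, so consecutive memory LOADs can overlap in the pipeline.
     * Names and constants are hypothetical, not the real vhost implementation.
     */
    #include <stdint.h>
    #include <string.h>

    #define SMALL_COPY_THRESHOLD 256   /* copies below this size get batched */
    #define MAX_BATCHED_COPIES   64    /* flush once the batch is full */

    struct batched_copy {
    	void       *dst;
    	const void *src;
    	size_t      len;
    };

    struct copy_batch {
    	struct batched_copy elems[MAX_BATCHED_COPIES];
    	unsigned int        nr;
    };

    /* Execute all deferred copies in one tight loop. */
    static inline void
    copy_batch_flush(struct copy_batch *b)
    {
    	for (unsigned int i = 0; i < b->nr; i++)
    		memcpy(b->elems[i].dst, b->elems[i].src, b->elems[i].len);
    	b->nr = 0;
    }

    /* Large copies go out immediately; small ones are deferred into the batch. */
    static inline void
    copy_adaptive(struct copy_batch *b, void *dst, const void *src, size_t len)
    {
    	if (len >= SMALL_COPY_THRESHOLD) {
    		memcpy(dst, src, len);
    		return;
    	}
    	if (b->nr == MAX_BATCHED_COPIES)
    		copy_batch_flush(b);
    	b->elems[b->nr++] = (struct batched_copy){ dst, src, len };
    }

The caller would invoke copy_adaptive() per descriptor while filling a burst and
call copy_batch_flush() once at the end of the burst, which is the point where
the batched LOADs are issued close together.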