From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Monjalon <thomas.monjalon@6wind.com>
Subject: Re: [PATCH v2] eal: optimize aligned rte_memcpy
Date: Tue, 17 Jan 2017 16:08:42 +0100
Message-ID: <1597948.LxUmgnGZos@xps13>
References: <1480641582-56186-1-git-send-email-zhihong.wang@intel.com>
 <1481074266-4461-1-git-send-email-zhihong.wang@intel.com>
 <20161208021843.GM31182@yliu-dev.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7Bit
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>, dev@dpdk.org,
 lei.a.yao@intel.com
To: Zhihong Wang <zhihong.wang@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-wm0-f47.google.com (mail-wm0-f47.google.com [74.125.82.47])
 by dpdk.org (Postfix) with ESMTP id AECB81094
 for <dev@dpdk.org>; Tue, 17 Jan 2017 16:08:44 +0100 (CET)
Received: by mail-wm0-f47.google.com with SMTP id c85so203995352wmi.1
 for <dev@dpdk.org>; Tue, 17 Jan 2017 07:08:44 -0800 (PST)
In-Reply-To: <20161208021843.GM31182@yliu-dev.sh.intel.com>
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

2016-12-08 10:18, Yuanhan Liu:
> On Tue, Dec 06, 2016 at 08:31:06PM -0500, Zhihong Wang wrote:
> > This patch optimizes rte_memcpy for well-aligned cases, where both
> > the dst and src addresses are aligned to the maximum MOV width. It
> > introduces a dedicated function, rte_memcpy_aligned, to handle the
> > aligned cases with a simplified instruction stream. The existing
> > rte_memcpy is renamed to rte_memcpy_generic. The selection between
> > the two is done at the entry of rte_memcpy.
> > 
> > The existing rte_memcpy is for generic cases: it handles unaligned
> > copies and makes stores aligned, and it even makes loads aligned for
> > microarchitectures like Ivy Bridge. However, alignment handling
> > comes at a price: it adds extra load/store instructions, which can
> > sometimes cause complications.
> > 
> > Take DPDK Vhost memcpy with the Mergeable Rx Buffer feature as an
> > example: the copy is aligned and remote, and it is accompanied by a
> > header write that is also remote. In this case the memcpy
> > instruction stream should be simplified to reduce extra load/store
> > instructions, and therefore the probability of pipeline stalls
> > caused by a full load/store buffer, so that the actual memcpy
> > instructions are issued and the H/W prefetcher goes to work as
> > early as possible.
> > 
> > This patch was tested on Ivy Bridge, Haswell and Skylake. It provides
> > up to 20% gain for Virtio Vhost PVP traffic, with packet sizes ranging
> > from 64 to 1500 bytes.
> > 
> > The test can also be conducted without a NIC, by setting up loopback
> > traffic between Virtio and Vhost. For example, modify the macro
> > TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
> > rebuild and start testpmd in both host and guest, then run "start"
> > on one side and "start tx_first 32" on the other.
> > 
> > 
> > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> 
> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks
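
For readers of the archive, the entry-point selection described in the
quoted commit message can be sketched roughly as below. This is only an
illustration, not the code from the patch: the sketch_* names are
hypothetical, the 32-byte mask is an assumption standing in for the
"maximum MOV width" (which depends on the instruction set in use), and
both copy paths simply call memcpy instead of the real SIMD routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32 bytes (one AVX2 register) stands in for the maximum MOV width. */
#define SKETCH_ALIGN_MASK ((uintptr_t)0x1F)

/* Hypothetical stand-in for the aligned path; a real one would use wide moves. */
static void *sketch_memcpy_aligned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical stand-in for the generic path that handles unaligned copies. */
static void *sketch_memcpy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* True when both addresses are multiples of the assumed alignment. */
static int sketch_is_aligned(const void *dst, const void *src)
{
    return (((uintptr_t)dst | (uintptr_t)src) & SKETCH_ALIGN_MASK) == 0;
}

/* Entry point: one test on the OR of both addresses picks the path. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL); /* sketch-only sanity check */
    if (sketch_is_aligned(dst, src))
        return sketch_memcpy_aligned(dst, src, n);
    return sketch_memcpy_generic(dst, src, n);
}
```

OR-ing the two addresses before masking keeps the check to a single
branch at the entry, so the aligned path pays almost nothing for the
selection.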