From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bruce Richardson <bruce.richardson@intel.com>
Subject: Re: [PATCH v3] Implement memcmp using SIMD intrinsics
Date: Fri, 12 Jun 2015 10:03:35 +0100
Message-ID: <20150612090334.GA496@bricha3-MOBL3>
References: <1431979303-1346-1-git-send-email-rkerur@gmail.com>
 <20150612083056.GA18090@domone>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: dev@dpdk.org
To: =?utf-8?B?T25kxZllaiBCw61sa2E=?= <neleai@seznam.cz>
Return-path: <dev-bounces@dpdk.org>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id E8C37B3D6
 for <dev@dpdk.org>; Fri, 12 Jun 2015 11:03:38 +0200 (CEST)
Content-Disposition: inline
In-Reply-To: <20150612083056.GA18090@domone>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Fri, Jun 12, 2015 at 10:30:56AM +0200, Ondřej Bílka wrote:
> On Mon, May 18, 2015 at 01:01:42PM -0700, Ravi Kerur wrote:
> > Background:
> > After preliminary discussion with John (Zhihong) and Tim from Intel it was
> > decided that it would be beneficial to use AVX/SSE intrinsics for memcmp
> > similar to memcpy that had been implemented. In addition, we decided to use
> > librte_hash as a test candidate to test both functionality and performance.
> >
> > Further discussions led to complete functionality implementation of memory
> > comparison and v3 code reflects that.
> >
> > Test was conducted on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04,
> > x86_64, 16GB DDR3 system.
> >
> > Ravi Kerur (1):
> >   Implement memcmp using Intel SIMD intrinsics.
>
> As my previous mail got lost, I am resending it.
>
> In short, you shouldn't use SSE2/AVX2 for memcmp at all. In 95% of calls
> you find the inequality in the first 8 bytes, so SSE2 just adds
> unnecessary overhead versus checking these with a plain 64-bit compare:
>
> 190:   48 8b 4e 08             mov    0x8(%rsi),%rcx
> 194:   48 39 4f 08             cmp    %rcx,0x8(%rdi)
> 198:   75 f3                   jne    18d <memeq30+0xd>
>
> Also, as you have a full memcmp, does your gcc optimize out
> if (memcmp(x,y))
> like mine does?
>
> So also run the implementation below in your benchmark; my guess is it
> will be faster.
>
<snip for brevity>

Thanks for the contribution. It's very informative!
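
For anyone following the thread, the early-exit idea in the quoted mail
could be sketched in C roughly as below. This is only an illustration
under stated assumptions, not the snipped implementation and not code from
the patch: the name memeq_head8 is hypothetical, and it assumes the caller
only needs an equal/not-equal answer rather than memcmp's ordering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (name and shape are assumptions for illustration).
 * Returns 1 if the first n bytes of a and b are equal, 0 otherwise.
 */
static inline int
memeq_head8(const void *a, const void *b, size_t n)
{
	if (n >= 8) {
		uint64_t x, y;

		/* Fixed-size memcpy keeps unaligned access well defined;
		 * optimizing compilers typically emit one 8-byte load. */
		memcpy(&x, a, sizeof(x));
		memcpy(&y, b, sizeof(y));

		/* Expected common case: mismatch within the first 8 bytes. */
		if (x != y)
			return 0;

		/* Rare case: fall back to the library routine for the tail. */
		return memcmp((const uint8_t *)a + 8,
			      (const uint8_t *)b + 8, n - 8) == 0;
	}

	/* Short buffers: nothing to gain from the 8-byte fast path. */
	return memcmp(a, b, n) == 0;
}
```

Whether this actually beats a SIMD version depends on how often the
mismatch really falls in the first 8 bytes for the workload in question,
so it would need to go through the same librte_hash benchmark.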

/Bruce