From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Monjalon
Subject: Re: [PATCH 2/3] hash: add vectorized comparison
Date: Sat, 27 Aug 2016 10:57:47 +0200
Message-ID: <5721729.LXq7JRZ983@xps13>
In-Reply-To: <1472247287-167011-3-git-send-email-pablo.de.lara.guarch@intel.com>
References: <1472247287-167011-1-git-send-email-pablo.de.lara.guarch@intel.com>
 <1472247287-167011-3-git-send-email-pablo.de.lara.guarch@intel.com>
To: Pablo de Lara, Byron Marohn
Cc: dev@dpdk.org, bruce.richardson@intel.com, Saikrishna Edupuganti,
 jianbo.liu@linaro.org, chaozhu@linux.vnet.ibm.com,
 jerin.jacob@caviumnetworks.com
List-Id: patches and discussions about DPDK

2016-08-26 22:34, Pablo de Lara:
> From: Byron Marohn
>
> In lookup bulk function, the signatures of all entries
> are compared against the signature of the key that is being looked up.
> Now that all the signatures are together, they can be compared
> with vector instructions (SSE, AVX2), achieving higher lookup performance.
>
> Also, entries per bucket are increased to 8 when using processors
> with AVX2, as 256 bits can be compared at once, which is the size of
> 8x32-bit signatures.

Please, would it be possible to use the generic SIMD intrinsics?
We could define generic types compatible with Altivec and NEON:
	__attribute__ ((vector_size (n)))
as described in https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

> +/* 8 entries per bucket */
> +#if defined(__AVX2__)

Please prefer #ifdef RTE_MACHINE_CPUFLAG_AVX2.

Ideally the vector support could be checked at runtime:
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
It would allow packaging one binary using the best optimization available.

> +		*prim_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> +				_mm256_load_si256((__m256i const *)prim_bkt->sig_current),
> +				_mm256_set1_epi32(prim_hash)));
> +		*sec_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
> +				_mm256_load_si256((__m256i const *)sec_bkt->sig_current),
> +				_mm256_set1_epi32(sec_hash)));
> +/* 4 entries per bucket */
> +#elif defined(__SSE2__)
> +		*prim_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +				_mm_load_si128((__m128i const *)prim_bkt->sig_current),
> +				_mm_set1_epi32(prim_hash)));
> +		*sec_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
> +				_mm_load_si128((__m128i const *)sec_bkt->sig_current),
> +				_mm_set1_epi32(sec_hash)));

In order to allow such a switch based on register size, we could have
an abstraction in EAL supporting 128/256/512 width for x86/ARM/POWER.
I think aliasing RTE_MACHINE_CPUFLAG_ and RTE_CPUFLAG_ may be enough.