From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ferruh Yigit
Subject: Re: [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics
Date: Thu, 20 Dec 2018 23:30:10 +0000
References: <1457391583-29604-1-git-send-email-rkerur@gmail.com> <1457391644-29645-1-git-send-email-rkerur@gmail.com> <8F6C2BD409508844A0EFC19955BE094110743936@SHSMSX103.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: dpdk-dev, Thomas Monjalon
To: "Wang, Zhihong", Ravi Kerur
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id DB51D1BD17 for ; Fri, 21 Dec 2018 00:30:13 +0100 (CET)
In-Reply-To: <8F6C2BD409508844A0EFC19955BE094110743936@SHSMSX103.ccr.corp.intel.com>
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
>> Sent: Tuesday, March 8, 2016 7:01 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
>> SSE intrinsics
>>
>> v1:
>> This patch adds memcmp functionality using AVX and SSE
>> intrinsics provided by Intel. For other architectures
>> supported by DPDK regular memcmp function is used.
>>
>> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
>> systems.
>>
> [...]
>
>> +	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
>> +		__m128i idx =
>> +			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
>
> line over 80 characters ;)
>
>> +
>> +		/*
>> +		 * Reverse byte order
>> +		 */
>> +		xmm0 = _mm_shuffle_epi8(xmm0, idx);
>> +		xmm1 = _mm_shuffle_epi8(xmm1, idx);
>> +
>> +		/*
>> +		 * Compare unsigned bytes with instructions for signed bytes
>> +		 */
>> +		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
>> +		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
>> +
>> +		return _mm_movemask_epi8(xmm0 > xmm1) -
>> _mm_movemask_epi8(xmm1 > xmm0);
>> +	}
>> +
>> +	return 0;
>> +}
>
> [...]
>
>> +static inline int
>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
>> +{
>> +	const uint8_t *src_1 = (const uint8_t *)_src_1;
>> +	const uint8_t *src_2 = (const uint8_t *)_src_2;
>> +	int ret = 0;
>> +
>> +	if (n < 16)
>> +		return rte_memcmp_regular(src_1, src_2, n);
> [...]
>> +
>> +	while (n > 512) {
>> +		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);
>
> Thanks for the great work!
>
> Seems to me there's a big improvement area before going into detailed
> instruction layout tuning that -- No unalignment handling here for large
> size memcmp.
>
> So almost without a doubt the performance will be low in micro-architectures
> like Sandy Bridge if the start address is unaligned, which might be a
> common case.

This patch has been waiting for comments for a long time, since May 2016.
Updating the patch status to rejected.

Anyone planning to work on a vectorized version of rte_memcmp() can benefit
from these patches:
https://patches.dpdk.org/patch/11156/
https://patches.dpdk.org/patch/11157/
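
For anyone picking this up, the unaligned-start concern raised in the review
could be approached roughly as in the sketch below: compare a short head byte
by byte until the first pointer reaches vector alignment, then run the bulk
loop on aligned blocks. This is only an illustration under assumptions, not
what the patch does: CMP_ALIGN, bytes_to_align() and memcmp_align_head() are
hypothetical names that exist neither in the patch nor in DPDK, and plain
memcmp() stands in for the vector block compare (rte_cmp256() and friends).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed vector width in bytes (32 for AVX2); hypothetical constant. */
#define CMP_ALIGN 32

/* Hypothetical helper: bytes needed for p to reach CMP_ALIGN alignment. */
static inline size_t
bytes_to_align(const void *p)
{
	return (size_t)(-(uintptr_t)p & (CMP_ALIGN - 1));
}

/*
 * Sketch only: handle the unaligned head first, then compare
 * CMP_ALIGN-sized blocks with src_1 aligned. memcmp() is a stand-in
 * for the vector block compare.
 */
static inline int
memcmp_align_head(const void *_src_1, const void *_src_2, size_t n)
{
	const uint8_t *src_1 = (const uint8_t *)_src_1;
	const uint8_t *src_2 = (const uint8_t *)_src_2;
	size_t head = bytes_to_align(src_1);
	int ret;

	if (head > n)
		head = n;

	/* Head: src_1 is not yet aligned, compare byte by byte. */
	for (; head > 0; head--, n--, src_1++, src_2++) {
		if (*src_1 != *src_2)
			return (int)*src_1 - (int)*src_2;
	}

	/* Body: src_1 is now aligned (src_2 may still be unaligned). */
	while (n >= CMP_ALIGN) {
		ret = memcmp(src_1, src_2, CMP_ALIGN);
		if (ret != 0)
			return ret;
		src_1 += CMP_ALIGN;
		src_2 += CMP_ALIGN;
		n -= CMP_ALIGN;
	}

	/* Tail: fewer than CMP_ALIGN bytes remain. */
	return n ? memcmp(src_1, src_2, n) : 0;
}
```

Only one of the two buffers can be aligned this way when their offsets
differ, so a real implementation would still need unaligned loads for the
other side. Whether the head handling pays off would have to be measured on
the target micro-architecture.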