From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ravi Kerur
Subject: Re: [PATCH] Clean up rte_memcpy.h file
Date: Wed, 15 Apr 2015 14:00:51 -0700
References: <1429047011-11545-1-git-send-email-rkerur@gmail.com> <1429047113-11688-1-git-send-email-rkerur@gmail.com> <552E05FB.30504@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: "dev-VfR2kkLFssw@public.gmane.org"
To: Pawel Wodkowski
In-Reply-To: <552E05FB.30504-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev"

On Tue, Apr 14, 2015 at 11:32 PM, Pawel Wodkowski <pawelx.wodkowski-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> On 2015-04-14 23:31, Ravi Kerur wrote:
>
>> +
>> +		for (i = 0; i < 8; i++) {
>> +			ymm = _mm256_loadu_si256((const __m256i *)(src + i * 32));
>> +			_mm256_storeu_si256((__m256i *)(dst + i * 32), ymm);
>> +		}
>> +
>> 		n -= 256;
>> -		ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 1 * 32));
>> -		ymm2 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 2 * 32));
>> -		ymm3 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 3 * 32));
>> -		ymm4 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 4 * 32));
>> -		ymm5 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 5 * 32));
>> -		ymm6 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 6 * 32));
>> -		ymm7 = _mm256_loadu_si256((const __m256i *)((const uint8_t *)src + 7 * 32));
>> -		src = (const uint8_t *)src + 256;
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), ymm0);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), ymm1);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 2 * 32), ymm2);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 3 * 32), ymm3);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 4 * 32), ymm4);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 5 * 32), ymm5);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 6 * 32), ymm6);
>> -		_mm256_storeu_si256((__m256i *)((uint8_t *)dst + 7 * 32), ymm7);
>> -		dst = (uint8_t *)dst + 256;
>> +		src = src + 256;
>> +		dst = dst + 256;
>> 	}
>>
>
> Did you perform a performance test on that part?
>
>

I ran "make test", which runs "memcpy perf". The results were given in the
"cover-letter"; I am pasting them here again.

/**********************With changes*************************************/
Start memcpy_perf: Success [00m 00s]
Memcpy performance autotest: Success [09m 36s] [17m 45s]

/**********************Without changes**********************************/
Start memcpy_perf: Success [00m 00s]
Memcpy performance autotest: Success [09m 35s] [13m 57s]

--
> Pawel
>
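
To look at the loop and unrolled forms in isolation, outside the full autotest, something like the sketch below may help. It is not the rte_memcpy.h code. The names blk32, load32, store32, copy256_loop, copy256_unrolled and copy_blocks are hypothetical, a fixed-size 32-byte memcpy is used as a portable stand-in for _mm256_loadu_si256/_mm256_storeu_si256, and the "while (n >= 256)" driver is an assumption about the surrounding code that the hunk does not show. Note that the unrolled form issues all eight loads before the first store, while the loop form alternates load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte block type standing in for __m256i. */
typedef struct { uint8_t b[32]; } blk32;

/* Stand-in for _mm256_loadu_si256: unaligned 32-byte load. */
static inline blk32 load32(const uint8_t *p)
{
	blk32 v;
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Stand-in for _mm256_storeu_si256: unaligned 32-byte store. */
static inline void store32(uint8_t *p, blk32 v)
{
	memcpy(p, &v, sizeof(v));
}

/* Loop form, mirroring the "+" lines: load and store alternate. */
static inline void copy256_loop(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < 8; i++) {
		blk32 v = load32(src + i * 32);
		store32(dst + i * 32, v);
	}
}

/* Unrolled form, mirroring the "-" lines: eight loads, then eight stores. */
static inline void copy256_unrolled(uint8_t *dst, const uint8_t *src)
{
	blk32 r0 = load32(src + 0 * 32);
	blk32 r1 = load32(src + 1 * 32);
	blk32 r2 = load32(src + 2 * 32);
	blk32 r3 = load32(src + 3 * 32);
	blk32 r4 = load32(src + 4 * 32);
	blk32 r5 = load32(src + 5 * 32);
	blk32 r6 = load32(src + 6 * 32);
	blk32 r7 = load32(src + 7 * 32);
	store32(dst + 0 * 32, r0);
	store32(dst + 1 * 32, r1);
	store32(dst + 2 * 32, r2);
	store32(dst + 3 * 32, r3);
	store32(dst + 4 * 32, r4);
	store32(dst + 5 * 32, r5);
	store32(dst + 6 * 32, r6);
	store32(dst + 7 * 32, r7);
}

/*
 * Copy as many whole 256-byte blocks as fit in n bytes and return the
 * number of bytes left over. use_loop selects which form is exercised.
 * The buffers must not overlap.
 */
size_t copy_blocks(uint8_t *dst, const uint8_t *src, size_t n, int use_loop)
{
	while (n >= 256) {
		if (use_loop)
			copy256_loop(dst, src);
		else
			copy256_unrolled(dst, src);
		n -= 256;
		src = src + 256;
		dst = dst + 256;
	}
	return n;
}
```

Timing copy_blocks() over a large buffer with use_loop set to 1 and then 0, built with the same optimization flags as the DPDK build, would give a narrower comparison than the whole "make test" run. Whether the compiler unrolls the 8-iteration loop depends on the compiler and flags, so the generated assembly is worth checking as well.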