* Prefetch in /lib/raid6/avx2.c
@ 2016-10-02 22:40 Doug Dumitru
  2016-10-05 23:17 ` Shaohua Li
  0 siblings, 1 reply; 4+ messages in thread
From: Doug Dumitru @ 2016-10-02 22:40 UTC (permalink / raw)
  To: linux-raid

I have been doing some high bandwidth testing of raid-6, and the
prefetch in raid6_avx24_gen_syndrome appears to be less than optimal.

This is my patch (against 4.4.0-38, Ubuntu 16.04 LTS):

--- cut here ---
--- lib/raid6/avx2.c0   2016-10-01 21:42:25.280347868 -0700
+++ lib/raid6/avx2.c    2016-10-02 15:35:48.168480760 -0700
@@ -189,10 +189,8 @@

                for (z = z0; z >= 0; z--) {

-                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d]));
-                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d+32]));
-                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d+64]));
-                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d+96]));
+                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d+128]));
+                       asm volatile("prefetchnta %0" : : "m" (dptr[z][d+192]));

                        asm volatile("vpcmpgtb %ymm4,%ymm1,%ymm5");
                        asm volatile("vpcmpgtb %ymm6,%ymm1,%ymm7");
--- cut here ---
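
To make the cache-line arithmetic behind the patch explicit, here is a
small user-space sketch. It assumes 64-byte cache lines and that
dptr[z] + d is 64-byte aligned; distinct_cache_lines() is a
hypothetical helper written for this illustration and is not part of
lib/raid6.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/*
 * Count how many distinct cache lines a set of byte offsets touches,
 * measured from a cache-line-aligned base address.
 * The offsets must be sorted in ascending order.
 */
static size_t distinct_cache_lines(const size_t *offs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || offs[i] >= offs[i - 1]); /* sorted input */
		if (i == 0 || offs[i] / CACHE_LINE != offs[i - 1] / CACHE_LINE)
			count++;
	}
	return count;
}
```

Under those assumptions, the four original offsets {0, 32, 64, 96}
cover only two lines (lines 0 and 1, the ones the loop body is about
to read), while the patched offsets {128, 192} cover lines 2 and 3,
which lie ahead of the data currently being processed.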

In perf, the share of CPU cycles spent in raid6_avx24_gen_syndrome
drops from 5.3% to 3.0% in my test, and throughput increases from
about 8.2GB/sec to almost 10GB/sec.  It is a very "synthetic" test,
but the avx2 code does seem to be a factor.

I suspect other SSE and AVX "unroll variants" have similar issues, but
I have not tested those.
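
As a starting point for experimenting with other variants, the
"prefetch one pass ahead" pattern can be sketched in plain user-space
C. This is only an illustration under assumptions: it computes a
simple XOR parity rather than the full P/Q syndrome, the function
name and the stride parameter (bytes consumed per disk per pass) are
hypothetical, and __builtin_prefetch() is the GCC/Clang builtin,
where locality 0 requests a non-temporal prefetch (typically
prefetchnta on x86).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/*
 * XOR the ndisks buffers in dptr[] into p, 'stride' bytes per pass,
 * prefetching the cache lines that the NEXT pass will read instead
 * of the ones the current pass is already loading.
 */
static void xor_parity_prefetch_ahead(uint8_t *p, const uint8_t *const *dptr,
				      int ndisks, size_t bytes, size_t stride)
{
	assert(stride > 0 && stride % CACHE_LINE == 0 && bytes % stride == 0);

	for (size_t d = 0; d < bytes; d += stride) {
		memset(p + d, 0, stride);
		for (int z = ndisks - 1; z >= 0; z--) {
			/* Skip on the last pass so we never point past the buffer. */
			if (d + stride < bytes) {
				for (size_t off = 0; off < stride; off += CACHE_LINE)
					__builtin_prefetch(&dptr[z][d + stride + off], 0, 0);
			}
			for (size_t i = 0; i < stride; i++)
				p[d + i] ^= dptr[z][d + i];
		}
	}
}
```

With stride = 128 this issues one prefetch per cache line at d+128
and d+192, matching the offsets in the patch above. Whether the same
distance is right for the other unroll variants would need to be
measured.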

My test system is an E5-1650 v3 (single socket) with DDR4.  This might
help dual sockets even more.

Doug


-- 
Doug Dumitru
EasyCo LLC
