* [PATCH 1/2] crypto: twofish - disable AVX2 implementation
From: Jussi Kivilinna @ 2013-06-02 16:51 UTC
To: linux-crypto; +Cc: Herbert Xu, David S. Miller

It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes twofish_avx2 to be significantly
slower than twofish_avx. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
---
 crypto/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index d1ca631..678a6ed 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1318,6 +1318,7 @@ config CRYPTO_TWOFISH_AVX_X86_64
 config CRYPTO_TWOFISH_AVX2_X86_64
 	tristate "Twofish cipher algorithm (x86_64/AVX2)"
 	depends on X86 && 64BIT
+	depends on BROKEN
 	select CRYPTO_ALGAPI
 	select CRYPTO_CRYPTD
 	select CRYPTO_ABLK_HELPER_X86

* [PATCH 2/2] crypto: blowfish - disable AVX2 implementation
From: Jussi Kivilinna @ 2013-06-02 16:51 UTC
To: linux-crypto; +Cc: Herbert Xu, David S. Miller

It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
slower than blowfish-amd64. So disable the AVX2 implementation to avoid
performance regressions.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
---
 crypto/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 678a6ed..8ca52c5 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -842,6 +842,7 @@ config CRYPTO_BLOWFISH_X86_64
 config CRYPTO_BLOWFISH_AVX2_X86_64
 	tristate "Blowfish cipher algorithm (x86_64/AVX2)"
 	depends on X86 && 64BIT
+	depends on BROKEN
 	select CRYPTO_ALGAPI
 	select CRYPTO_CRYPTD
 	select CRYPTO_ABLK_HELPER_X86

* Re: [PATCH 2/2] crypto: blowfish - disable AVX2 implementation
From: Herbert Xu @ 2013-06-05 8:34 UTC
To: Jussi Kivilinna; +Cc: linux-crypto, David S. Miller

On Sun, Jun 02, 2013 at 07:51:52PM +0300, Jussi Kivilinna wrote:
> It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
> workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
> slower than blowfish-amd64. So disable the AVX2 implementation to avoid
> performance regressions.
>
> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Both patches applied to crypto. I presume you're working on
a more permanent solution on this?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH 2/2] crypto: blowfish - disable AVX2 implementation
From: Jussi Kivilinna @ 2013-06-05 12:26 UTC
To: Herbert Xu; +Cc: linux-crypto, David S. Miller

On 05.06.2013 11:34, Herbert Xu wrote:
> On Sun, Jun 02, 2013 at 07:51:52PM +0300, Jussi Kivilinna wrote:
>> It appears that the performance of 'vpgatherdd' is suboptimal for this kind of
>> workload (tested on Core i5-4570) and causes blowfish-avx2 to be significantly
>> slower than blowfish-amd64. So disable the AVX2 implementation to avoid
>> performance regressions.
>>
>> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
>
> Both patches applied to crypto. I presume you're working on
> a more permanent solution on this?

Yes, I've been looking for a solution. The problem is that I assumed vgather
to be quicker than emulating gather using vpextr/vpinsr instructions. But it
appears that vgather has about the same speed as a group of vpextr/vpinsr
instructions doing the gather manually.

So doing

	asm volatile(
		"vpgatherdd %%xmm0, (%[ptr], %%xmm8, 4), %%xmm9;     \n\t"
		"vpcmpeqd %%xmm0, %%xmm0, %%xmm0; /* reset mask */   \n\t"
		"vpgatherdd %%xmm0, (%[ptr], %%xmm9, 4), %%xmm8;     \n\t"
		"vpcmpeqd %%xmm0, %%xmm0, %%xmm0;                    \n\t"
		:: [ptr] "r" (&mem[0]) : "memory"
	);

in a loop is slightly _slower_ than manually extracting & inserting values
with

	asm volatile(
		"vmovd %%xmm8, %%eax;                                \n\t"
		"vpextrd $1, %%xmm8, %%edx;                          \n\t"
		"vmovd (%[ptr], %%rax, 4), %%xmm10;                  \n\t"
		"vpextrd $2, %%xmm8, %%eax;                          \n\t"
		"vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10;   \n\t"
		"vpextrd $3, %%xmm8, %%edx;                          \n\t"
		"vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10;   \n\t"
		"vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm9;    \n\t"

		"vmovd %%xmm9, %%eax;                                \n\t"
		"vpextrd $1, %%xmm9, %%edx;                          \n\t"
		"vmovd (%[ptr], %%rax, 4), %%xmm10;                  \n\t"
		"vpextrd $2, %%xmm9, %%eax;                          \n\t"
		"vpinsrd $1, (%[ptr], %%rdx, 4), %%xmm10, %%xmm10;   \n\t"
		"vpextrd $3, %%xmm9, %%edx;                          \n\t"
		"vpinsrd $2, (%[ptr], %%rax, 4), %%xmm10, %%xmm10;   \n\t"
		"vpinsrd $3, (%[ptr], %%rdx, 4), %%xmm10, %%xmm8;    \n\t"
		:: [ptr] "r" (&mem[0]) : "memory", "eax", "edx"
	);

vpextr/vpinsr cannot be used with 256-bit wide ymm registers, so
'vinserti128/vextracti128' are needed, which makes the manual gather about
the same speed as vpgatherdd.

Now the block cipher implementations need to use all bytes of the vector
register for table look-ups, and the way this is done in the AVX
implementation of Twofish (move data from vector register to general-purpose
registers, handle byte-extraction and table look-ups there and move processed
data back to vector register) is about two to three times faster than the
current AVX2 implementation using vgather.

Blowfish does not do much processing in addition to table look-ups, so there
is not much that can be done. With Twofish, the table look-ups are the most
computationally heavy part and I don't think that the wider vector registers
in the other parts are going to give much of a boost.

So the permanent solution is likely to be a revert.

-Jussi

>
> Thanks,
>

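For readers following the comparison above, here is a rough, portable C model of the two ideas discussed: what a 4-lane dword gather computes, and the byte-sliced table look-up done in general-purpose registers. It is only a sketch and not the kernel's code. The names `gather4`, `lookup_word` and `lookup_block`, and the flat `sbox` layout of 4 x 256 entries, are assumptions made up for this illustration, and the XOR combination only mimics a Twofish-style look-up with placeholder tables.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of a 4-lane dword gather: out[i] = table[idx[i]].
 * This is the result that one vpgatherdd on an xmm register (with an
 * all-ones mask), or one vmovd/vpextrd/vpinsrd group, is meant to produce.
 * A temporary is used so that out may alias idx.
 */
static void gather4(const uint32_t *table, const uint32_t idx[4],
		    uint32_t out[4])
{
	uint32_t tmp[4];
	size_t i;

	for (i = 0; i < 4; i++)
		tmp[i] = table[idx[i]];
	for (i = 0; i < 4; i++)
		out[i] = tmp[i];
}

/*
 * Hypothetical byte-sliced look-up in a general-purpose register:
 * each byte of x indexes its own 256-entry table and the four results
 * are XORed together. sbox is assumed to hold 4 * 256 entries, with
 * table k starting at sbox[256 * k].
 */
static uint32_t lookup_word(const uint32_t *sbox, uint32_t x)
{
	return sbox[0 * 256 + (x & 0xff)] ^
	       sbox[1 * 256 + ((x >> 8) & 0xff)] ^
	       sbox[2 * 256 + ((x >> 16) & 0xff)] ^
	       sbox[3 * 256 + (x >> 24)];
}

/*
 * Model of "move the vector to general-purpose registers, do the
 * look-ups there, move the result back": v stands in for the four
 * dwords of one 128-bit vector register.
 */
static void lookup_block(const uint32_t *sbox, uint32_t v[4])
{
	size_t i;

	for (i = 0; i < 4; i++)
		v[i] = lookup_word(sbox, v[i]);
}
```

Calling `gather4(mem, a, b)` followed by `gather4(mem, b, a)` mirrors the chained xmm8 -> xmm9 -> xmm8 pattern of the benchmark loops, while `lookup_block` needs sixteen independent byte-indexed loads per 128-bit vector, which is the part that gather instructions were hoped to speed up.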