* [PATCH 0/2] Implement PMULL using host intrinsics
@ 2023-06-01 12:33 Ard Biesheuvel
2023-06-01 12:33 ` [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64 Ard Biesheuvel
2023-06-01 12:33 ` [PATCH 2/2] target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions Ard Biesheuvel
0 siblings, 2 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2023-06-01 12:33 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, Ard Biesheuvel, Peter Maydell, Alex Bennée,
Richard Henderson, Philippe Mathieu-Daudé
Another set of RFC patches - this time for 64x64->128 polynomial
multiplication. Playing around with this on top of the AES changes I sent
out earlier this week, I noticed that the speedup is rather substantial.
PMULL is relevant for GCM encryption, which combines AES in counter mode
with GHASH, which is based on multiplication in GF(2^128). The
significance of PMULL to this encryption mode is basically why PMULL is
part of the AES crypto extension on AArch64.
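For readers less familiar with the operation, here is a minimal portable
sketch of a 64x64->128 polynomial (carry-less) multiplication, i.e. the
computation PMULL.P64 and PCLMULQDQ perform on one pair of 64-bit lanes.
The names sketch_u128 and sketch_clmul64 are hypothetical and exist only
for this illustration; this is not the code from either patch.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type for this sketch (not a QEMU type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} sketch_u128;

/*
 * Multiply a and b as polynomials over GF(2): for every set bit j of b,
 * XOR a shifted left by j into the 128-bit accumulator. XOR replaces
 * addition, so no carries propagate between bit positions.
 */
static sketch_u128 sketch_clmul64(uint64_t a, uint64_t b)
{
    sketch_u128 r = { 0, 0 };

    for (int j = 0; j < 64; j++) {
        if ((b >> j) & 1) {
            r.lo ^= a << j;
            if (j != 0) {
                /* Bits of a that were shifted out of the low half. */
                r.hi ^= a >> (64 - j);
            }
        }
    }
    return r;
}
```

A loop like this costs up to 64 iterations per product, which gives a
rough idea of why a single host instruction can make such a difference.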
Note that user-mode emulation of x86 binaries on an AArch64 host would
likely benefit from this whenever those binaries perform any kind of
HTTPS communication under the hood.
Again, this approach is likely too ad-hoc, but it helps span the space
of what we might want to cover in terms of host acceleration API. (I'm
not a TCG expert, but I guess this raises the question of what to cover
in helpers and what to cover using native TCG ops.)
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Ard Biesheuvel (2):
target/arm: Use x86 intrinsics to implement PMULL.P64
target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions
host/include/aarch64/host/cpuinfo.h | 1 +
host/include/i386/host/cpuinfo.h | 1 +
target/arm/tcg/vec_helper.c | 26 +++++++++++++++++++-
target/i386/ops_sse.h | 24 ++++++++++++++++++
util/cpuinfo-aarch64.c | 1 +
util/cpuinfo-i386.c | 1 +
6 files changed, 53 insertions(+), 1 deletion(-)
--
2.39.2
* [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64
2023-06-01 12:33 [PATCH 0/2] Implement PMULL using host intrinsics Ard Biesheuvel
@ 2023-06-01 12:33 ` Ard Biesheuvel
2023-06-01 13:00 ` Peter Maydell
2023-06-01 12:33 ` [PATCH 2/2] target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions Ard Biesheuvel
1 sibling, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2023-06-01 12:33 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, Ard Biesheuvel, Peter Maydell, Alex Bennée,
Richard Henderson, Philippe Mathieu-Daudé
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
host/include/i386/host/cpuinfo.h | 1 +
target/arm/tcg/vec_helper.c | 26 +++++++++++++++++++-
util/cpuinfo-i386.c | 1 +
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index 073d0a426f31487d..cf4ced844760d28f 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -27,6 +27,7 @@
#define CPUINFO_ATOMIC_VMOVDQA (1u << 16)
#define CPUINFO_ATOMIC_VMOVDQU (1u << 17)
#define CPUINFO_AES (1u << 18)
+#define CPUINFO_PMULL (1u << 19)
/* Initialized with a constructor. */
extern unsigned cpuinfo;
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index f59d3b26eacf08f8..fb422627588439b3 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -25,6 +25,14 @@
#include "qemu/int128.h"
#include "vec_internal.h"
+#ifdef __x86_64__
+#include "host/cpuinfo.h"
+#include <wmmintrin.h>
+#define TARGET_PMULL __attribute__((__target__("pclmul")))
+#else
+#define TARGET_PMULL
+#endif
+
/*
* Data for expanding active predicate bits to bytes, for byte elements.
*
@@ -2010,12 +2018,28 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
* Because of the lanes are not accessed in strict columns,
* this probably cannot be turned into a generic helper.
*/
-void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
+void TARGET_PMULL HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, j, opr_sz = simd_oprsz(desc);
     intptr_t hi = simd_data(desc);
     uint64_t *d = vd, *n = vn, *m = vm;
 
+#ifdef __x86_64__
+    if (cpuinfo & CPUINFO_PMULL) {
+        switch (hi) {
+        case 0:
+            *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x0);
+            break;
+        case 1:
+            *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x11);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+    }
+#endif
+
     for (i = 0; i < opr_sz / 8; i += 2) {
         uint64_t nn = n[i + hi];
         uint64_t mm = m[i + hi];
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index 3043f066c0182dc8..8930e13451201a64 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -40,6 +40,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
info |= (c & bit_AES ? CPUINFO_AES : 0);
+ info |= (c & bit_PCLMULQDQ ? CPUINFO_PMULL : 0);
/* For AVX features, we must check available and usable. */
if ((c & bit_AVX) && (c & bit_OSXSAVE)) {
--
2.39.2
* [PATCH 2/2] target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions
2023-06-01 12:33 [PATCH 0/2] Implement PMULL using host intrinsics Ard Biesheuvel
2023-06-01 12:33 ` [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64 Ard Biesheuvel
@ 2023-06-01 12:33 ` Ard Biesheuvel
2023-06-01 17:13 ` Ard Biesheuvel
1 sibling, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2023-06-01 12:33 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, Ard Biesheuvel, Peter Maydell, Alex Bennée,
Richard Henderson, Philippe Mathieu-Daudé
Use the AArch64 PMULL{2}.P64 instructions to implement PCLMULQDQ instead
of emulating them in C code if the host supports this. This is used in
the implementation of GCM, which is widely used in IPsec VPN and HTTPS.
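As a reference for the operand selection involved, the following is a
small portable sketch of what PCLMULQDQ computes for one 128-bit lane:
bit 0 of the immediate picks the low or high quadword of the first
source and bit 4 does the same for the second source. PMULL multiplies
the two low quadwords and PMULL2 the two high ones, so the mixed
selections need one operand's halves swapped first, which is what the
EXT #8 in the patch does. The function names here are hypothetical and
the code is only an illustration, not the helper itself.

```c
#include <stdint.h>

/* Portable 64x64->128 carry-less multiply; out[0] is the low half. */
static void sketch_clmul(uint64_t a, uint64_t b, uint64_t out[2])
{
    uint64_t lo = 0, hi = 0;

    for (int j = 0; j < 64; j++) {
        if ((b >> j) & 1) {
            lo ^= a << j;
            if (j != 0) {
                hi ^= a >> (64 - j);
            }
        }
    }
    out[0] = lo;
    out[1] = hi;
}

/*
 * Hypothetical reference model of PCLMULQDQ on one 128-bit lane.
 * v and s hold {low, high} quadwords; ctrl is the instruction immediate.
 */
static void sketch_pclmulqdq(uint64_t d[2], const uint64_t v[2],
                             const uint64_t s[2], uint8_t ctrl)
{
    uint64_t a = v[(ctrl & 0x01) != 0];   /* bit 0: quadword of v */
    uint64_t b = s[(ctrl & 0x10) != 0];   /* bit 4: quadword of s */

    sketch_clmul(a, b, d);
}
```

With ctrl == 0x00 this corresponds to a plain PMULL and with 0x11 to a
plain PMULL2; 0x01 and 0x10 are the cases that need the swap.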
Somewhat surprising results: on my ThunderX2, enabling this on top of
the AES acceleration I sent out earlier gives a substantial speedup,
roughly 2.4x over no acceleration in the test below.
(1420 is a typical IPsec block size - in HTTPS, GCM operates on much
larger block sizes but the kernel mode benchmarks are not the best place
to measure its performance in this mode)
tcrypt: testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
No acceleration
tcrypt: test 5 (160 bit key, 1420 byte blocks): 10046 operations in 1 seconds (14265320 bytes)
AES acceleration
tcrypt: test 5 (160 bit key, 1420 byte blocks): 13970 operations in 1 seconds (19837400 bytes)
AES + PMULL acceleration
tcrypt: test 5 (160 bit key, 1420 byte blocks): 24372 operations in 1 seconds (34608240 bytes)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
host/include/aarch64/host/cpuinfo.h | 1 +
target/i386/ops_sse.h | 24 ++++++++++++++++++++
util/cpuinfo-aarch64.c | 1 +
3 files changed, 26 insertions(+)
diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
index 05feeb4f4369fc19..da268dce1390cac0 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -10,6 +10,7 @@
#define CPUINFO_LSE (1u << 1)
#define CPUINFO_LSE2 (1u << 2)
#define CPUINFO_AES (1u << 3)
+#define CPUINFO_PMULL (1u << 4)
/* Initialized with a constructor. */
extern unsigned cpuinfo;
diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index db79132778efd211..d7e7bd8b733122a8 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2157,6 +2157,30 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
uint64_t a, b;
int i;
+#ifdef __aarch64__
+    if (cpuinfo & CPUINFO_PMULL) {
+        aes_vec_t vv = *(aes_vec_t *)v, vs = *(aes_vec_t *)s;
+        aes_vec_t *vd = (aes_vec_t *)d;
+
+        switch (ctrl & 0x11) {
+        case 0x1:
+            asm("ext %0.16b, %0.16b, %0.16b, #8":"+w"(vv));
+            /* fallthrough */
+        case 0x0:
+            asm(".arch_extension aes\n"
+                "pmull %0.1q, %1.1d, %2.1d":"=w"(*vd):"w"(vv),"w"(vs));
+            break;
+        case 0x10:
+            asm("ext %0.16b, %0.16b, %0.16b, #8":"+w"(vv));
+            /* fallthrough */
+        case 0x11:
+            asm(".arch_extension aes\n"
+                "pmull2 %0.1q, %1.2d, %2.2d":"=w"(*vd):"w"(vv),"w"(vs));
+        }
+        return;
+    }
+#endif
+
     for (i = 0; i < 1 << SHIFT; i += 2) {
         a = v->Q(((ctrl & 1) != 0) + i);
         b = s->Q(((ctrl & 16) != 0) + i);
diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
index 769cdfeb2fc32d5e..95ec1f4adfc829b9 100644
--- a/util/cpuinfo-aarch64.c
+++ b/util/cpuinfo-aarch64.c
@@ -57,6 +57,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
+ info |= (hwcap & HWCAP_PMULL ? CPUINFO_PMULL : 0);
#endif
#ifdef CONFIG_DARWIN
info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
--
2.39.2
* Re: [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64
2023-06-01 12:33 ` [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64 Ard Biesheuvel
@ 2023-06-01 13:00 ` Peter Maydell
2023-06-01 15:28 ` Ard Biesheuvel
0 siblings, 1 reply; 6+ messages in thread
From: Peter Maydell @ 2023-06-01 13:00 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: qemu-arm, qemu-devel, Alex Bennée, Richard Henderson,
Philippe Mathieu-Daudé
On Thu, 1 Jun 2023 at 13:33, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> host/include/i386/host/cpuinfo.h | 1 +
> target/arm/tcg/vec_helper.c | 26 +++++++++++++++++++-
> util/cpuinfo-i386.c | 1 +
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
> index 073d0a426f31487d..cf4ced844760d28f 100644
> --- a/host/include/i386/host/cpuinfo.h
> +++ b/host/include/i386/host/cpuinfo.h
> @@ -27,6 +27,7 @@
> #define CPUINFO_ATOMIC_VMOVDQA (1u << 16)
> #define CPUINFO_ATOMIC_VMOVDQU (1u << 17)
> #define CPUINFO_AES (1u << 18)
> +#define CPUINFO_PMULL (1u << 19)
>
> /* Initialized with a constructor. */
> extern unsigned cpuinfo;
> diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
> index f59d3b26eacf08f8..fb422627588439b3 100644
> --- a/target/arm/tcg/vec_helper.c
> +++ b/target/arm/tcg/vec_helper.c
> @@ -25,6 +25,14 @@
> #include "qemu/int128.h"
> #include "vec_internal.h"
>
> +#ifdef __x86_64__
> +#include "host/cpuinfo.h"
> +#include <wmmintrin.h>
> +#define TARGET_PMULL __attribute__((__target__("pclmul")))
> +#else
> +#define TARGET_PMULL
> +#endif
> +
> /*
> * Data for expanding active predicate bits to bytes, for byte elements.
> *
> @@ -2010,12 +2018,28 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
> * Because of the lanes are not accessed in strict columns,
> * this probably cannot be turned into a generic helper.
> */
> -void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
> +void TARGET_PMULL HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
> {
> intptr_t i, j, opr_sz = simd_oprsz(desc);
> intptr_t hi = simd_data(desc);
> uint64_t *d = vd, *n = vn, *m = vm;
>
> +#ifdef __x86_64__
> + if (cpuinfo & CPUINFO_PMULL) {
> + switch (hi) {
> + case 0:
> + *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x0);
> + break;
> + case 1:
> + *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x11);
> + break;
> + default:
> + g_assert_not_reached();
> + }
> + return;
> + }
> +#endif
This needs to cope with the input vectors being more than
just 128 bits wide, I think. Also you probably still
need the clear_tail() to clear any high bits of the register.
thanks
-- PMM
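To illustrate the two points above, here is a rough standalone sketch of
a fast path that walks the operation size in 128-bit segments and then
zeroes the remainder of the register. It is an assumption about what a
fix might look like, not a follow-up patch from this thread: the
function name, the explicit opr_sz/max_sz byte counts (standing in for
the values QEMU derives from desc) and the use of
__builtin_cpu_supports() instead of QEMU's cpuinfo are all hypothetical.
On non-x86 hosts, or without PCLMULQDQ, it falls back to a portable
loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>
#include <wmmintrin.h>
#define SKETCH_TARGET_PCLMUL __attribute__((__target__("pclmul")))
#else
#define SKETCH_TARGET_PCLMUL
#endif

/* Portable 64x64->128 carry-less multiply; out[0] is the low half. */
static void sketch_clmul_ref(uint64_t a, uint64_t b, uint64_t out[2])
{
    uint64_t lo = 0, hi = 0;

    for (int j = 0; j < 64; j++) {
        if ((b >> j) & 1) {
            lo ^= a << j;
            if (j != 0) {
                hi ^= a >> (64 - j);
            }
        }
    }
    out[0] = lo;
    out[1] = hi;
}

/*
 * Hypothetical stand-in for the helper. opr_sz and max_sz are byte
 * counts and multiples of 16; hi selects the high (1) or low (0)
 * 64-bit lane of every 128-bit segment.
 */
SKETCH_TARGET_PCLMUL
static void sketch_pmull_q(void *vd, const void *vn, const void *vm,
                           size_t opr_sz, size_t max_sz, int hi)
{
    uint8_t *d = vd;
    const uint8_t *n = vn, *m = vm;
    size_t off = 0;

#if defined(__x86_64__)
    if (__builtin_cpu_supports("pclmul")) {
        /* One PCLMULQDQ per 128-bit segment, not just the first one. */
        for (; off < opr_sz; off += 16) {
            __m128i nn = _mm_loadu_si128((const __m128i *)(n + off));
            __m128i mm = _mm_loadu_si128((const __m128i *)(m + off));
            __m128i r = hi ? _mm_clmulepi64_si128(nn, mm, 0x11)
                           : _mm_clmulepi64_si128(nn, mm, 0x00);
            _mm_storeu_si128((__m128i *)(d + off), r);
        }
    }
#endif
    /* Portable path; skipped if the loop above already covered opr_sz. */
    for (; off < opr_sz; off += 16) {
        uint64_t nn, mm, out[2];

        memcpy(&nn, n + off + 8 * hi, 8);
        memcpy(&mm, m + off + 8 * hi, 8);
        sketch_clmul_ref(nn, mm, out);
        memcpy(d + off, out, 16);
    }
    /* Same effect as clear_tail(): zero the bytes from opr_sz to max_sz. */
    memset(d + opr_sz, 0, max_sz - opr_sz);
}
```

The immediate passed to the intrinsic has to be a compile-time constant,
hence the two separate calls rather than computing it from hi.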
* Re: [PATCH 1/2] target/arm: Use x86 intrinsics to implement PMULL.P64
2023-06-01 13:00 ` Peter Maydell
@ 2023-06-01 15:28 ` Ard Biesheuvel
0 siblings, 0 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2023-06-01 15:28 UTC (permalink / raw)
To: Peter Maydell
Cc: qemu-arm, qemu-devel, Alex Bennée, Richard Henderson,
Philippe Mathieu-Daudé
On Thu, 1 Jun 2023 at 15:01, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 1 Jun 2023 at 13:33, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > host/include/i386/host/cpuinfo.h | 1 +
> > target/arm/tcg/vec_helper.c | 26 +++++++++++++++++++-
> > util/cpuinfo-i386.c | 1 +
> > 3 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
> > index 073d0a426f31487d..cf4ced844760d28f 100644
> > --- a/host/include/i386/host/cpuinfo.h
> > +++ b/host/include/i386/host/cpuinfo.h
> > @@ -27,6 +27,7 @@
> > #define CPUINFO_ATOMIC_VMOVDQA (1u << 16)
> > #define CPUINFO_ATOMIC_VMOVDQU (1u << 17)
> > #define CPUINFO_AES (1u << 18)
> > +#define CPUINFO_PMULL (1u << 19)
> >
> > /* Initialized with a constructor. */
> > extern unsigned cpuinfo;
> > diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
> > index f59d3b26eacf08f8..fb422627588439b3 100644
> > --- a/target/arm/tcg/vec_helper.c
> > +++ b/target/arm/tcg/vec_helper.c
> > @@ -25,6 +25,14 @@
> > #include "qemu/int128.h"
> > #include "vec_internal.h"
> >
> > +#ifdef __x86_64__
> > +#include "host/cpuinfo.h"
> > +#include <wmmintrin.h>
> > +#define TARGET_PMULL __attribute__((__target__("pclmul")))
> > +#else
> > +#define TARGET_PMULL
> > +#endif
> > +
> > /*
> > * Data for expanding active predicate bits to bytes, for byte elements.
> > *
> > @@ -2010,12 +2018,28 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
> > * Because of the lanes are not accessed in strict columns,
> > * this probably cannot be turned into a generic helper.
> > */
> > -void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
> > +void TARGET_PMULL HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
> > {
> > intptr_t i, j, opr_sz = simd_oprsz(desc);
> > intptr_t hi = simd_data(desc);
> > uint64_t *d = vd, *n = vn, *m = vm;
> >
> > +#ifdef __x86_64__
> > + if (cpuinfo & CPUINFO_PMULL) {
> > + switch (hi) {
> > + case 0:
> > + *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x0);
> > + break;
> > + case 1:
> > + *(__m128i *)vd = _mm_clmulepi64_si128(*(__m128i *)vm, *(__m128i *)vn, 0x11);
> > + break;
> > + default:
> > + g_assert_not_reached();
> > + }
> > + return;
> > + }
> > +#endif
>
> This needs to cope with the input vectors being more than
> just 128 bits wide, I think. Also you probably still
> need the clear_tail() to clear any high bits of the register.
>
Ah yes, I missed that completely.
* Re: [PATCH 2/2] target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions
2023-06-01 12:33 ` [PATCH 2/2] target/i386: Implement PCLMULQDQ using AArch64 PMULL instructions Ard Biesheuvel
@ 2023-06-01 17:13 ` Ard Biesheuvel
0 siblings, 0 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2023-06-01 17:13 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, Peter Maydell, Alex Bennée, Richard Henderson,
Philippe Mathieu-Daudé
On Thu, 1 Jun 2023 at 14:33, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Use the AArch64 PMULL{2}.P64 instructions to implement PCLMULQDQ instead
> of emulating them in C code if the host supports this. This is used in
> the implementation of GCM, which is widely used in IPsec VPN and HTTPS.
>
> Somewhat surprising results: on my ThunderX2, enabling this on top of
> the AES acceleration I sent out earlier gives a substantial speedup,
> roughly 2.4x over no acceleration in the test below.
>
> (1420 is a typical IPsec block size - in HTTPS, GCM operates on much
> larger block sizes but the kernel mode benchmarks are not the best place
> to measure its performance in this mode)
>
> tcrypt: testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
>
> No acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 10046 operations in 1 seconds (14265320 bytes)
>
> AES acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 13970 operations in 1 seconds (19837400 bytes)
>
> AES + PMULL acceleration
> tcrypt: test 5 (160 bit key, 1420 byte blocks): 24372 operations in 1 seconds (34608240 bytes)
>
User space benchmark (using the OS's qemu-x86_64 vs one built with these
changes applied)
Speedup approaches 5x at the largest block sizes (about 1.6x at 16 bytes)
ard@gambale:~/build/openssl$ apps/openssl speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 1692138 AES-128-GCM's in 2.98s
Doing AES-128-GCM for 3s on 64 size blocks: 665012 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 203784 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 49397 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 6447 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 3058 AES-128-GCM's in 3.00s
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes
8192 bytes 16384 bytes
AES-128-GCM 9085.30k 14186.92k 17389.57k 16860.84k
17604.61k 16700.76k
ard@gambale:~/build/openssl$ ../qemu/build/qemu-x86_64 apps/openssl
speed -evp aes-128-gcm
Doing AES-128-GCM for 3s on 16 size blocks: 2703271 AES-128-GCM's in 2.99s
Doing AES-128-GCM for 3s on 64 size blocks: 1537884 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 256 size blocks: 653008 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 1024 size blocks: 203579 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 8192 size blocks: 29020 AES-128-GCM's in 3.00s
Doing AES-128-GCM for 3s on 16384 size blocks: 14716 AES-128-GCM's in 2.99s
version: 3.2.0-dev
built on: Thu Jun 1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes
8192 bytes 16384 bytes
AES-128-GCM 14465.66k 32808.19k 55723.35k 69488.30k
79243.95k 80637.77k