* [PATCH 00/35] crypto: Provide aes-round.h and host accel
@ 2023-06-03  2:33 Richard Henderson
  2023-06-03  2:33 ` [PATCH 01/35] tests/multiarch: Add test-aes Richard Henderson
                   ` (35 more replies)
  0 siblings, 36 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Inspired by Ard Biesheuvel's RFC patches for accelerating AES
under emulation, provide a set of round primitives that map the
AES fragments guest instructions need onto fragments the host
can accelerate.
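
For example (the dispatch pattern exactly as it appears in patch 4):
each fragment is an inline function that uses host acceleration when
available and otherwise falls back to generic C, choosing the
endian-reversed variant as needed:

    static inline void aesenc_SB_SR(AESState *r, const AESState *st, bool be)
    {
        if (HAVE_AES_ACCEL) {
            aesenc_SB_SR_accel(r, st, be);
        } else if (HOST_BIG_ENDIAN == be) {
            aesenc_SB_SR_gen(r, st);
        } else {
            aesenc_SB_SR_genrev(r, st);
        }
    }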

There is a small guest correctness test case.

I think the end result is quite a bit cleaner, since the logic
is now centralized rather than spread across four different guests.

Further work could convert crypto/aes.c itself to use these
primitives instead of the tables directly.  Since that code is
only the ultimate fallback when an appropriate system library is
not available, it is not terribly important, but the conversion
could still significantly reduce the amount of code we carry.

I would imagine structuring a polynomial multiplication header
in a similar way; there are four or five versions of that spread
across the different guests.

Anyway, please review.


r~


Richard Henderson (35):
  tests/multiarch: Add test-aes
  target/arm: Move aesmc and aesimc tables to crypto/aes.c
  crypto/aes: Add constants for ShiftRows, InvShiftRows
  crypto: Add aesenc_SB_SR
  target/i386: Use aesenc_SB_SR
  target/arm: Demultiplex AESE and AESMC
  target/arm: Use aesenc_SB_SR
  target/ppc: Use aesenc_SB_SR
  target/riscv: Use aesenc_SB_SR
  crypto: Add aesdec_ISB_ISR
  target/i386: Use aesdec_ISB_ISR
  target/arm: Use aesdec_ISB_ISR
  target/ppc: Use aesdec_ISB_ISR
  target/riscv: Use aesdec_ISB_ISR
  crypto: Add aesenc_MC
  target/arm: Use aesenc_MC
  crypto: Add aesdec_IMC
  target/i386: Use aesdec_IMC
  target/arm: Use aesdec_IMC
  target/riscv: Use aesdec_IMC
  crypto: Add aesenc_SB_SR_MC_AK
  target/i386: Use aesenc_SB_SR_MC_AK
  target/ppc: Use aesenc_SB_SR_MC_AK
  target/riscv: Use aesenc_SB_SR_MC_AK
  crypto: Add aesdec_ISB_ISR_IMC_AK
  target/i386: Use aesdec_ISB_ISR_IMC_AK
  target/riscv: Use aesdec_ISB_ISR_IMC_AK
  crypto: Add aesdec_ISB_ISR_AK_IMC
  target/ppc: Use aesdec_ISB_ISR_AK_IMC
  host/include/i386: Implement aes-round.h
  host/include/aarch64: Implement aes-round.h
  crypto: Remove AES_shifts, AES_ishifts
  crypto: Implement aesdec_IMC with AES_imc_rot
  crypto: Remove AES_imc
  crypto: Unexport AES_*_rot, AES_TeN, AES_TdN

 host/include/aarch64/host/aes-round.h   | 204 ++++++
 host/include/aarch64/host/cpuinfo.h     |   1 +
 host/include/generic/host/aes-round.h   |  36 ++
 host/include/i386/host/aes-round.h      | 148 +++++
 host/include/i386/host/cpuinfo.h        |   1 +
 host/include/x86_64/host/aes-round.h    |   1 +
 include/crypto/aes-round.h              | 158 +++++
 include/crypto/aes.h                    |  30 -
 target/arm/helper.h                     |   2 +
 target/i386/ops_sse.h                   |  64 +-
 target/arm/tcg/sve.decode               |   4 +-
 crypto/aes.c                            | 808 ++++++++++++++++--------
 target/arm/tcg/crypto_helper.c          | 245 +++----
 target/arm/tcg/translate-a64.c          |  13 +-
 target/arm/tcg/translate-neon.c         |   4 +-
 target/arm/tcg/translate-sve.c          |   8 +-
 target/ppc/int_helper.c                 |  58 +-
 target/riscv/crypto_helper.c            | 142 ++---
 tests/tcg/aarch64/test-aes.c            |  58 ++
 tests/tcg/i386/test-aes.c               |  68 ++
 tests/tcg/ppc64/test-aes.c              | 116 ++++
 tests/tcg/riscv64/test-aes.c            |  76 +++
 util/cpuinfo-aarch64.c                  |   2 +
 util/cpuinfo-i386.c                     |   3 +
 tests/tcg/multiarch/test-aes-main.c.inc | 183 ++++++
 tests/tcg/aarch64/Makefile.target       |   4 +
 tests/tcg/i386/Makefile.target          |   4 +
 tests/tcg/ppc64/Makefile.target         |   1 +
 tests/tcg/riscv64/Makefile.target       |   4 +
 29 files changed, 1776 insertions(+), 670 deletions(-)
 create mode 100644 host/include/aarch64/host/aes-round.h
 create mode 100644 host/include/generic/host/aes-round.h
 create mode 100644 host/include/i386/host/aes-round.h
 create mode 100644 host/include/x86_64/host/aes-round.h
 create mode 100644 include/crypto/aes-round.h
 create mode 100644 tests/tcg/aarch64/test-aes.c
 create mode 100644 tests/tcg/i386/test-aes.c
 create mode 100644 tests/tcg/ppc64/test-aes.c
 create mode 100644 tests/tcg/riscv64/test-aes.c
 create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc

-- 
2.34.1




* [PATCH 01/35] tests/multiarch: Add test-aes
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-03  2:33 ` [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Use a shared driver and per-architecture backends for i386,
aarch64, ppc64 and riscv64.  A backend returns false for any
fragment its ISA cannot express in isolation, and the driver
skips verification for it.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
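Note, a sketch of what a new backend would provide (hypothetical,
not part of this patch): include the shared driver and stub every
fragment, then replace individual stubs with inline asm where the
ISA allows.  Stubs that return false simply skip verification:

    /* Hypothetical minimal backend: every fragment unsupported. */
    #include "../multiarch/test-aes-main.c.inc"

    bool test_SB_SR(uint8_t *o, const uint8_t *i) { return false; }
    bool test_MC(uint8_t *o, const uint8_t *i) { return false; }
    bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i,
                          const uint8_t *k) { return false; }
    bool test_ISB_ISR(uint8_t *o, const uint8_t *i) { return false; }
    bool test_IMC(uint8_t *o, const uint8_t *i) { return false; }
    bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i,
                             const uint8_t *k) { return false; }
    bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i,
                             const uint8_t *k) { return false; }
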
 tests/tcg/aarch64/test-aes.c            |  58 ++++++++
 tests/tcg/i386/test-aes.c               |  68 +++++++++
 tests/tcg/ppc64/test-aes.c              | 116 +++++++++++++++
 tests/tcg/riscv64/test-aes.c            |  76 ++++++++++
 tests/tcg/multiarch/test-aes-main.c.inc | 183 ++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target       |   4 +
 tests/tcg/i386/Makefile.target          |   4 +
 tests/tcg/ppc64/Makefile.target         |   1 +
 tests/tcg/riscv64/Makefile.target       |   4 +
 9 files changed, 514 insertions(+)
 create mode 100644 tests/tcg/aarch64/test-aes.c
 create mode 100644 tests/tcg/i386/test-aes.c
 create mode 100644 tests/tcg/ppc64/test-aes.c
 create mode 100644 tests/tcg/riscv64/test-aes.c
 create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc

diff --git a/tests/tcg/aarch64/test-aes.c b/tests/tcg/aarch64/test-aes.c
new file mode 100644
index 0000000000..2cd324f09b
--- /dev/null
+++ b/tests/tcg/aarch64/test-aes.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    /* aese also adds round key, so supply zero. */
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "movi v1.16b, #0\n\t"
+        "aese v0.16b, v1.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "v1", "memory");
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "aesmc v0.16b, v0.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "memory");
+    return true;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    /* aesd also adds round key, so supply zero. */
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "movi v1.16b, #0\n\t"
+        "aesd v0.16b, v1.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "v1", "memory");
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "aesimc v0.16b, v0.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "memory");
+    return true;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
diff --git a/tests/tcg/i386/test-aes.c b/tests/tcg/i386/test-aes.c
new file mode 100644
index 0000000000..199395e6cc
--- /dev/null
+++ b/tests/tcg/i386/test-aes.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+#include <immintrin.h>
+
+static bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    /* aesenclast also adds round key, so supply zero. */
+    vi = _mm_aesenclast_si128(vi, _mm_setzero_si128());
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
+
+    vi = _mm_aesenc_si128(vi, vk);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    /* aesdeclast also adds round key, so supply zero. */
+    vi = _mm_aesdeclast_si128(vi, _mm_setzero_si128());
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    vi = _mm_aesimc_si128(vi);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
+
+    vi = _mm_aesdec_si128(vi, vk);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
diff --git a/tests/tcg/ppc64/test-aes.c b/tests/tcg/ppc64/test-aes.c
new file mode 100644
index 0000000000..1d2be488e9
--- /dev/null
+++ b/tests/tcg/ppc64/test-aes.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+#undef BIG_ENDIAN
+#define BIG_ENDIAN  (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+
+static unsigned char bswap_le[16] __attribute__((aligned(16))) = {
+    8,9,10,11,12,13,14,15,
+    0,1,2,3,4,5,6,7
+};
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    /* vcipherlast also adds round key, so supply zero. */
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "vspltisb 1,0\n\t"
+            "vcipherlast 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 34,0,%2\n\t"
+            "vspltisb 1,0\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vcipherlast 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "vcipher 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "lxvd2x 34,0,%3\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vperm 1,1,1,2\n\t"
+            "vcipher 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
+              : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    /* vcipherlast also adds round key, so supply zero. */
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "vspltisb 1,0\n\t"
+            "vncipherlast 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 34,0,%2\n\t"
+            "vspltisb 1,0\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vncipherlast 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "vncipher 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "lxvd2x 34,0,%3\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vperm 1,1,1,2\n\t"
+            "vncipher 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
+              : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
diff --git a/tests/tcg/riscv64/test-aes.c b/tests/tcg/riscv64/test-aes.c
new file mode 100644
index 0000000000..3d7ef0e33a
--- /dev/null
+++ b/tests/tcg/riscv64/test-aes.c
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64es %0,%2,%3\n\t"
+        "aes64es %1,%3,%2"
+        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+    const uint64_t *k8 = (const uint64_t *)k;
+
+    asm("aes64esm %0,%2,%3\n\t"
+        "aes64esm %1,%3,%2\n\t"
+        "xor %0,%0,%4\n\t"
+        "xor %1,%1,%5"
+        : "=&r"(o8[0]), "=&r"(o8[1])
+        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
+    return true;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64ds %0,%2,%3\n\t"
+        "aes64ds %1,%3,%2"
+        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64im %0,%0\n\t"
+        "aes64im %1,%1"
+        : "=r"(o8[0]), "=r"(o8[1]) : "0"(i8[0]), "1"(i8[1]));
+    return true;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+    const uint64_t *k8 = (const uint64_t *)k;
+
+    asm("aes64dsm %0,%2,%3\n\t"
+        "aes64dsm %1,%3,%2\n\t"
+        "xor %0,%0,%4\n\t"
+        "xor %1,%1,%5"
+        : "=&r"(o8[0]), "=&r"(o8[1])
+        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
+    return true;
+}
diff --git a/tests/tcg/multiarch/test-aes-main.c.inc b/tests/tcg/multiarch/test-aes-main.c.inc
new file mode 100644
index 0000000000..0039f8ba55
--- /dev/null
+++ b/tests/tcg/multiarch/test-aes-main.c.inc
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+static bool test_SB_SR(uint8_t *o, const uint8_t *i);
+static bool test_MC(uint8_t *o, const uint8_t *i);
+static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
+
+static bool test_ISB_ISR(uint8_t *o, const uint8_t *i);
+static bool test_IMC(uint8_t *o, const uint8_t *i);
+static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k);
+static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
+
+/*
+ * From https://doi.org/10.6028/NIST.FIPS.197-upd1,
+ * Appendix B -- Cipher Example
+ *
+ * Note that the formatting of the 4x4 matrices in the document is
+ * column-major, whereas C is row-major.  Therefore to get the bytes
+ * in the same order as the text, the matrices are transposed.
+ *
+ * Note that we are not going to test SubBytes or ShiftRows separately,
+ * so the "After SubBytes" column is omitted, using only the combined
+ * result "After ShiftRows" column.
+ */
+
+/* Ease the inline assembly by aligning everything. */
+typedef struct {
+    uint8_t b[16] __attribute__((aligned(16)));
+} State;
+
+typedef struct {
+    State start, after_sr, after_mc, round_key;
+} Round;
+
+static const Round rounds[] = {
+    /* Round 1 */
+    { { { 0x19, 0x3d, 0xe3, 0xbe,       /* start */
+          0xa0, 0xf4, 0xe2, 0x2b,
+          0x9a, 0xc6, 0x8d, 0x2a,
+          0xe9, 0xf8, 0x48, 0x08, } },
+
+      { { 0xd4, 0xbf, 0x5d, 0x30,       /* after shiftrows */
+          0xe0, 0xb4, 0x52, 0xae,
+          0xb8, 0x41, 0x11, 0xf1,
+          0x1e, 0x27, 0x98, 0xe5, } },
+
+      { { 0x04, 0x66, 0x81, 0xe5,       /* after mixcolumns */
+          0xe0, 0xcb, 0x19, 0x9a,
+          0x48, 0xf8, 0xd3, 0x7a,
+          0x28, 0x06, 0x26, 0x4c, } },
+
+      { { 0xa0, 0xfa, 0xfe, 0x17,       /* round key */
+          0x88, 0x54, 0x2c, 0xb1,
+          0x23, 0xa3, 0x39, 0x39,
+          0x2a, 0x6c, 0x76, 0x05, } } },
+
+    /* Round 2 */
+    { { { 0xa4, 0x9c, 0x7f, 0xf2,       /* start */
+          0x68, 0x9f, 0x35, 0x2b,
+          0x6b, 0x5b, 0xea, 0x43,
+          0x02, 0x6a, 0x50, 0x49, } },
+
+      { { 0x49, 0xdb, 0x87, 0x3b,       /* after shiftrows */
+          0x45, 0x39, 0x53, 0x89,
+          0x7f, 0x02, 0xd2, 0xf1,
+          0x77, 0xde, 0x96, 0x1a, } },
+
+      { { 0x58, 0x4d, 0xca, 0xf1,       /* after mixcolumns */
+          0x1b, 0x4b, 0x5a, 0xac,
+          0xdb, 0xe7, 0xca, 0xa8,
+          0x1b, 0x6b, 0xb0, 0xe5, } },
+
+      { { 0xf2, 0xc2, 0x95, 0xf2,       /* round key */
+          0x7a, 0x96, 0xb9, 0x43,
+          0x59, 0x35, 0x80, 0x7a,
+          0x73, 0x59, 0xf6, 0x7f, } } },
+
+    /* Round 3 */
+    { { { 0xaa, 0x8f, 0x5f, 0x03,       /* start */
+          0x61, 0xdd, 0xe3, 0xef,
+          0x82, 0xd2, 0x4a, 0xd2,
+          0x68, 0x32, 0x46, 0x9a, } },
+
+      { { 0xac, 0xc1, 0xd6, 0xb8,       /* after shiftrows */
+          0xef, 0xb5, 0x5a, 0x7b,
+          0x13, 0x23, 0xcf, 0xdf,
+          0x45, 0x73, 0x11, 0xb5, } },
+
+      { { 0x75, 0xec, 0x09, 0x93,       /* after mixcolumns */
+          0x20, 0x0b, 0x63, 0x33,
+          0x53, 0xc0, 0xcf, 0x7c,
+          0xbb, 0x25, 0xd0, 0xdc, } },
+
+      { { 0x3d, 0x80, 0x47, 0x7d,       /* round key */
+          0x47, 0x16, 0xfe, 0x3e,
+          0x1e, 0x23, 0x7e, 0x44,
+          0x6d, 0x7a, 0x88, 0x3b, } } },
+};
+
+static void verify_log(const char *prefix, const State *s)
+{
+    printf("%s:", prefix);
+    for (int i = 0; i < sizeof(State); ++i) {
+        printf(" %02x", s->b[i]);
+    }
+    printf("\n");
+}
+
+static void verify(const State *ref, const State *tst, const char *which)
+{
+    if (!memcmp(ref, tst, sizeof(State))) {
+        return;
+    }
+
+    printf("Mismatch on %s\n", which);
+    verify_log("ref", ref);
+    verify_log("tst", tst);
+    exit(EXIT_FAILURE);
+}
+
+int main()
+{
+    int i, n = sizeof(rounds) / sizeof(Round);
+    State t;
+
+    for (i = 0; i < n; ++i) {
+        if (test_SB_SR(t.b, rounds[i].start.b)) {
+            verify(&rounds[i].after_sr, &t, "SB+SR");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_MC(t.b, rounds[i].after_sr.b)) {
+            verify(&rounds[i].after_mc, &t, "MC");
+        }
+    }
+
+    /* The kernel of Cipher(). */
+    for (i = 0; i < n - 1; ++i) {
+        if (test_SB_SR_MC_AK(t.b, rounds[i].start.b, rounds[i].round_key.b)) {
+            verify(&rounds[i + 1].start, &t, "SB+SR+MC+AK");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_ISB_ISR(t.b, rounds[i].after_sr.b)) {
+            verify(&rounds[i].start, &t, "ISB+ISR");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_IMC(t.b, rounds[i].after_mc.b)) {
+            verify(&rounds[i].after_sr, &t, "IMC");
+        }
+    }
+
+    /* The kernel of InvCipher(). */
+    for (i = n - 1; i > 0; --i) {
+        if (test_ISB_ISR_AK_IMC(t.b, rounds[i].after_sr.b,
+                                rounds[i - 1].round_key.b)) {
+            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+AK+IMC");
+        }
+    }
+
+    /*
+     * The kernel of EqInvCipher().
+     * We must compute a different round key: apply InvMixColumns to
+     * the standard round key, per KeyExpansion vs KeyExpansionEIC.
+     */
+    for (i = 1; i < n; ++i) {
+        if (test_IMC(t.b, rounds[i - 1].round_key.b) &&
+            test_ISB_ISR_IMC_AK(t.b, rounds[i].after_sr.b, t.b)) {
+            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+IMC+AK");
+        }
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 0315795487..7402d08d75 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -63,6 +63,10 @@ endif
 AARCH64_TESTS += sve-ioctls
 sve-ioctls: CFLAGS+=-march=armv8.1-a+sve
 
+AARCH64_TESTS += test-aes
+test-aes: CFLAGS += -O -march=armv8-a+aes
+test-aes: test-aes-main.c.inc
+
 # Vector SHA1
 sha1-vector: CFLAGS=-O3
 sha1-vector: sha1.c
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 821822ed0c..3ba61e3880 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -28,6 +28,10 @@ run-test-i386-bmi2: QEMU_OPTS += -cpu max
 test-i386-adcox: CFLAGS=-O2
 run-test-i386-adcox: QEMU_OPTS += -cpu max
 
+test-aes: CFLAGS += -O -msse2 -maes
+test-aes: test-aes-main.c.inc
+run-test-aes: QEMU_OPTS += -cpu max
+
 #
 # hello-i386 is a barebones app
 #
diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index b084963b9a..5721c159f2 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -36,5 +36,6 @@ run-vector: QEMU_OPTS += -cpu POWER10
 
 PPC64_TESTS += signal_save_restore_xer
 PPC64_TESTS += xxspltw
+PPC64_TESTS += test-aes
 
 TESTS += $(PPC64_TESTS)
diff --git a/tests/tcg/riscv64/Makefile.target b/tests/tcg/riscv64/Makefile.target
index 9973ba3b5f..4002d14b9e 100644
--- a/tests/tcg/riscv64/Makefile.target
+++ b/tests/tcg/riscv64/Makefile.target
@@ -9,3 +9,7 @@ TESTS += noexec
 TESTS += test-noc
 test-noc: LDFLAGS = -nostdlib -static
 run-test-noc: QEMU_OPTS += -cpu rv64,c=false
+
+TESTS += test-aes
+test-aes: CFLAGS += -O -march=rv64gzk
+run-test-aes: QEMU_OPTS += -cpu rv64,zk=on
-- 
2.34.1




* [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
  2023-06-03  2:33 ` [PATCH 01/35] tests/multiarch: Add test-aes Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-03 12:45   ` Ard Biesheuvel
  2023-06-05 10:45   ` Philippe Mathieu-Daudé
  2023-06-03  2:33 ` [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
                   ` (33 subsequent siblings)
  35 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

We do not currently have a table in crypto/ for
just MixColumns.  Move both tables for consistency.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
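Note, "for use with rot32" means each output word is assembled from
four rotated table lookups, as in do_crypto_aesmc() below.  A sketch,
using QEMU's rol32() and taking b0..b3 as the four bytes of one
state column:

    static uint32_t mix_word(const uint32_t *mc, uint8_t b0,
                             uint8_t b1, uint8_t b2, uint8_t b3)
    {
        return mc[b0] ^ rol32(mc[b1], 8) ^
               rol32(mc[b2], 16) ^ rol32(mc[b3], 24);
    }
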
 include/crypto/aes.h           |   6 ++
 crypto/aes.c                   | 142 ++++++++++++++++++++++++++++++++
 target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
 3 files changed, 153 insertions(+), 138 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 822d64588c..24b073d569 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -34,6 +34,12 @@ extern const uint8_t AES_isbox[256];
 extern const uint8_t AES_shifts[16];
 extern const uint8_t AES_ishifts[16];
 
+/* AES MixColumns, for use with rot32. */
+extern const uint32_t AES_mc_rot[256];
+
+/* AES InvMixColumns, for use with rot32. */
+extern const uint32_t AES_imc_rot[256];
+
 /* AES InvMixColumns */
 /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
 /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
diff --git a/crypto/aes.c b/crypto/aes.c
index af72ff7779..72c95c38fb 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -116,6 +116,148 @@ const uint8_t AES_ishifts[16] = {
     0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
 };
 
+/*
+ * MixColumns lookup table, for use with rot32.
+ * From Arm ARM pseudocode.
+ */
+const uint32_t AES_mc_rot[256] = {
+    0x00000000, 0x03010102, 0x06020204, 0x05030306,
+    0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
+    0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
+    0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
+    0x30101020, 0x33111122, 0x36121224, 0x35131326,
+    0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
+    0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
+    0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
+    0x60202040, 0x63212142, 0x66222244, 0x65232346,
+    0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
+    0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
+    0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
+    0x50303060, 0x53313162, 0x56323264, 0x55333366,
+    0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
+    0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
+    0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
+    0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
+    0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
+    0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
+    0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
+    0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
+    0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
+    0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
+    0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
+    0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
+    0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
+    0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
+    0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
+    0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
+    0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
+    0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
+    0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
+    0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
+    0x97848413, 0x94858511, 0x91868617, 0x92878715,
+    0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
+    0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
+    0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
+    0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
+    0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
+    0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
+    0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
+    0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
+    0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
+    0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
+    0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
+    0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
+    0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
+    0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
+    0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
+    0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
+    0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
+    0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
+    0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
+    0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
+    0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
+    0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
+    0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
+    0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
+    0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
+    0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
+    0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
+    0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
+    0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
+    0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
+};
+
+/*
+ * Inverse MixColumns lookup table, for use with rot32.
+ * From Arm ARM pseudocode.
+ */
+const uint32_t AES_imc_rot[256] = {
+    0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
+    0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
+    0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
+    0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
+    0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
+    0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
+    0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
+    0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
+    0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
+    0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
+    0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
+    0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
+    0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
+    0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
+    0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
+    0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
+    0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
+    0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
+    0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
+    0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
+    0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
+    0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
+    0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
+    0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
+    0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
+    0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
+    0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
+    0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
+    0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
+    0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
+    0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
+    0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
+    0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
+    0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
+    0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
+    0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
+    0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
+    0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
+    0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
+    0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
+    0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
+    0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
+    0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
+    0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
+    0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
+    0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
+    0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
+    0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
+    0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
+    0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
+    0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
+    0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
+    0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
+    0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
+    0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
+    0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
+    0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
+    0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
+    0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
+    0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
+    0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
+    0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
+    0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
+    0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
+};
+
 /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
 /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
 /* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index d28690321f..06254939d2 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -80,149 +80,16 @@ void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 
 static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
 {
-    static uint32_t const mc[][256] = { {
-        /* MixColumns lookup table */
-        0x00000000, 0x03010102, 0x06020204, 0x05030306,
-        0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
-        0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
-        0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
-        0x30101020, 0x33111122, 0x36121224, 0x35131326,
-        0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
-        0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
-        0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
-        0x60202040, 0x63212142, 0x66222244, 0x65232346,
-        0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
-        0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
-        0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
-        0x50303060, 0x53313162, 0x56323264, 0x55333366,
-        0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
-        0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
-        0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
-        0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
-        0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
-        0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
-        0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
-        0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
-        0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
-        0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
-        0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
-        0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
-        0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
-        0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
-        0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
-        0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
-        0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
-        0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
-        0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
-        0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
-        0x97848413, 0x94858511, 0x91868617, 0x92878715,
-        0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
-        0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
-        0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
-        0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
-        0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
-        0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
-        0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
-        0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
-        0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
-        0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
-        0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
-        0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
-        0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
-        0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
-        0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
-        0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
-        0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
-        0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
-        0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
-        0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
-        0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
-        0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
-        0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
-        0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
-        0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
-        0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
-        0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
-        0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
-        0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
-        0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
-    }, {
-        /* Inverse MixColumns lookup table */
-        0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
-        0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
-        0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
-        0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
-        0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
-        0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
-        0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
-        0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
-        0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
-        0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
-        0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
-        0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
-        0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
-        0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
-        0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
-        0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
-        0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
-        0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
-        0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
-        0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
-        0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
-        0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
-        0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
-        0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
-        0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
-        0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
-        0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
-        0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
-        0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
-        0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
-        0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
-        0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
-        0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
-        0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
-        0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
-        0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
-        0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
-        0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
-        0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
-        0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
-        0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
-        0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
-        0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
-        0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
-        0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
-        0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
-        0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
-        0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
-        0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
-        0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
-        0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
-        0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
-        0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
-        0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
-        0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
-        0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
-        0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
-        0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
-        0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
-        0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
-        0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
-        0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
-        0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
-        0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
-    } };
-
     union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
+    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
     int i;
 
     for (i = 0; i < 16; i += 4) {
         CR_ST_WORD(st, i >> 2) =
-            mc[decrypt][CR_ST_BYTE(st, i)] ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
+            mc[CR_ST_BYTE(st, i)] ^
+            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
+            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
+            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
     }
 
     rd[0] = st.l[0];
-- 
2.34.1




* [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
  2023-06-03  2:33 ` [PATCH 01/35] tests/multiarch: Add test-aes Richard Henderson
  2023-06-03  2:33 ` [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-05 10:46   ` Philippe Mathieu-Daudé
  2023-06-03  2:33 ` [PATCH 04/35] crypto: Add aesenc_SB_SR Richard Henderson
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

These symbols will avoid the indirection through memory
when fully unrolling some new primitives.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
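Note, the payoff is visible in the next patch: a fully unrolled
SubBytes+ShiftRows can index with compile-time constants instead of
loading AES_shifts[] from memory.  A sketch for a single byte:

    /* AES_SH_1 == 0x5 is an immediate, so no memory load. */
    static inline uint8_t sb_sr_byte1_rolled(const uint8_t *st)
    {
        return AES_sbox[st[AES_shifts[1]]];
    }
    static inline uint8_t sb_sr_byte1_unrolled(const uint8_t *st)
    {
        return AES_sbox[st[AES_SH_1]];
    }
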
 crypto/aes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/crypto/aes.c b/crypto/aes.c
index 72c95c38fb..1309a13e91 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -108,12 +108,58 @@ const uint8_t AES_isbox[256] = {
     0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D,
 };
 
+/* AES ShiftRows, for complete unrolling. */
+enum {
+    AES_SH_0 = 0x0,
+    AES_SH_1 = 0x5,
+    AES_SH_2 = 0xa,
+    AES_SH_3 = 0xf,
+    AES_SH_4 = 0x4,
+    AES_SH_5 = 0x9,
+    AES_SH_6 = 0xe,
+    AES_SH_7 = 0x3,
+    AES_SH_8 = 0x8,
+    AES_SH_9 = 0xd,
+    AES_SH_A = 0x2,
+    AES_SH_B = 0x7,
+    AES_SH_C = 0xc,
+    AES_SH_D = 0x1,
+    AES_SH_E = 0x6,
+    AES_SH_F = 0xb,
+};
+
 const uint8_t AES_shifts[16] = {
-    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
+    AES_SH_0, AES_SH_1, AES_SH_2, AES_SH_3,
+    AES_SH_4, AES_SH_5, AES_SH_6, AES_SH_7,
+    AES_SH_8, AES_SH_9, AES_SH_A, AES_SH_B,
+    AES_SH_C, AES_SH_D, AES_SH_E, AES_SH_F,
+};
+
+/* AES InvShiftRows, for complete unrolling. */
+enum {
+    AES_ISH_0 = 0x0,
+    AES_ISH_1 = 0xd,
+    AES_ISH_2 = 0xa,
+    AES_ISH_3 = 0x7,
+    AES_ISH_4 = 0x4,
+    AES_ISH_5 = 0x1,
+    AES_ISH_6 = 0xe,
+    AES_ISH_7 = 0xb,
+    AES_ISH_8 = 0x8,
+    AES_ISH_9 = 0x5,
+    AES_ISH_A = 0x2,
+    AES_ISH_B = 0xf,
+    AES_ISH_C = 0xc,
+    AES_ISH_D = 0x9,
+    AES_ISH_E = 0x6,
+    AES_ISH_F = 0x3,
 };
 
 const uint8_t AES_ishifts[16] = {
-    0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
+    AES_ISH_0, AES_ISH_1, AES_ISH_2, AES_ISH_3,
+    AES_ISH_4, AES_ISH_5, AES_ISH_6, AES_ISH_7,
+    AES_ISH_8, AES_ISH_9, AES_ISH_A, AES_ISH_B,
+    AES_ISH_C, AES_ISH_D, AES_ISH_E, AES_ISH_F,
 };
 
 /*
-- 
2.34.1




* [PATCH 04/35] crypto: Add aesenc_SB_SR
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (2 preceding siblings ...)
  2023-06-03  2:33 ` [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-03 13:15   ` Ard Biesheuvel
  2023-06-03  2:33 ` [PATCH 05/35] target/i386: Use aesenc_SB_SR Richard Henderson
                   ` (31 subsequent siblings)
  35 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Start adding infrastructure for accelerating guest AES.
Begin with a SubBytes + ShiftRows primitive.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
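Note, AddRoundKey remains the caller's job.  A hypothetical caller,
mirroring what patch 5 does for x86 (which passes be=false for its
little-endian element order), composes a final encryption round as:

    static void final_round(AESState *r, const AESState *st,
                            const AESState *rk)
    {
        AESState t;

        aesenc_SB_SR(&t, st, false);
        r->v = t.v ^ rk->v;
    }
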
 host/include/generic/host/aes-round.h | 15 +++++++++
 include/crypto/aes-round.h            | 41 +++++++++++++++++++++++
 crypto/aes.c                          | 47 +++++++++++++++++++++++++++
 3 files changed, 103 insertions(+)
 create mode 100644 host/include/generic/host/aes-round.h
 create mode 100644 include/crypto/aes-round.h

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
new file mode 100644
index 0000000000..598242c603
--- /dev/null
+++ b/host/include/generic/host/aes-round.h
@@ -0,0 +1,15 @@
+/*
+ * No host specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HOST_AES_ROUND_H
+#define HOST_AES_ROUND_H
+
+#define HAVE_AES_ACCEL  false
+#define ATTR_AES_ACCEL
+
+void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
+#endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
new file mode 100644
index 0000000000..784e1daee6
--- /dev/null
+++ b/include/crypto/aes-round.h
@@ -0,0 +1,41 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * AES round fragments, generic version
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef CRYPTO_AES_ROUND_H
+#define CRYPTO_AES_ROUND_H
+
+/* Hosts with acceleration will usually need a 16-byte vector type. */
+typedef uint8_t AESStateVec __attribute__((vector_size(16)));
+
+typedef union {
+    uint8_t b[16];
+    uint32_t w[4];
+    uint64_t d[2];
+    AESStateVec v;
+} AESState;
+
+#include "host/aes-round.h"
+
+/*
+ * Perform SubBytes + ShiftRows.
+ */
+
+void aesenc_SB_SR_gen(AESState *ret, const AESState *st);
+void aesenc_SB_SR_genrev(AESState *ret, const AESState *st);
+
+static inline void aesenc_SB_SR(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_SB_SR_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_SB_SR_gen(r, st);
+    } else {
+        aesenc_SB_SR_genrev(r, st);
+    }
+}
+
+#endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index 1309a13e91..708838315a 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -29,6 +29,7 @@
  */
 #include "qemu/osdep.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 
 typedef uint32_t u32;
 typedef uint8_t u8;
@@ -1251,6 +1252,52 @@ static const u32 rcon[] = {
         0x1B000000, 0x36000000, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
 };
 
+/* Perform SubBytes + ShiftRows. */
+static inline void
+aesenc_SB_SR_swap(AESState *r, const AESState *st, bool swap)
+{
+    const int swap_b = swap ? 15 : 0;
+    uint8_t t;
+
+    /* These four indexes are not swizzled. */
+    r->b[swap_b ^ 0x0] = AES_sbox[st->b[swap_b ^ AES_SH_0]];
+    r->b[swap_b ^ 0x4] = AES_sbox[st->b[swap_b ^ AES_SH_4]];
+    r->b[swap_b ^ 0x8] = AES_sbox[st->b[swap_b ^ AES_SH_8]];
+    r->b[swap_b ^ 0xc] = AES_sbox[st->b[swap_b ^ AES_SH_C]];
+
+    /* Otherwise, break cycles. */
+
+    t = AES_sbox[st->b[swap_b ^ AES_SH_D]];
+    r->b[swap_b ^ 0x1] = AES_sbox[st->b[swap_b ^ AES_SH_1]];
+    r->b[swap_b ^ 0x5] = AES_sbox[st->b[swap_b ^ AES_SH_5]];
+    r->b[swap_b ^ 0x9] = AES_sbox[st->b[swap_b ^ AES_SH_9]];
+    r->b[swap_b ^ 0xd] = t;
+
+    t = AES_sbox[st->b[swap_b ^ AES_SH_A]];
+    r->b[swap_b ^ 0x2] = AES_sbox[st->b[swap_b ^ AES_SH_2]];
+    r->b[swap_b ^ 0xa] = t;
+
+    t = AES_sbox[st->b[swap_b ^ AES_SH_E]];
+    r->b[swap_b ^ 0x6] = AES_sbox[st->b[swap_b ^ AES_SH_6]];
+    r->b[swap_b ^ 0xe] = t;
+
+    t = AES_sbox[st->b[swap_b ^ AES_SH_7]];
+    r->b[swap_b ^ 0x3] = AES_sbox[st->b[swap_b ^ AES_SH_3]];
+    r->b[swap_b ^ 0xf] = AES_sbox[st->b[swap_b ^ AES_SH_F]];
+    r->b[swap_b ^ 0xb] = AES_sbox[st->b[swap_b ^ AES_SH_B]];
+    r->b[swap_b ^ 0x7] = t;
+}
+
+void aesenc_SB_SR_gen(AESState *r, const AESState *st)
+{
+    aesenc_SB_SR_swap(r, st, false);
+}
+
+void aesenc_SB_SR_genrev(AESState *r, const AESState *st)
+{
+    aesenc_SB_SR_swap(r, st, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH 05/35] target/i386: Use aesenc_SB_SR
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (3 preceding siblings ...)
  2023-06-03  2:33 ` [PATCH 04/35] crypto: Add aesenc_SB_SR Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-03  2:33 ` [PATCH 06/35] target/arm: Demultiplex AESE and AESMC Richard Henderson
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESENCLAST instruction.  The last round omits
MixColumns, so AESENCLAST is exactly SubBytes + ShiftRows followed
by AddRoundKey.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index fb63af7afa..31e1f6edc7 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -19,6 +19,7 @@
  */
 
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 
 #if SHIFT == 0
 #define Reg MMXReg
@@ -2202,12 +2203,14 @@ void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
+        AESState t;
 
-    for (i = 0; i < 8 << SHIFT; i++) {
-        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
+        aesenc_SB_SR(&t, st, false);
+        ad->v = t.v ^ rk->v;
     }
 }
 
-- 
2.34.1




* [PATCH 06/35] target/arm: Demultiplex AESE and AESMC
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (4 preceding siblings ...)
  2023-06-03  2:33 ` [PATCH 05/35] target/i386: Use aesenc_SB_SR Richard Henderson
@ 2023-06-03  2:33 ` Richard Henderson
  2023-06-05 10:56   ` Philippe Mathieu-Daudé
  2023-06-03  2:33 ` [PATCH 07/35] target/arm: Use aesenc_SB_SR Richard Henderson
                   ` (29 subsequent siblings)
  35 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Split these helpers so that we are not passing 'decrypt'
within the simd descriptor.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h             |  2 ++
 target/arm/tcg/sve.decode       |  4 ++--
 target/arm/tcg/crypto_helper.c  | 37 +++++++++++++++++++++++----------
 target/arm/tcg/translate-a64.c  | 13 ++++--------
 target/arm/tcg/translate-neon.c |  4 ++--
 target/arm/tcg/translate-sve.c  |  8 ++++---
 6 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3335c2b10b..95e32a697a 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -552,7 +552,9 @@ DEF_HELPER_FLAGS_2(neon_qzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
 DEF_HELPER_FLAGS_2(neon_qzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
 
 DEF_HELPER_FLAGS_4(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_aesd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_aesimc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(crypto_sha1su0, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(crypto_sha1c, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 14b3a69c36..04b6fcc0cf 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1629,8 +1629,8 @@ STNT1_zprz      1110010 .. 10 ..... 001 ... ..... ..... \
 ### SVE2 Crypto Extensions
 
 # SVE2 crypto unary operations
-# AESMC and AESIMC
-AESMC           01000101 00 10000011100 decrypt:1 00000 rd:5
+AESMC           01000101 00 10000011100 0 00000 rd:5
+AESIMC          01000101 00 10000011100 1 00000 rd:5
 
 # SVE2 crypto destructive binary operations
 AESE            01000101 00 10001 0 11100 0 ..... .....  @rdn_rm_e0
diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 06254939d2..75882d9ea3 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -45,11 +45,9 @@ static void clear_tail_16(void *vd, uint32_t desc)
     clear_tail(vd, opr_sz, max_sz);
 }
 
-static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
-                           uint64_t *rm, bool decrypt)
+static void do_crypto_aese(uint64_t *rd, uint64_t *rn, uint64_t *rm,
+                           const uint8_t *sbox, const uint8_t *shift)
 {
-    static uint8_t const * const sbox[2] = { AES_sbox, AES_isbox };
-    static uint8_t const * const shift[2] = { AES_shifts, AES_ishifts };
     union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
     union CRYPTO_STATE st = { .l = { rn[0], rn[1] } };
     int i;
@@ -60,7 +58,7 @@ static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
 
     /* combine ShiftRows operation and sbox substitution */
     for (i = 0; i < 16; i++) {
-        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
+        CR_ST_BYTE(st, i) = sbox[CR_ST_BYTE(rk, shift[i])];
     }
 
     rd[0] = st.l[0];
@@ -70,18 +68,26 @@ static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
 void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
-    bool decrypt = simd_data(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, decrypt);
+        do_crypto_aese(vd + i, vn + i, vm + i, AES_sbox, AES_shifts);
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
 
-static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
+void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+
+    for (i = 0; i < opr_sz; i += 16) {
+        do_crypto_aese(vd + i, vn + i, vm + i, AES_isbox, AES_ishifts);
+    }
+    clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, const uint32_t *mc)
 {
     union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
-    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
     int i;
 
     for (i = 0; i < 16; i += 4) {
@@ -99,10 +105,19 @@ static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
 void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
-    bool decrypt = simd_data(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, decrypt);
+        do_crypto_aesmc(vd + i, vm + i, AES_mc_rot);
+    }
+    clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(crypto_aesimc)(void *vd, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+
+    for (i = 0; i < opr_sz; i += 16) {
+        do_crypto_aesmc(vd + i, vm + i, AES_imc_rot);
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 741a608739..3a97216d9b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -13416,7 +13416,6 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
     int opcode = extract32(insn, 12, 5);
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
-    int decrypt;
     gen_helper_gvec_2 *genfn2 = NULL;
     gen_helper_gvec_3 *genfn3 = NULL;
 
@@ -13427,20 +13426,16 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
 
     switch (opcode) {
     case 0x4: /* AESE */
-        decrypt = 0;
         genfn3 = gen_helper_crypto_aese;
         break;
     case 0x6: /* AESMC */
-        decrypt = 0;
         genfn2 = gen_helper_crypto_aesmc;
         break;
     case 0x5: /* AESD */
-        decrypt = 1;
-        genfn3 = gen_helper_crypto_aese;
+        genfn3 = gen_helper_crypto_aesd;
         break;
     case 0x7: /* AESIMC */
-        decrypt = 1;
-        genfn2 = gen_helper_crypto_aesmc;
+        genfn2 = gen_helper_crypto_aesimc;
         break;
     default:
         unallocated_encoding(s);
@@ -13451,9 +13446,9 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
         return;
     }
     if (genfn2) {
-        gen_gvec_op2_ool(s, true, rd, rn, decrypt, genfn2);
+        gen_gvec_op2_ool(s, true, rd, rn, 0, genfn2);
     } else {
-        gen_gvec_op3_ool(s, true, rd, rd, rn, decrypt, genfn3);
+        gen_gvec_op3_ool(s, true, rd, rd, rn, 0, genfn3);
     }
 }
 
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index af8685a4ac..bb92ee411d 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -3455,9 +3455,9 @@ static bool trans_VMVN(DisasContext *s, arg_2misc *a)
     }
 
 WRAP_2M_3_OOL_FN(gen_AESE, gen_helper_crypto_aese, 0)
-WRAP_2M_3_OOL_FN(gen_AESD, gen_helper_crypto_aese, 1)
+WRAP_2M_3_OOL_FN(gen_AESD, gen_helper_crypto_aesd, 0)
 WRAP_2M_2_OOL_FN(gen_AESMC, gen_helper_crypto_aesmc, 0)
-WRAP_2M_2_OOL_FN(gen_AESIMC, gen_helper_crypto_aesmc, 1)
+WRAP_2M_2_OOL_FN(gen_AESIMC, gen_helper_crypto_aesimc, 0)
 WRAP_2M_2_OOL_FN(gen_SHA1H, gen_helper_crypto_sha1h, 0)
 WRAP_2M_2_OOL_FN(gen_SHA1SU1, gen_helper_crypto_sha1su1, 0)
 WRAP_2M_2_OOL_FN(gen_SHA256SU0, gen_helper_crypto_sha256su0, 0)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index 92ab290106..553c79cfe3 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7116,12 +7116,14 @@ TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
            a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0)
 
 TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
-                        gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
+                        gen_helper_crypto_aesmc, a->rd, a->rd, 0)
+TRANS_FEAT_NONSTREAMING(AESIMC, aa64_sve2_aes, gen_gvec_ool_zz,
+                        gen_helper_crypto_aesimc, a->rd, a->rd, 0)
 
 TRANS_FEAT_NONSTREAMING(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-                        gen_helper_crypto_aese, a, false)
+                        gen_helper_crypto_aese, a, 0)
 TRANS_FEAT_NONSTREAMING(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-                        gen_helper_crypto_aese, a, true)
+                        gen_helper_crypto_aesd, a, 0)
 
 TRANS_FEAT_NONSTREAMING(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
                         gen_helper_crypto_sm4e, a, 0)
-- 
2.34.1




* [PATCH 07/35] target/arm: Use aesenc_SB_SR
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESE instruction.
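
Note the composition order: AESE applies AddRoundKey before
SubBytes + ShiftRows, the reverse of x86's AESENCLAST, which is why
the helper XORs the round key in first and only then calls the
primitive.  A minimal sketch of the shape (illustration only,
assuming the AESState type and aesenc_SB_SR from this series; the
real helper additionally swizzles for big-endian hosts):

static void aese_like(AESState *rd, const AESState *st,
                      const AESState *rk)
{
    AESState t;

    t.v = st->v ^ rk->v;         /* AddRoundKey first ... */
    aesenc_SB_SR(rd, &t, false); /* ... then SubBytes + ShiftRows */
}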

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 75882d9ea3..5cebc88f5f 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -15,6 +15,7 @@
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 #include "vec_internal.h"
 
@@ -70,7 +71,22 @@ void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, AES_sbox, AES_shifts);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vn + i);
+        AESState *rk = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1] ^ rk->d[1];
+            t.d[1] = st->d[0] ^ rk->d[0];
+            aesenc_SB_SR(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            t.v = st->v ^ rk->v;
+            aesenc_SB_SR(ad, &t, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH 08/35] target/ppc: Use aesenc_SB_SR
From: Richard Henderson @ 2023-06-03  2:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the VCIPHERLAST instruction.
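
Unlike Arm and x86, the PPC vector layout indexes bytes in big-endian
element order, so the primitive is called with be = true and needs no
host-side swizzle on big-endian hosts.  A sketch of the round shape
(illustration only; VCIPHERLAST is SubBytes + ShiftRows followed by
AddRoundKey):

static void vcipherlast_like(AESState *rd, const AESState *st,
                             const AESState *rk)
{
    AESState t;

    aesenc_SB_SR(&t, st, true); /* SB + SR, big-endian byte order */
    rd->v = t.v ^ rk->v;        /* AddRoundKey last */
}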

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index d97a7f1f28..b49e17685b 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -25,6 +25,7 @@
 #include "qemu/log.h"
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "fpu/softfloat.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
@@ -2947,13 +2948,13 @@ void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
+    AESState t;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        result.VsrB(i) = b->VsrB(i) ^ (AES_sbox[a->VsrB(AES_shifts[i])]);
-    }
-    *r = result;
+    aesenc_SB_SR(&t, st, true);
+    ad->v = t.v ^ rk->v;
 }
 
 void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




* [PATCH 09/35] target/riscv: Use aesenc_SB_SR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AES64ES instruction.
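
The scalar AES64 instructions split the 128-bit state across two
64-bit registers.  Indexing d[] by HOST_BIG_ENDIAN is the recurring
idiom in this series for placing the least significant byte of rs1
where the be = false primitives expect AES state byte 0, regardless
of host byte order.  A hypothetical helper spelling the idiom out
(not part of the series):

static AESState pack_riscv_state(uint64_t lo, uint64_t hi)
{
    AESState t;

    t.d[HOST_BIG_ENDIAN] = lo;   /* d[0] on LE hosts, d[1] on BE */
    t.d[!HOST_BIG_ENDIAN] = hi;
    return t;
}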

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 2ef30281b1..82d7f3a060 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -22,6 +22,7 @@
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 
 #define AES_XTIME(a) \
@@ -200,7 +201,12 @@ target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, true, false);
+    AESState t;
+
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesenc_SB_SR(&t, &t, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH 10/35] crypto: Add aesdec_ISB_ISR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows.
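
The generic implementation below is written so that the output may
alias the input, which the in-place callers later in the series rely
on: the four bytes that InvShiftRows leaves in place are stored
directly, and each remaining 2- or 4-cycle of the permutation is
broken with a one-byte temporary before its first element is
overwritten.  A minimal sketch of the cycle-breaking idea
(illustration only, not the code below):

/* Rotate a 4-cycle of bytes in place: save the element that would
 * otherwise be clobbered, then shift the rest along the cycle.
 */
static void rotate_cycle4(uint8_t *b, int i0, int i1, int i2, int i3)
{
    uint8_t t = b[i0];

    b[i0] = b[i1];
    b[i1] = b[i2];
    b[i2] = b[i3];
    b[i3] = t;
}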

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 ++
 include/crypto/aes-round.h            | 18 +++++++++++
 crypto/aes.c                          | 46 +++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 598242c603..cb4fed61fe 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -12,4 +12,7 @@
 void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_ISB_ISR_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 #endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 784e1daee6..ff1914bd63 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -38,4 +38,22 @@ static inline void aesenc_SB_SR(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows.
+ */
+
+void aesdec_ISB_ISR_gen(AESState *ret, const AESState *st);
+void aesdec_ISB_ISR_genrev(AESState *ret, const AESState *st);
+
+static inline void aesdec_ISB_ISR(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_gen(r, st);
+    } else {
+        aesdec_ISB_ISR_genrev(r, st);
+    }
+}
+
 #endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index 708838315a..937377647f 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1298,6 +1298,52 @@ void aesenc_SB_SR_genrev(AESState *r, const AESState *st)
     aesenc_SB_SR_swap(r, st, true);
 }
 
+/* Perform InvSubBytes + InvShiftRows. */
+static inline void
+aesdec_ISB_ISR_swap(AESState *r, const AESState *st, bool swap)
+{
+    const int swap_b = swap ? 15 : 0;
+    uint8_t t;
+
+    /* These four indexes are not swizzled. */
+    r->b[swap_b ^ 0x0] = AES_isbox[st->b[swap_b ^ AES_ISH_0]];
+    r->b[swap_b ^ 0x4] = AES_isbox[st->b[swap_b ^ AES_ISH_4]];
+    r->b[swap_b ^ 0x8] = AES_isbox[st->b[swap_b ^ AES_ISH_8]];
+    r->b[swap_b ^ 0xc] = AES_isbox[st->b[swap_b ^ AES_ISH_C]];
+
+    /* Otherwise, break cycles. */
+
+    t = AES_isbox[st->b[swap_b ^ AES_ISH_5]];
+    r->b[swap_b ^ 0x1] = AES_isbox[st->b[swap_b ^ AES_ISH_1]];
+    r->b[swap_b ^ 0xd] = AES_isbox[st->b[swap_b ^ AES_ISH_D]];
+    r->b[swap_b ^ 0x9] = AES_isbox[st->b[swap_b ^ AES_ISH_9]];
+    r->b[swap_b ^ 0x5] = t;
+
+    t = AES_isbox[st->b[swap_b ^ AES_ISH_A]];
+    r->b[swap_b ^ 0x2] = AES_isbox[st->b[swap_b ^ AES_ISH_2]];
+    r->b[swap_b ^ 0xa] = t;
+
+    t = AES_isbox[st->b[swap_b ^ AES_ISH_E]];
+    r->b[swap_b ^ 0x6] = AES_isbox[st->b[swap_b ^ AES_ISH_6]];
+    r->b[swap_b ^ 0xe] = t;
+
+    t = AES_isbox[st->b[swap_b ^ AES_ISH_F]];
+    r->b[swap_b ^ 0x3] = AES_isbox[st->b[swap_b ^ AES_ISH_3]];
+    r->b[swap_b ^ 0x7] = AES_isbox[st->b[swap_b ^ AES_ISH_7]];
+    r->b[swap_b ^ 0xb] = AES_isbox[st->b[swap_b ^ AES_ISH_B]];
+    r->b[swap_b ^ 0xf] = t;
+}
+
+void aesdec_ISB_ISR_gen(AESState *r, const AESState *st)
+{
+    aesdec_ISB_ISR_swap(r, st, false);
+}
+
+void aesdec_ISB_ISR_genrev(AESState *r, const AESState *st)
+{
+    aesdec_ISB_ISR_swap(r, st, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH 11/35] target/i386: Use aesdec_ISB_ISR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESDECLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 31e1f6edc7..036eabdf95 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2177,12 +2177,14 @@ void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
+        AESState t;
 
-    for (i = 0; i < 8 << SHIFT; i++) {
-        d->B(i) = rk.B(i) ^ (AES_isbox[st.B(AES_ishifts[i & 15] + (i & ~15))]);
+        aesdec_ISB_ISR(&t, st, false);
+        ad->v = t.v ^ rk->v;
     }
 }
 
-- 
2.34.1




* [PATCH 12/35] target/arm: Use aesdec_ISB_ISR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESD instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 37 +++++++++++++++-------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 5cebc88f5f..d7b644851f 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -46,26 +46,6 @@ static void clear_tail_16(void *vd, uint32_t desc)
     clear_tail(vd, opr_sz, max_sz);
 }
 
-static void do_crypto_aese(uint64_t *rd, uint64_t *rn, uint64_t *rm,
-                           const uint8_t *sbox, const uint8_t *shift)
-{
-    union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
-    union CRYPTO_STATE st = { .l = { rn[0], rn[1] } };
-    int i;
-
-    /* xor state vector with round key */
-    rk.l[0] ^= st.l[0];
-    rk.l[1] ^= st.l[1];
-
-    /* combine ShiftRows operation and sbox substitution */
-    for (i = 0; i < 16; i++) {
-        CR_ST_BYTE(st, i) = sbox[CR_ST_BYTE(rk, shift[i])];
-    }
-
-    rd[0] = st.l[0];
-    rd[1] = st.l[1];
-}
-
 void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
@@ -96,7 +76,22 @@ void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, AES_isbox, AES_ishifts);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vn + i);
+        AESState *rk = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1] ^ rk->d[1];
+            t.d[1] = st->d[0] ^ rk->d[0];
+            aesdec_ISB_ISR(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            t.v = st->v ^ rk->v;
+            aesdec_ISB_ISR(ad, &t, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH 13/35] target/ppc: Use aesdec_ISB_ISR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the VNCIPHERLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index b49e17685b..444beb1779 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2979,13 +2979,13 @@ void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
+    AESState t;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        result.VsrB(i) = b->VsrB(i) ^ (AES_isbox[a->VsrB(AES_ishifts[i])]);
-    }
-    *r = result;
+    aesdec_ISB_ISR(&t, st, true);
+    ad->v = t.v ^ rk->v;
 }
 
 void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
-- 
2.34.1




* [PATCH 14/35] target/riscv: Use aesdec_ISB_ISR
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AES64DS instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 82d7f3a060..08191b4b2a 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -211,7 +211,12 @@ target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, false, false);
+    AESState t;
+
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesdec_ISB_ISR(&t, &t, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH 15/35] crypto: Add aesenc_MC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for MixColumns.
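
For reference, MixColumns multiplies each four-byte column by a fixed
matrix over GF(2^8), whose rows are {02 03 01 01} rotated down one
step per row; the code below folds the {02} and {03} multiples into
the AES_mc_rot table and rotates per byte.  A direct, unoptimized
byte-level version (a sketch for comparison, not the series code):

#include <stdint.h>

/* Multiply by {02} in GF(2^8), reducing by the AES polynomial. */
static uint8_t xtime(uint8_t x)
{
    return (x << 1) ^ (x & 0x80 ? 0x1b : 0);
}

/* out[i] = {02}*in[i] ^ {03}*in[i+1] ^ in[i+2] ^ in[i+3], mod 4. */
static void mix_column(uint8_t out[4], const uint8_t in[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = xtime(in[i])
               ^ (xtime(in[(i + 1) & 3]) ^ in[(i + 1) & 3])
               ^ in[(i + 2) & 3]
               ^ in[(i + 3) & 3];
    }
}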

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 ++
 include/crypto/aes-round.h            | 18 +++++++++
 crypto/aes.c                          | 58 +++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index cb4fed61fe..7c48db24b6 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -9,6 +9,9 @@
 #define HAVE_AES_ACCEL  false
 #define ATTR_AES_ACCEL
 
+void aesenc_MC_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index ff1914bd63..f25e9572a3 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -38,6 +38,24 @@ static inline void aesenc_SB_SR(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform MixColumns.
+ */
+
+void aesenc_MC_gen(AESState *ret, const AESState *st);
+void aesenc_MC_genrev(AESState *ret, const AESState *st);
+
+static inline void aesenc_MC(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_MC_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_MC_gen(r, st);
+    } else {
+        aesenc_MC_genrev(r, st);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index 937377647f..c7123eddd5 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -28,6 +28,8 @@
  * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
+#include "qemu/bitops.h"
 #include "crypto/aes.h"
 #include "crypto/aes-round.h"
 
@@ -1298,6 +1300,62 @@ void aesenc_SB_SR_genrev(AESState *r, const AESState *st)
     aesenc_SB_SR_swap(r, st, true);
 }
 
+/* Perform MixColumns. */
+static inline void
+aesenc_MC_swap(AESState *r, const AESState *st, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x0]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x1]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x2]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x3]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 0] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x4]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x5]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x6]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x7]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 1] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x8]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x9]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xA]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xB]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 2] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0xC]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xD]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xE]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xF]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 3] = t;
+}
+
+void aesenc_MC_gen(AESState *r, const AESState *st)
+{
+    aesenc_MC_swap(r, st, false);
+}
+
+void aesenc_MC_genrev(AESState *r, const AESState *st)
+{
+    aesenc_MC_swap(r, st, true);
+}
+
 /* Perform InvSubBytes + InvShiftRows. */
 static inline void
 aesdec_ISB_ISR_swap(AESState *r, const AESState *st, bool swap)
-- 
2.34.1




* [PATCH 16/35] target/arm: Use aesenc_MC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESMC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index d7b644851f..a0fec08771 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -118,7 +118,20 @@ void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, AES_mc_rot);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1];
+            t.d[1] = st->d[0];
+            aesenc_MC(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            aesenc_MC(ad, st, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH 17/35] crypto: Add aesdec_IMC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for InvMixColumns.
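
InvMixColumns is the same construction with the inverse matrix, whose
rows are {0e 0b 0d 09} rotated; the code below reads the precomputed
AES_imc table instead.  A direct GF(2^8) sketch for comparison
(illustration only):

#include <stdint.h>

static uint8_t xtime(uint8_t x)
{
    return (x << 1) ^ (x & 0x80 ? 0x1b : 0);
}

/* General GF(2^8) multiply via shift-and-add with xtime. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    for (; b; b >>= 1) {
        if (b & 1) {
            r ^= a;
        }
        a = xtime(a);
    }
    return r;
}

/* out[i] = {0e}*in[i] ^ {0b}*in[i+1] ^ {0d}*in[i+2] ^ {09}*in[i+3] */
static void inv_mix_column(uint8_t out[4], const uint8_t in[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = gf_mul(in[i], 0x0e)
               ^ gf_mul(in[(i + 1) & 3], 0x0b)
               ^ gf_mul(in[(i + 2) & 3], 0x0d)
               ^ gf_mul(in[(i + 3) & 3], 0x09);
    }
}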

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 ++
 include/crypto/aes-round.h            | 18 +++++++++
 crypto/aes.c                          | 57 +++++++++++++++++++++++++++
 3 files changed, 78 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 7c48db24b6..1e9b97d274 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -15,6 +15,9 @@ void aesenc_MC_accel(AESState *, const AESState *, bool)
 void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_IMC_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 void aesdec_ISB_ISR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index f25e9572a3..2d962ede0b 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -74,4 +74,22 @@ static inline void aesdec_ISB_ISR(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform InvMixColumns.
+ */
+
+void aesdec_IMC_gen(AESState *ret, const AESState *st);
+void aesdec_IMC_genrev(AESState *ret, const AESState *st);
+
+static inline void aesdec_IMC(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_IMC_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_IMC_gen(r, st);
+    } else {
+        aesdec_IMC_genrev(r, st);
+    }
+}
+
 #endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index c7123eddd5..4e654e5404 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1402,6 +1402,63 @@ void aesdec_ISB_ISR_genrev(AESState *r, const AESState *st)
     aesdec_ISB_ISR_swap(r, st, true);
 }
 
+/* Perform InvMixColumns. */
+static inline void
+aesdec_IMC_swap(AESState *r, const AESState *st, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t t;
+
+    /* Note that AES_imc is encoded for big-endian. */
+    t = (AES_imc[st->b[swap_b ^ 0x0]][0] ^
+         AES_imc[st->b[swap_b ^ 0x1]][1] ^
+         AES_imc[st->b[swap_b ^ 0x2]][2] ^
+         AES_imc[st->b[swap_b ^ 0x3]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 0] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0x4]][0] ^
+         AES_imc[st->b[swap_b ^ 0x5]][1] ^
+         AES_imc[st->b[swap_b ^ 0x6]][2] ^
+         AES_imc[st->b[swap_b ^ 0x7]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 1] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0x8]][0] ^
+         AES_imc[st->b[swap_b ^ 0x9]][1] ^
+         AES_imc[st->b[swap_b ^ 0xA]][2] ^
+         AES_imc[st->b[swap_b ^ 0xB]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 2] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0xC]][0] ^
+         AES_imc[st->b[swap_b ^ 0xD]][1] ^
+         AES_imc[st->b[swap_b ^ 0xE]][2] ^
+         AES_imc[st->b[swap_b ^ 0xF]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 3] = t;
+}
+
+void aesdec_IMC_gen(AESState *r, const AESState *st)
+{
+    aesdec_IMC_swap(r, st, false);
+}
+
+void aesdec_IMC_genrev(AESState *r, const AESState *st)
+{
+    aesdec_IMC_swap(r, st, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH 18/35] target/i386: Use aesdec_IMC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESIMC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 036eabdf95..0187651140 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2219,15 +2219,10 @@ void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 #if SHIFT == 1
 void glue(helper_aesimc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-    int i;
-    Reg tmp = *s;
+    AESState *ad = (AESState *)&d->ZMM_X(0);
+    AESState *st = (AESState *)&s->ZMM_X(0);
 
-    for (i = 0 ; i < 4 ; i++) {
-        d->L(i) = bswap32(AES_imc[tmp.B(4 * i + 0)][0] ^
-                          AES_imc[tmp.B(4 * i + 1)][1] ^
-                          AES_imc[tmp.B(4 * i + 2)][2] ^
-                          AES_imc[tmp.B(4 * i + 3)][3]);
-    }
+    aesdec_IMC(ad, st, false);
 }
 
 void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
-- 
2.34.1




* [PATCH 19/35] target/arm: Use aesdec_IMC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESIMC instruction.  With this, everything in
target/arm has been converted to crypto/aes-round.h, so the
crypto/aes.h include is no longer needed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index a0fec08771..d2da80f2ba 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -14,7 +14,6 @@
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
-#include "crypto/aes.h"
 #include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 #include "vec_internal.h"
@@ -96,23 +95,6 @@ void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
 
-static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, const uint32_t *mc)
-{
-    union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
-    int i;
-
-    for (i = 0; i < 16; i += 4) {
-        CR_ST_WORD(st, i >> 2) =
-            mc[CR_ST_BYTE(st, i)] ^
-            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
-            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
-            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
-    }
-
-    rd[0] = st.l[0];
-    rd[1] = st.l[1];
-}
-
 void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
@@ -141,7 +123,20 @@ void HELPER(crypto_aesimc)(void *vd, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, AES_imc_rot);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1];
+            t.d[1] = st->d[0];
+            aesdec_IMC(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            aesdec_IMC(ad, st, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH 20/35] target/riscv: Use aesdec_IMC
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AES64IM instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 08191b4b2a..64004b2329 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -270,17 +270,12 @@ target_ulong HELPER(aes64ks1i)(target_ulong rs1, target_ulong rnum)
 
 target_ulong HELPER(aes64im)(target_ulong rs1)
 {
-    uint64_t RS1 = rs1;
-    uint32_t col_0 = RS1 & 0xFFFFFFFF;
-    uint32_t col_1 = RS1 >> 32;
-    target_ulong result;
+    AESState t;
 
-    col_0 = AES_INVMIXCOLUMN(col_0);
-    col_1 = AES_INVMIXCOLUMN(col_1);
-
-    result = ((uint64_t)col_1 << 32) | col_0;
-
-    return result;
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = 0;
+    aesdec_IMC(&t, &t, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(sm4ed)(target_ulong rs1, target_ulong rs2,
-- 
2.34.1




* [PATCH 21/35] crypto: Add aesenc_SB_SR_MC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for SubBytes + ShiftRows + MixColumns + AddRoundKey.
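
Together with aesenc_SB_SR from earlier in the series, this is enough
to express a complete encryption.  A hypothetical AES-128 sketch, on
the assumption that the expanded key is already laid out as eleven
AESStates in the same byte order as the data:

static void aes128_encrypt_sketch(AESState *out, const AESState *in,
                                  const AESState rk[11])
{
    AESState t;
    int i;

    t.v = in->v ^ rk[0].v;            /* initial AddRoundKey */
    for (i = 1; i < 10; i++) {
        aesenc_SB_SR_MC_AK(&t, &t, &rk[i], false);
    }
    aesenc_SB_SR(&t, &t, false);      /* final round omits MixColumns */
    out->v = t.v ^ rk[10].v;
}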

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  4 ++
 include/crypto/aes-round.h            | 21 ++++++++++
 crypto/aes.c                          | 56 +++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 1e9b97d274..dc2c751ac3 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -15,6 +15,10 @@ void aesenc_MC_accel(AESState *, const AESState *, bool)
 void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesenc_SB_SR_MC_AK_accel(AESState *, const AESState *,
+                              const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 void aesdec_IMC_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 2d962ede0b..aefa17fcc3 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -56,6 +56,27 @@ static inline void aesenc_MC(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform SubBytes + ShiftRows + MixColumns + AddRoundKey.
+ */
+
+void aesenc_SB_SR_MC_AK_gen(AESState *ret, const AESState *st,
+                            const AESState *rk);
+void aesenc_SB_SR_MC_AK_genrev(AESState *ret, const AESState *st,
+                               const AESState *rk);
+
+static inline void aesenc_SB_SR_MC_AK(AESState *r, const AESState *st,
+                                      const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_SB_SR_MC_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_SB_SR_MC_AK_gen(r, st, rk);
+    } else {
+        aesenc_SB_SR_MC_AK_genrev(r, st, rk);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index 4e654e5404..6172495b46 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1356,6 +1356,62 @@ void aesenc_MC_genrev(AESState *r, const AESState *st)
     aesenc_MC_swap(r, st, true);
 }
 
+/* Perform SubBytes + ShiftRows + MixColumns + AddRoundKey. */
+static inline void
+aesenc_SB_SR_MC_AK_swap(AESState *r, const AESState *st,
+                        const AESState *rk, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t w0, w1, w2, w3;
+
+    w0 = (AES_Te0[st->b[swap_b ^ AES_SH_0]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_1]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_2]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_3]]);
+
+    w1 = (AES_Te0[st->b[swap_b ^ AES_SH_4]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_5]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_6]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_7]]);
+
+    w2 = (AES_Te0[st->b[swap_b ^ AES_SH_8]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_9]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_A]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_B]]);
+
+    w3 = (AES_Te0[st->b[swap_b ^ AES_SH_C]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_D]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_E]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_F]]);
+
+    /* Note that AES_TeX is encoded for big-endian. */
+    if (!be) {
+        w0 = bswap32(w0);
+        w1 = bswap32(w1);
+        w2 = bswap32(w2);
+        w3 = bswap32(w3);
+    }
+
+    r->w[swap_w ^ 0] = rk->w[swap_w ^ 0] ^ w0;
+    r->w[swap_w ^ 1] = rk->w[swap_w ^ 1] ^ w1;
+    r->w[swap_w ^ 2] = rk->w[swap_w ^ 2] ^ w2;
+    r->w[swap_w ^ 3] = rk->w[swap_w ^ 3] ^ w3;
+}
+
+void aesenc_SB_SR_MC_AK_gen(AESState *r, const AESState *st,
+                            const AESState *rk)
+{
+    aesenc_SB_SR_MC_AK_swap(r, st, rk, false);
+}
+
+void aesenc_SB_SR_MC_AK_genrev(AESState *r, const AESState *st,
+                               const AESState *rk)
+{
+    aesenc_SB_SR_MC_AK_swap(r, st, rk, true);
+}
+
 /* Perform InvSubBytes + InvShiftRows. */
 static inline void
 aesdec_ISB_ISR_swap(AESState *r, const AESState *st, bool swap)
-- 
2.34.1




* [PATCH 22/35] target/i386: Use aesenc_SB_SR_MC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESENC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 0187651140..c7a2c586f4 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2190,16 +2190,12 @@ void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0 ; i < 2 << SHIFT ; i++) {
-        int j = i & 3;
-        d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4 * j + 0])] ^
-                                    AES_Te1[st.B(AES_shifts[4 * j + 1])] ^
-                                    AES_Te2[st.B(AES_shifts[4 * j + 2])] ^
-                                    AES_Te3[st.B(AES_shifts[4 * j + 3])]);
+        aesenc_SB_SR_MC_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1




* [PATCH 23/35] target/ppc: Use aesenc_SB_SR_MC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the VCIPHER instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 444beb1779..c7f8b39e9a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2933,17 +2933,11 @@ void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 
 void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
 
-    VECTOR_FOR_INORDER_I(i, u32) {
-        result.VsrW(i) = b->VsrW(i) ^
-            (AES_Te0[a->VsrB(AES_shifts[4 * i + 0])] ^
-             AES_Te1[a->VsrB(AES_shifts[4 * i + 1])] ^
-             AES_Te2[a->VsrB(AES_shifts[4 * i + 2])] ^
-             AES_Te3[a->VsrB(AES_shifts[4 * i + 3])]);
-    }
-    *r = result;
+    aesenc_SB_SR_MC_AK(ad, st, rk, true);
 }
 
 void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




* [PATCH 24/35] target/riscv: Use aesenc_SB_SR_MC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AES64ESM instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 64004b2329..71694b787c 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -196,7 +196,16 @@ static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
 
 target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, true, true);
+    AESState t, z = { };
+
+    /*
+     * This instruction does not include a round key,
+     * so supply a zero to our primitive.
+     */
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesenc_SB_SR_MC_AK(&t, &t, &z, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH 25/35] crypto: Add aesdec_ISB_ISR_IMC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows +
InvMixColumns + AddRoundKey.
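
This is the x86-style "equivalent inverse cipher" round: for the
composition to invert encryption, the middle-round keys must first be
passed through InvMixColumns, which is exactly what aesdec_IMC (and
the guest AESIMC instruction) provides.  A hypothetical AES-128
decryption sketch under that assumption:

/* Assumes drk[0] is the last encryption round key, drk[10] the
 * first, and drk[1..9] have been transformed with aesdec_IMC.
 */
static void aes128_decrypt_sketch(AESState *out, const AESState *in,
                                  const AESState drk[11])
{
    AESState t;
    int i;

    t.v = in->v ^ drk[0].v;
    for (i = 1; i < 10; i++) {
        aesdec_ISB_ISR_IMC_AK(&t, &t, &drk[i], false);
    }
    aesdec_ISB_ISR(&t, &t, false);  /* final round: no InvMixColumns */
    out->v = t.v ^ drk[10].v;
}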

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  4 ++
 include/crypto/aes-round.h            | 21 ++++++++++
 crypto/aes.c                          | 56 +++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index dc2c751ac3..848436379d 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -25,4 +25,8 @@ void aesdec_IMC_accel(AESState *, const AESState *, bool)
 void aesdec_ISB_ISR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_ISB_ISR_IMC_AK_accel(AESState *, const AESState *,
+                                 const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 #endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index aefa17fcc3..352687ce11 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -113,4 +113,25 @@ static inline void aesdec_IMC(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey.
+ */
+
+void aesdec_ISB_ISR_IMC_AK_gen(AESState *ret, const AESState *st,
+                               const AESState *rk);
+void aesdec_ISB_ISR_IMC_AK_genrev(AESState *ret, const AESState *st,
+                                  const AESState *rk);
+
+static inline void aesdec_ISB_ISR_IMC_AK(AESState *r, const AESState *st,
+                                         const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_IMC_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_IMC_AK_gen(r, st, rk);
+    } else {
+        aesdec_ISB_ISR_IMC_AK_genrev(r, st, rk);
+    }
+}
+
 #endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index 6172495b46..1696086868 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1515,6 +1515,62 @@ void aesdec_IMC_genrev(AESState *r, const AESState *st)
     aesdec_IMC_swap(r, st, true);
 }
 
+/* Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey. */
+static inline void
+aesdec_ISB_ISR_IMC_AK_swap(AESState *r, const AESState *st,
+                           const AESState *rk, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t w0, w1, w2, w3;
+
+    w0 = (AES_Td0[st->b[swap_b ^ AES_ISH_0]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_1]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_2]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_3]]);
+
+    w1 = (AES_Td0[st->b[swap_b ^ AES_ISH_4]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_5]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_6]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_7]]);
+
+    w2 = (AES_Td0[st->b[swap_b ^ AES_ISH_8]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_9]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_A]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_B]]);
+
+    w3 = (AES_Td0[st->b[swap_b ^ AES_ISH_C]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_D]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_E]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_F]]);
+
+    /* Note that AES_TdX is encoded for big-endian. */
+    if (!be) {
+        w0 = bswap32(w0);
+        w1 = bswap32(w1);
+        w2 = bswap32(w2);
+        w3 = bswap32(w3);
+    }
+
+    r->w[swap_w ^ 0] = rk->w[swap_w ^ 0] ^ w0;
+    r->w[swap_w ^ 1] = rk->w[swap_w ^ 1] ^ w1;
+    r->w[swap_w ^ 2] = rk->w[swap_w ^ 2] ^ w2;
+    r->w[swap_w ^ 3] = rk->w[swap_w ^ 3] ^ w3;
+}
+
+void aesdec_ISB_ISR_IMC_AK_gen(AESState *r, const AESState *st,
+                               const AESState *rk)
+{
+    aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, false);
+}
+
+void aesdec_ISB_ISR_IMC_AK_genrev(AESState *r, const AESState *st,
+                                  const AESState *rk)
+{
+    aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH 26/35] target/i386: Use aesdec_ISB_ISR_IMC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AESDEC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index c7a2c586f4..e666bd5068 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2162,16 +2162,12 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 
 void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0 ; i < 2 << SHIFT ; i++) {
-        int j = i & 3;
-        d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4 * j + 0])] ^
-                                    AES_Td1[st.B(AES_ishifts[4 * j + 1])] ^
-                                    AES_Td2[st.B(AES_ishifts[4 * j + 2])] ^
-                                    AES_Td3[st.B(AES_ishifts[4 * j + 3])]);
+        aesdec_ISB_ISR_IMC_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1




* [PATCH 27/35] target/riscv: Use aesdec_ISB_ISR_IMC_AK
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the AES64DSM instruction.  This was the last use
of aes64_operation and its support macros, so remove them all.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 101 ++++-------------------------------
 1 file changed, 10 insertions(+), 91 deletions(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 71694b787c..affa8292d1 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -104,96 +104,6 @@ target_ulong HELPER(aes32dsi)(target_ulong rs1, target_ulong rs2,
     return aes32_operation(shamt, rs1, rs2, false, false);
 }
 
-#define BY(X, I) ((X >> (8 * I)) & 0xFF)
-
-#define AES_SHIFROWS_LO(RS1, RS2) ( \
-    (((RS1 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
-    (((RS2 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
-    (((RS2 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
-    (((RS1 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
-
-#define AES_INVSHIFROWS_LO(RS1, RS2) ( \
-    (((RS2 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
-    (((RS1 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
-    (((RS1 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
-    (((RS2 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
-
-#define AES_MIXBYTE(COL, B0, B1, B2, B3) ( \
-    BY(COL, B3) ^ BY(COL, B2) ^ AES_GFMUL(BY(COL, B1), 3) ^ \
-    AES_GFMUL(BY(COL, B0), 2))
-
-#define AES_MIXCOLUMN(COL) ( \
-    AES_MIXBYTE(COL, 3, 0, 1, 2) << 24 | \
-    AES_MIXBYTE(COL, 2, 3, 0, 1) << 16 | \
-    AES_MIXBYTE(COL, 1, 2, 3, 0) << 8 | AES_MIXBYTE(COL, 0, 1, 2, 3) << 0)
-
-#define AES_INVMIXBYTE(COL, B0, B1, B2, B3) ( \
-    AES_GFMUL(BY(COL, B3), 0x9) ^ AES_GFMUL(BY(COL, B2), 0xd) ^ \
-    AES_GFMUL(BY(COL, B1), 0xb) ^ AES_GFMUL(BY(COL, B0), 0xe))
-
-#define AES_INVMIXCOLUMN(COL) ( \
-    AES_INVMIXBYTE(COL, 3, 0, 1, 2) << 24 | \
-    AES_INVMIXBYTE(COL, 2, 3, 0, 1) << 16 | \
-    AES_INVMIXBYTE(COL, 1, 2, 3, 0) << 8 | \
-    AES_INVMIXBYTE(COL, 0, 1, 2, 3) << 0)
-
-static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
-                                           bool enc, bool mix)
-{
-    uint64_t RS1 = rs1;
-    uint64_t RS2 = rs2;
-    uint64_t result;
-    uint64_t temp;
-    uint32_t col_0;
-    uint32_t col_1;
-
-    if (enc) {
-        temp = AES_SHIFROWS_LO(RS1, RS2);
-        temp = (((uint64_t)AES_sbox[(temp >> 0) & 0xFF] << 0) |
-                ((uint64_t)AES_sbox[(temp >> 8) & 0xFF] << 8) |
-                ((uint64_t)AES_sbox[(temp >> 16) & 0xFF] << 16) |
-                ((uint64_t)AES_sbox[(temp >> 24) & 0xFF] << 24) |
-                ((uint64_t)AES_sbox[(temp >> 32) & 0xFF] << 32) |
-                ((uint64_t)AES_sbox[(temp >> 40) & 0xFF] << 40) |
-                ((uint64_t)AES_sbox[(temp >> 48) & 0xFF] << 48) |
-                ((uint64_t)AES_sbox[(temp >> 56) & 0xFF] << 56));
-        if (mix) {
-            col_0 = temp & 0xFFFFFFFF;
-            col_1 = temp >> 32;
-
-            col_0 = AES_MIXCOLUMN(col_0);
-            col_1 = AES_MIXCOLUMN(col_1);
-
-            result = ((uint64_t)col_1 << 32) | col_0;
-        } else {
-            result = temp;
-        }
-    } else {
-        temp = AES_INVSHIFROWS_LO(RS1, RS2);
-        temp = (((uint64_t)AES_isbox[(temp >> 0) & 0xFF] << 0) |
-                ((uint64_t)AES_isbox[(temp >> 8) & 0xFF] << 8) |
-                ((uint64_t)AES_isbox[(temp >> 16) & 0xFF] << 16) |
-                ((uint64_t)AES_isbox[(temp >> 24) & 0xFF] << 24) |
-                ((uint64_t)AES_isbox[(temp >> 32) & 0xFF] << 32) |
-                ((uint64_t)AES_isbox[(temp >> 40) & 0xFF] << 40) |
-                ((uint64_t)AES_isbox[(temp >> 48) & 0xFF] << 48) |
-                ((uint64_t)AES_isbox[(temp >> 56) & 0xFF] << 56));
-        if (mix) {
-            col_0 = temp & 0xFFFFFFFF;
-            col_1 = temp >> 32;
-
-            col_0 = AES_INVMIXCOLUMN(col_0);
-            col_1 = AES_INVMIXCOLUMN(col_1);
-
-            result = ((uint64_t)col_1 << 32) | col_0;
-        } else {
-            result = temp;
-        }
-    }
-
-    return result;
-}
-
 target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 {
     AESState t, z = { };
@@ -230,7 +140,16 @@ target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, false, true);
+    AESState t, z = { };
+
+    /*
+     * This instruction does not include a round key,
+     * so supply a zero to our primitive.
+     */
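+    /*
+     * Indexing by HOST_BIG_ENDIAN places rs1 in the same half of
+     * the 128-bit state on big- and little-endian hosts alike.
+     */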
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesdec_ISB_ISR_IMC_AK(&t, &t, &z, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64ks2)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH 28/35] crypto: Add aesdec_ISB_ISR_AK_IMC
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (26 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 27/35] target/riscv: " Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 29/35] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows +
AddRoundKey + InvMixColumns.
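
As a sketch, the two decrypt-round orderings now provided are
(helper names as introduced in this series):

    aesdec_ISB_ISR_AK_IMC: r = IMC(ISB(ISR(st)) ^ rk)    e.g. PPC vncipher
    aesdec_ISB_ISR_IMC_AK: r = IMC(ISB(ISR(st))) ^ rk    e.g. x86 AESDEC

InvMixColumns is linear over GF(2), so IMC(x ^ k) == IMC(x) ^ IMC(k);
this identity lets a host with only one native ordering provide the
other by transforming the round key first.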

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  4 ++++
 include/crypto/aes-round.h            | 21 +++++++++++++++++++++
 crypto/aes.c                          | 20 ++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 848436379d..84f82e53d8 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -25,6 +25,10 @@ void aesdec_IMC_accel(AESState *, const AESState *, bool)
 void aesdec_ISB_ISR_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_ISB_ISR_AK_IMC_accel(AESState *, const AESState *,
+                                 const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 void aesdec_ISB_ISR_IMC_AK_accel(AESState *, const AESState *,
                                  const AESState *, bool)
     QEMU_ERROR("unsupported accel");
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 352687ce11..b48b87671c 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -113,6 +113,27 @@ static inline void aesdec_IMC(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows + AddRoundKey + InvMixColumns.
+ */
+
+void aesdec_ISB_ISR_AK_IMC_gen(AESState *ret, const AESState *st,
+                               const AESState *rk);
+void aesdec_ISB_ISR_AK_IMC_genrev(AESState *ret, const AESState *st,
+                                  const AESState *rk);
+
+static inline void aesdec_ISB_ISR_AK_IMC(AESState *r, const AESState *st,
+                                         const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_AK_IMC_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_AK_IMC_gen(r, st, rk);
+    } else {
+        aesdec_ISB_ISR_AK_IMC_genrev(r, st, rk);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index 1696086868..c0e4bc5580 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1571,6 +1571,26 @@ void aesdec_ISB_ISR_IMC_AK_genrev(AESState *r, const AESState *st,
     aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, true);
 }
 
+void aesdec_ISB_ISR_AK_IMC_gen(AESState *r, const AESState *st,
+                               const AESState *rk)
+{
+    AESState t;
+
+    aesdec_ISB_ISR_gen(&t, st);
+    t.v ^= rk->v;
+    aesdec_IMC_gen(r, &t);
+}
+
+void aesdec_ISB_ISR_AK_IMC_genrev(AESState *r, const AESState *st,
+                                  const AESState *rk)
+{
+    AESState t;
+
+    aesdec_ISB_ISR_genrev(&t, st);
+    t.v ^= rk->v;
+    aesdec_IMC_genrev(r, &t);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH 29/35] target/ppc: Use aesdec_ISB_ISR_AK_IMC
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (27 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 28/35] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 30/35] host/include/i386: Implement aes-round.h Richard Henderson
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This implements the VNCIPHER instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index c7f8b39e9a..8ae10ad748 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2953,22 +2953,11 @@ void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    /* This differs from what is written in ISA V2.07.  The RTL is */
-    /* incorrect and will be fixed in V2.07B.                      */
-    int i;
-    ppc_avr_t tmp;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        tmp.VsrB(i) = b->VsrB(i) ^ AES_isbox[a->VsrB(AES_ishifts[i])];
-    }
-
-    VECTOR_FOR_INORDER_I(i, u32) {
-        r->VsrW(i) =
-            AES_imc[tmp.VsrB(4 * i + 0)][0] ^
-            AES_imc[tmp.VsrB(4 * i + 1)][1] ^
-            AES_imc[tmp.VsrB(4 * i + 2)][2] ^
-            AES_imc[tmp.VsrB(4 * i + 3)][3];
-    }
+    aesdec_ISB_ISR_AK_IMC(ad, st, rk, true);
 }
 
 void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




* [PATCH 30/35] host/include/i386: Implement aes-round.h
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (28 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 29/35] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 31/35] host/include/aarch64: " Richard Henderson
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/i386/host/aes-round.h   | 148 +++++++++++++++++++++++++++
 host/include/i386/host/cpuinfo.h     |   1 +
 host/include/x86_64/host/aes-round.h |   1 +
 util/cpuinfo-i386.c                  |   3 +
 4 files changed, 153 insertions(+)
 create mode 100644 host/include/i386/host/aes-round.h
 create mode 100644 host/include/x86_64/host/aes-round.h

diff --git a/host/include/i386/host/aes-round.h b/host/include/i386/host/aes-round.h
new file mode 100644
index 0000000000..b67e20578d
--- /dev/null
+++ b/host/include/i386/host/aes-round.h
@@ -0,0 +1,148 @@
+/*
+ * x86 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HOST_AES_ROUND_H
+#define HOST_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include <immintrin.h>
+
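+/*
+ * When AES and SSSE3 are enabled at compile time, the hooks below can
+ * be used unconditionally; otherwise each hook is compiled with a
+ * target attribute and callers must test cpuinfo at run time.
+ */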
+#if defined(__AES__) && defined(__SSSE3__)
+# define HAVE_AES_ACCEL  true
+# define ATTR_AES_ACCEL
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+# define ATTR_AES_ACCEL  __attribute__((target("aes,ssse3")))
+#endif
+
+static inline __m128i ATTR_AES_ACCEL
+aes_accel_bswap(__m128i x)
+{
+    return _mm_shuffle_epi8(x, _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8,
+                                            9, 10, 11, 12, 13, 14, 15));
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i z = _mm_setzero_si128();
+
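+    /*
+     * x86 has no MixColumns-only instruction: compose AESDECLAST
+     * (InvShiftRows + InvSubBytes, zero key) with AESENC (ShiftRows +
+     * SubBytes + MixColumns, zero key); the inverse and forward
+     * byte/row steps cancel, leaving MixColumns alone.
+     */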
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesdeclast_si128(t, z);
+        t = _mm_aesenc_si128(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdeclast_si128(t, z);
+        t = _mm_aesenc_si128(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i z = _mm_setzero_si128();
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesenclast_si128(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesenclast_si128(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
+                         const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesenc_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesenc_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesimc_si128(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesimc_si128(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i z = _mm_setzero_si128();
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesdeclast_si128(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdeclast_si128(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
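+    /*
+     * AESDEC computes IMC(ISB(ISR(t))) ^ k.  InvMixColumns is linear,
+     * so IMC(x ^ k) == IMC(x) ^ IMC(k): transforming the round key
+     * with AESIMC first yields the AK-before-IMC ordering required.
+     */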
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        k = _mm_aesimc_si128(k);
+        t = _mm_aesdec_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        k = _mm_aesimc_si128(k);
+        t = _mm_aesdec_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
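+    /* AESDEC's native ordering is already IMC before AK. */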
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesdec_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdec_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+#endif
diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index a6537123cf..073d0a426f 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -26,6 +26,7 @@
 #define CPUINFO_AVX512VBMI2     (1u << 15)
 #define CPUINFO_ATOMIC_VMOVDQA  (1u << 16)
 #define CPUINFO_ATOMIC_VMOVDQU  (1u << 17)
+#define CPUINFO_AES             (1u << 18)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/x86_64/host/aes-round.h b/host/include/x86_64/host/aes-round.h
new file mode 100644
index 0000000000..7da13f5424
--- /dev/null
+++ b/host/include/x86_64/host/aes-round.h
@@ -0,0 +1 @@
+#include "host/include/i386/host/aes-round.h"
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index ab6143d9e7..3a7b7e0ad1 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -40,6 +40,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
         info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
         info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
 
+        /* Our AES support requires PSHUFB as well. */
+        info |= ((c & bit_AES) && (c & bit_SSSE3) ? CPUINFO_AES : 0);
+
         /* For AVX features, we must check available and usable. */
         if ((c & bit_AVX) && (c & bit_OSXSAVE)) {
             unsigned bv = xgetbv_low(0);
-- 
2.34.1




* [PATCH 31/35] host/include/aarch64: Implement aes-round.h
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (29 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 30/35] host/include/i386: Implement aes-round.h Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03 12:50   ` Ard Biesheuvel
  2023-06-03  2:34 ` [PATCH 32/35] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
                   ` (4 subsequent siblings)
  35 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/aarch64/host/aes-round.h | 204 ++++++++++++++++++++++++++
 host/include/aarch64/host/cpuinfo.h   |   1 +
 util/cpuinfo-aarch64.c                |   2 +
 3 files changed, 207 insertions(+)
 create mode 100644 host/include/aarch64/host/aes-round.h

diff --git a/host/include/aarch64/host/aes-round.h b/host/include/aarch64/host/aes-round.h
new file mode 100644
index 0000000000..27ca823db6
--- /dev/null
+++ b/host/include/aarch64/host/aes-round.h
@@ -0,0 +1,204 @@
+/*
+ * AArch64 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HOST_AES_ROUND_H
+#define HOST_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include <arm_neon.h>
+
+#ifdef __ARM_FEATURE_AES
+# define HAVE_AES_ACCEL  true
+# define ATTR_AES_ACCEL
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
+#endif
+
+static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
+{
+    /* No arm_neon.h primitive, and the compilers don't share builtins. */
+#ifdef __clang__
+    return __builtin_shufflevector(x, x, 15, 14, 13, 12, 11, 10, 9, 8,
+                                   7, 6, 5, 4, 3, 2, 1, 0);
+#else
+    return __builtin_shuffle(x, (uint8x16_t)
+                             { 15, 14, 13, 12, 11, 10, 9, 8,
+                               7,  6,  5,  4,  3,  2,  1,  0, });
+#endif
+}
+
+/*
+ * Through clang 15, the aes inlines are only defined if __ARM_FEATURE_AES;
+ * one cannot use __attribute__((target)) to make them appear after the fact.
+ * Therefore we must fall back to inline asm.
+ */
+#ifdef __ARM_FEATURE_AES
+# define aes_accel_aesd   vaesdq_u8
+# define aes_accel_aese   vaeseq_u8
+# define aes_accel_aesmc  vaesmcq_u8
+# define aes_accel_aesimc vaesimcq_u8
+#else
+static inline uint8x16_t aes_accel_aesd(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aesd %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aese(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aese %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesimc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesimc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+#endif /* __ARM_FEATURE_AES */
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesmc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesmc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aese(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aese(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
+                         const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t k = (uint8x16_t)rk->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = aes_accel_aese(t, z);
+        t = aes_accel_aesmc(t);
+        t = veorq_u8(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aese(t, z);
+        t = aes_accel_aesmc(t);
+        t = veorq_u8(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesimc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesimc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesd(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t k = (uint8x16_t)rk->v;
+    uint8x16_t z = { };
+
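+    /*
+     * AESD computes ISB(ISR(d ^ k)); with a zero key this reduces to
+     * ISB(ISR(d)), so the real round key can be XORed in before the
+     * separate AESIMC step, giving the AK-before-IMC ordering.
+     */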
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = aes_accel_aesd(t, z);
+        t = veorq_u8(t, k);
+        t = aes_accel_aesimc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd(t, z);
+        t = veorq_u8(t, k);
+        t = aes_accel_aesimc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t k = (uint8x16_t)rk->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = aes_accel_aesd(t, z);
+        t = aes_accel_aesimc(t);
+        t = veorq_u8(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd(t, z);
+        t = aes_accel_aesimc(t);
+        t = veorq_u8(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+#endif
diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
index 82227890b4..05feeb4f43 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -9,6 +9,7 @@
 #define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
 #define CPUINFO_LSE             (1u << 1)
 #define CPUINFO_LSE2            (1u << 2)
+#define CPUINFO_AES             (1u << 3)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
index f99acb7884..ababc39550 100644
--- a/util/cpuinfo-aarch64.c
+++ b/util/cpuinfo-aarch64.c
@@ -56,10 +56,12 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
     info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
     info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
+    info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
 #endif
 #ifdef CONFIG_DARWIN
     info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
     info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2;
+    info |= sysctl_for_bool("hw.optional.arm.FEAT_AES") * CPUINFO_AES;
 #endif
 
     cpuinfo = info;
-- 
2.34.1




* [PATCH 32/35] crypto: Remove AES_shifts, AES_ishifts
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (30 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 31/35] host/include/aarch64: " Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 33/35] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

These arrays are no longer used, having been replaced by the
AES_SH_* and AES_ISH_* constants.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h |  4 ----
 crypto/aes.c         | 14 --------------
 2 files changed, 18 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 24b073d569..aa8b54065d 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -30,10 +30,6 @@ void AES_decrypt(const unsigned char *in, unsigned char *out,
 extern const uint8_t AES_sbox[256];
 extern const uint8_t AES_isbox[256];
 
-/* AES ShiftRows and InvShiftRows */
-extern const uint8_t AES_shifts[16];
-extern const uint8_t AES_ishifts[16];
-
 /* AES MixColumns, for use with rot32. */
 extern const uint32_t AES_mc_rot[256];
 
diff --git a/crypto/aes.c b/crypto/aes.c
index c0e4bc5580..4438d4dcdc 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -131,13 +131,6 @@ enum {
     AES_SH_F = 0xb,
 };
 
-const uint8_t AES_shifts[16] = {
-    AES_SH_0, AES_SH_1, AES_SH_2, AES_SH_3,
-    AES_SH_4, AES_SH_5, AES_SH_6, AES_SH_7,
-    AES_SH_8, AES_SH_9, AES_SH_A, AES_SH_B,
-    AES_SH_C, AES_SH_D, AES_SH_E, AES_SH_F,
-};
-
 /* AES InvShiftRows, for complete unrolling. */
 enum {
     AES_ISH_0 = 0x0,
@@ -158,13 +151,6 @@ enum {
     AES_ISH_F = 0x3,
 };
 
-const uint8_t AES_ishifts[16] = {
-    AES_ISH_0, AES_ISH_1, AES_ISH_2, AES_ISH_3,
-    AES_ISH_4, AES_ISH_5, AES_ISH_6, AES_ISH_7,
-    AES_ISH_8, AES_ISH_9, AES_ISH_A, AES_ISH_B,
-    AES_ISH_C, AES_ISH_D, AES_ISH_E, AES_ISH_F,
-};
-
 /*
  * MixColumns lookup table, for use with rot32.
  * From Arm ARM pseudocode.
-- 
2.34.1




* [PATCH 33/35] crypto: Implement aesdec_IMC with AES_imc_rot
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (31 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 32/35] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 34/35] crypto: Remove AES_imc Richard Henderson
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This method uses one 256-entry uint32_t table instead of four,
which reduces its data cache footprint.
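
As a sketch of the identity exploited (AES_imc was encoded big-endian,
AES_imc_rot least-significant byte first), each of the four old columns
is a byte rotation of a single word:

    /* Hypothetical self-check, not part of the patch. */
    for (int x = 0; x < 256; x++) {
        for (int i = 0; i < 4; i++) {
            assert(AES_imc[x][i] == bswap32(rol32(AES_imc_rot[x], 8 * i)));
        }
    }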

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 crypto/aes.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/crypto/aes.c b/crypto/aes.c
index 4438d4dcdc..914ccf38ef 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1453,39 +1453,38 @@ aesdec_IMC_swap(AESState *r, const AESState *st, bool swap)
     bool be = HOST_BIG_ENDIAN ^ swap;
     uint32_t t;
 
-    /* Note that AES_imc is encoded for big-endian. */
-    t = (AES_imc[st->b[swap_b ^ 0x0]][0] ^
-         AES_imc[st->b[swap_b ^ 0x1]][1] ^
-         AES_imc[st->b[swap_b ^ 0x2]][2] ^
-         AES_imc[st->b[swap_b ^ 0x3]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x0]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x1]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x2]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x3]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 0] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0x4]][0] ^
-         AES_imc[st->b[swap_b ^ 0x5]][1] ^
-         AES_imc[st->b[swap_b ^ 0x6]][2] ^
-         AES_imc[st->b[swap_b ^ 0x7]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x4]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x5]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x6]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x7]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 1] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0x8]][0] ^
-         AES_imc[st->b[swap_b ^ 0x9]][1] ^
-         AES_imc[st->b[swap_b ^ 0xA]][2] ^
-         AES_imc[st->b[swap_b ^ 0xB]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x8]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x9]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xA]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xB]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 2] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0xC]][0] ^
-         AES_imc[st->b[swap_b ^ 0xD]][1] ^
-         AES_imc[st->b[swap_b ^ 0xE]][2] ^
-         AES_imc[st->b[swap_b ^ 0xF]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0xC]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xD]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xE]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xF]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 3] = t;
-- 
2.34.1




* [PATCH 34/35] crypto: Remove AES_imc
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (32 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 33/35] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03  2:34 ` [PATCH 35/35] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
  2023-06-03 13:23 ` [PATCH 00/35] crypto: Provide aes-round.h and host accel Ard Biesheuvel
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

This array is no longer used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h |   7 --
 crypto/aes.c         | 264 -------------------------------------------
 2 files changed, 271 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index aa8b54065d..99209f51b9 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -36,13 +36,6 @@ extern const uint32_t AES_mc_rot[256];
 /* AES InvMixColumns, for use with rot32. */
 extern const uint32_t AES_imc_rot[256];
 
-/* AES InvMixColumns */
-/* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
-/* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
-/* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
-/* AES_imc[x][3] = [x].[09, 0d, 0b, 0e]; */
-extern const uint32_t AES_imc[256][4];
-
 /*
 AES_Te0[x] = S [x].[02, 01, 01, 03];
 AES_Te1[x] = S [x].[03, 02, 01, 01];
diff --git a/crypto/aes.c b/crypto/aes.c
index 914ccf38ef..4d84bef520 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -293,270 +293,6 @@ const uint32_t AES_imc_rot[256] = {
     0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
 };
 
-/* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
-/* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
-/* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
-/* AES_imc[x][3] = [x].[09, 0d, 0b, 0e]; */
-const uint32_t AES_imc[256][4] = {
-    { 0x00000000, 0x00000000, 0x00000000, 0x00000000, }, /* x=00 */
-    { 0x0E090D0B, 0x0B0E090D, 0x0D0B0E09, 0x090D0B0E, }, /* x=01 */
-    { 0x1C121A16, 0x161C121A, 0x1A161C12, 0x121A161C, }, /* x=02 */
-    { 0x121B171D, 0x1D121B17, 0x171D121B, 0x1B171D12, }, /* x=03 */
-    { 0x3824342C, 0x2C382434, 0x342C3824, 0x24342C38, }, /* x=04 */
-    { 0x362D3927, 0x27362D39, 0x3927362D, 0x2D392736, }, /* x=05 */
-    { 0x24362E3A, 0x3A24362E, 0x2E3A2436, 0x362E3A24, }, /* x=06 */
-    { 0x2A3F2331, 0x312A3F23, 0x23312A3F, 0x3F23312A, }, /* x=07 */
-    { 0x70486858, 0x58704868, 0x68587048, 0x48685870, }, /* x=08 */
-    { 0x7E416553, 0x537E4165, 0x65537E41, 0x4165537E, }, /* x=09 */
-    { 0x6C5A724E, 0x4E6C5A72, 0x724E6C5A, 0x5A724E6C, }, /* x=0A */
-    { 0x62537F45, 0x4562537F, 0x7F456253, 0x537F4562, }, /* x=0B */
-    { 0x486C5C74, 0x74486C5C, 0x5C74486C, 0x6C5C7448, }, /* x=0C */
-    { 0x4665517F, 0x7F466551, 0x517F4665, 0x65517F46, }, /* x=0D */
-    { 0x547E4662, 0x62547E46, 0x4662547E, 0x7E466254, }, /* x=0E */
-    { 0x5A774B69, 0x695A774B, 0x4B695A77, 0x774B695A, }, /* x=0F */
-    { 0xE090D0B0, 0xB0E090D0, 0xD0B0E090, 0x90D0B0E0, }, /* x=10 */
-    { 0xEE99DDBB, 0xBBEE99DD, 0xDDBBEE99, 0x99DDBBEE, }, /* x=11 */
-    { 0xFC82CAA6, 0xA6FC82CA, 0xCAA6FC82, 0x82CAA6FC, }, /* x=12 */
-    { 0xF28BC7AD, 0xADF28BC7, 0xC7ADF28B, 0x8BC7ADF2, }, /* x=13 */
-    { 0xD8B4E49C, 0x9CD8B4E4, 0xE49CD8B4, 0xB4E49CD8, }, /* x=14 */
-    { 0xD6BDE997, 0x97D6BDE9, 0xE997D6BD, 0xBDE997D6, }, /* x=15 */
-    { 0xC4A6FE8A, 0x8AC4A6FE, 0xFE8AC4A6, 0xA6FE8AC4, }, /* x=16 */
-    { 0xCAAFF381, 0x81CAAFF3, 0xF381CAAF, 0xAFF381CA, }, /* x=17 */
-    { 0x90D8B8E8, 0xE890D8B8, 0xB8E890D8, 0xD8B8E890, }, /* x=18 */
-    { 0x9ED1B5E3, 0xE39ED1B5, 0xB5E39ED1, 0xD1B5E39E, }, /* x=19 */
-    { 0x8CCAA2FE, 0xFE8CCAA2, 0xA2FE8CCA, 0xCAA2FE8C, }, /* x=1A */
-    { 0x82C3AFF5, 0xF582C3AF, 0xAFF582C3, 0xC3AFF582, }, /* x=1B */
-    { 0xA8FC8CC4, 0xC4A8FC8C, 0x8CC4A8FC, 0xFC8CC4A8, }, /* x=1C */
-    { 0xA6F581CF, 0xCFA6F581, 0x81CFA6F5, 0xF581CFA6, }, /* x=1D */
-    { 0xB4EE96D2, 0xD2B4EE96, 0x96D2B4EE, 0xEE96D2B4, }, /* x=1E */
-    { 0xBAE79BD9, 0xD9BAE79B, 0x9BD9BAE7, 0xE79BD9BA, }, /* x=1F */
-    { 0xDB3BBB7B, 0x7BDB3BBB, 0xBB7BDB3B, 0x3BBB7BDB, }, /* x=20 */
-    { 0xD532B670, 0x70D532B6, 0xB670D532, 0x32B670D5, }, /* x=21 */
-    { 0xC729A16D, 0x6DC729A1, 0xA16DC729, 0x29A16DC7, }, /* x=22 */
-    { 0xC920AC66, 0x66C920AC, 0xAC66C920, 0x20AC66C9, }, /* x=23 */
-    { 0xE31F8F57, 0x57E31F8F, 0x8F57E31F, 0x1F8F57E3, }, /* x=24 */
-    { 0xED16825C, 0x5CED1682, 0x825CED16, 0x16825CED, }, /* x=25 */
-    { 0xFF0D9541, 0x41FF0D95, 0x9541FF0D, 0x0D9541FF, }, /* x=26 */
-    { 0xF104984A, 0x4AF10498, 0x984AF104, 0x04984AF1, }, /* x=27 */
-    { 0xAB73D323, 0x23AB73D3, 0xD323AB73, 0x73D323AB, }, /* x=28 */
-    { 0xA57ADE28, 0x28A57ADE, 0xDE28A57A, 0x7ADE28A5, }, /* x=29 */
-    { 0xB761C935, 0x35B761C9, 0xC935B761, 0x61C935B7, }, /* x=2A */
-    { 0xB968C43E, 0x3EB968C4, 0xC43EB968, 0x68C43EB9, }, /* x=2B */
-    { 0x9357E70F, 0x0F9357E7, 0xE70F9357, 0x57E70F93, }, /* x=2C */
-    { 0x9D5EEA04, 0x049D5EEA, 0xEA049D5E, 0x5EEA049D, }, /* x=2D */
-    { 0x8F45FD19, 0x198F45FD, 0xFD198F45, 0x45FD198F, }, /* x=2E */
-    { 0x814CF012, 0x12814CF0, 0xF012814C, 0x4CF01281, }, /* x=2F */
-    { 0x3BAB6BCB, 0xCB3BAB6B, 0x6BCB3BAB, 0xAB6BCB3B, }, /* x=30 */
-    { 0x35A266C0, 0xC035A266, 0x66C035A2, 0xA266C035, }, /* x=31 */
-    { 0x27B971DD, 0xDD27B971, 0x71DD27B9, 0xB971DD27, }, /* x=32 */
-    { 0x29B07CD6, 0xD629B07C, 0x7CD629B0, 0xB07CD629, }, /* x=33 */
-    { 0x038F5FE7, 0xE7038F5F, 0x5FE7038F, 0x8F5FE703, }, /* x=34 */
-    { 0x0D8652EC, 0xEC0D8652, 0x52EC0D86, 0x8652EC0D, }, /* x=35 */
-    { 0x1F9D45F1, 0xF11F9D45, 0x45F11F9D, 0x9D45F11F, }, /* x=36 */
-    { 0x119448FA, 0xFA119448, 0x48FA1194, 0x9448FA11, }, /* x=37 */
-    { 0x4BE30393, 0x934BE303, 0x03934BE3, 0xE303934B, }, /* x=38 */
-    { 0x45EA0E98, 0x9845EA0E, 0x0E9845EA, 0xEA0E9845, }, /* x=39 */
-    { 0x57F11985, 0x8557F119, 0x198557F1, 0xF1198557, }, /* x=3A */
-    { 0x59F8148E, 0x8E59F814, 0x148E59F8, 0xF8148E59, }, /* x=3B */
-    { 0x73C737BF, 0xBF73C737, 0x37BF73C7, 0xC737BF73, }, /* x=3C */
-    { 0x7DCE3AB4, 0xB47DCE3A, 0x3AB47DCE, 0xCE3AB47D, }, /* x=3D */
-    { 0x6FD52DA9, 0xA96FD52D, 0x2DA96FD5, 0xD52DA96F, }, /* x=3E */
-    { 0x61DC20A2, 0xA261DC20, 0x20A261DC, 0xDC20A261, }, /* x=3F */
-    { 0xAD766DF6, 0xF6AD766D, 0x6DF6AD76, 0x766DF6AD, }, /* x=40 */
-    { 0xA37F60FD, 0xFDA37F60, 0x60FDA37F, 0x7F60FDA3, }, /* x=41 */
-    { 0xB16477E0, 0xE0B16477, 0x77E0B164, 0x6477E0B1, }, /* x=42 */
-    { 0xBF6D7AEB, 0xEBBF6D7A, 0x7AEBBF6D, 0x6D7AEBBF, }, /* x=43 */
-    { 0x955259DA, 0xDA955259, 0x59DA9552, 0x5259DA95, }, /* x=44 */
-    { 0x9B5B54D1, 0xD19B5B54, 0x54D19B5B, 0x5B54D19B, }, /* x=45 */
-    { 0x894043CC, 0xCC894043, 0x43CC8940, 0x4043CC89, }, /* x=46 */
-    { 0x87494EC7, 0xC787494E, 0x4EC78749, 0x494EC787, }, /* x=47 */
-    { 0xDD3E05AE, 0xAEDD3E05, 0x05AEDD3E, 0x3E05AEDD, }, /* x=48 */
-    { 0xD33708A5, 0xA5D33708, 0x08A5D337, 0x3708A5D3, }, /* x=49 */
-    { 0xC12C1FB8, 0xB8C12C1F, 0x1FB8C12C, 0x2C1FB8C1, }, /* x=4A */
-    { 0xCF2512B3, 0xB3CF2512, 0x12B3CF25, 0x2512B3CF, }, /* x=4B */
-    { 0xE51A3182, 0x82E51A31, 0x3182E51A, 0x1A3182E5, }, /* x=4C */
-    { 0xEB133C89, 0x89EB133C, 0x3C89EB13, 0x133C89EB, }, /* x=4D */
-    { 0xF9082B94, 0x94F9082B, 0x2B94F908, 0x082B94F9, }, /* x=4E */
-    { 0xF701269F, 0x9FF70126, 0x269FF701, 0x01269FF7, }, /* x=4F */
-    { 0x4DE6BD46, 0x464DE6BD, 0xBD464DE6, 0xE6BD464D, }, /* x=50 */
-    { 0x43EFB04D, 0x4D43EFB0, 0xB04D43EF, 0xEFB04D43, }, /* x=51 */
-    { 0x51F4A750, 0x5051F4A7, 0xA75051F4, 0xF4A75051, }, /* x=52 */
-    { 0x5FFDAA5B, 0x5B5FFDAA, 0xAA5B5FFD, 0xFDAA5B5F, }, /* x=53 */
-    { 0x75C2896A, 0x6A75C289, 0x896A75C2, 0xC2896A75, }, /* x=54 */
-    { 0x7BCB8461, 0x617BCB84, 0x84617BCB, 0xCB84617B, }, /* x=55 */
-    { 0x69D0937C, 0x7C69D093, 0x937C69D0, 0xD0937C69, }, /* x=56 */
-    { 0x67D99E77, 0x7767D99E, 0x9E7767D9, 0xD99E7767, }, /* x=57 */
-    { 0x3DAED51E, 0x1E3DAED5, 0xD51E3DAE, 0xAED51E3D, }, /* x=58 */
-    { 0x33A7D815, 0x1533A7D8, 0xD81533A7, 0xA7D81533, }, /* x=59 */
-    { 0x21BCCF08, 0x0821BCCF, 0xCF0821BC, 0xBCCF0821, }, /* x=5A */
-    { 0x2FB5C203, 0x032FB5C2, 0xC2032FB5, 0xB5C2032F, }, /* x=5B */
-    { 0x058AE132, 0x32058AE1, 0xE132058A, 0x8AE13205, }, /* x=5C */
-    { 0x0B83EC39, 0x390B83EC, 0xEC390B83, 0x83EC390B, }, /* x=5D */
-    { 0x1998FB24, 0x241998FB, 0xFB241998, 0x98FB2419, }, /* x=5E */
-    { 0x1791F62F, 0x2F1791F6, 0xF62F1791, 0x91F62F17, }, /* x=5F */
-    { 0x764DD68D, 0x8D764DD6, 0xD68D764D, 0x4DD68D76, }, /* x=60 */
-    { 0x7844DB86, 0x867844DB, 0xDB867844, 0x44DB8678, }, /* x=61 */
-    { 0x6A5FCC9B, 0x9B6A5FCC, 0xCC9B6A5F, 0x5FCC9B6A, }, /* x=62 */
-    { 0x6456C190, 0x906456C1, 0xC1906456, 0x56C19064, }, /* x=63 */
-    { 0x4E69E2A1, 0xA14E69E2, 0xE2A14E69, 0x69E2A14E, }, /* x=64 */
-    { 0x4060EFAA, 0xAA4060EF, 0xEFAA4060, 0x60EFAA40, }, /* x=65 */
-    { 0x527BF8B7, 0xB7527BF8, 0xF8B7527B, 0x7BF8B752, }, /* x=66 */
-    { 0x5C72F5BC, 0xBC5C72F5, 0xF5BC5C72, 0x72F5BC5C, }, /* x=67 */
-    { 0x0605BED5, 0xD50605BE, 0xBED50605, 0x05BED506, }, /* x=68 */
-    { 0x080CB3DE, 0xDE080CB3, 0xB3DE080C, 0x0CB3DE08, }, /* x=69 */
-    { 0x1A17A4C3, 0xC31A17A4, 0xA4C31A17, 0x17A4C31A, }, /* x=6A */
-    { 0x141EA9C8, 0xC8141EA9, 0xA9C8141E, 0x1EA9C814, }, /* x=6B */
-    { 0x3E218AF9, 0xF93E218A, 0x8AF93E21, 0x218AF93E, }, /* x=6C */
-    { 0x302887F2, 0xF2302887, 0x87F23028, 0x2887F230, }, /* x=6D */
-    { 0x223390EF, 0xEF223390, 0x90EF2233, 0x3390EF22, }, /* x=6E */
-    { 0x2C3A9DE4, 0xE42C3A9D, 0x9DE42C3A, 0x3A9DE42C, }, /* x=6F */
-    { 0x96DD063D, 0x3D96DD06, 0x063D96DD, 0xDD063D96, }, /* x=70 */
-    { 0x98D40B36, 0x3698D40B, 0x0B3698D4, 0xD40B3698, }, /* x=71 */
-    { 0x8ACF1C2B, 0x2B8ACF1C, 0x1C2B8ACF, 0xCF1C2B8A, }, /* x=72 */
-    { 0x84C61120, 0x2084C611, 0x112084C6, 0xC6112084, }, /* x=73 */
-    { 0xAEF93211, 0x11AEF932, 0x3211AEF9, 0xF93211AE, }, /* x=74 */
-    { 0xA0F03F1A, 0x1AA0F03F, 0x3F1AA0F0, 0xF03F1AA0, }, /* x=75 */
-    { 0xB2EB2807, 0x07B2EB28, 0x2807B2EB, 0xEB2807B2, }, /* x=76 */
-    { 0xBCE2250C, 0x0CBCE225, 0x250CBCE2, 0xE2250CBC, }, /* x=77 */
-    { 0xE6956E65, 0x65E6956E, 0x6E65E695, 0x956E65E6, }, /* x=78 */
-    { 0xE89C636E, 0x6EE89C63, 0x636EE89C, 0x9C636EE8, }, /* x=79 */
-    { 0xFA877473, 0x73FA8774, 0x7473FA87, 0x877473FA, }, /* x=7A */
-    { 0xF48E7978, 0x78F48E79, 0x7978F48E, 0x8E7978F4, }, /* x=7B */
-    { 0xDEB15A49, 0x49DEB15A, 0x5A49DEB1, 0xB15A49DE, }, /* x=7C */
-    { 0xD0B85742, 0x42D0B857, 0x5742D0B8, 0xB85742D0, }, /* x=7D */
-    { 0xC2A3405F, 0x5FC2A340, 0x405FC2A3, 0xA3405FC2, }, /* x=7E */
-    { 0xCCAA4D54, 0x54CCAA4D, 0x4D54CCAA, 0xAA4D54CC, }, /* x=7F */
-    { 0x41ECDAF7, 0xF741ECDA, 0xDAF741EC, 0xECDAF741, }, /* x=80 */
-    { 0x4FE5D7FC, 0xFC4FE5D7, 0xD7FC4FE5, 0xE5D7FC4F, }, /* x=81 */
-    { 0x5DFEC0E1, 0xE15DFEC0, 0xC0E15DFE, 0xFEC0E15D, }, /* x=82 */
-    { 0x53F7CDEA, 0xEA53F7CD, 0xCDEA53F7, 0xF7CDEA53, }, /* x=83 */
-    { 0x79C8EEDB, 0xDB79C8EE, 0xEEDB79C8, 0xC8EEDB79, }, /* x=84 */
-    { 0x77C1E3D0, 0xD077C1E3, 0xE3D077C1, 0xC1E3D077, }, /* x=85 */
-    { 0x65DAF4CD, 0xCD65DAF4, 0xF4CD65DA, 0xDAF4CD65, }, /* x=86 */
-    { 0x6BD3F9C6, 0xC66BD3F9, 0xF9C66BD3, 0xD3F9C66B, }, /* x=87 */
-    { 0x31A4B2AF, 0xAF31A4B2, 0xB2AF31A4, 0xA4B2AF31, }, /* x=88 */
-    { 0x3FADBFA4, 0xA43FADBF, 0xBFA43FAD, 0xADBFA43F, }, /* x=89 */
-    { 0x2DB6A8B9, 0xB92DB6A8, 0xA8B92DB6, 0xB6A8B92D, }, /* x=8A */
-    { 0x23BFA5B2, 0xB223BFA5, 0xA5B223BF, 0xBFA5B223, }, /* x=8B */
-    { 0x09808683, 0x83098086, 0x86830980, 0x80868309, }, /* x=8C */
-    { 0x07898B88, 0x8807898B, 0x8B880789, 0x898B8807, }, /* x=8D */
-    { 0x15929C95, 0x9515929C, 0x9C951592, 0x929C9515, }, /* x=8E */
-    { 0x1B9B919E, 0x9E1B9B91, 0x919E1B9B, 0x9B919E1B, }, /* x=8F */
-    { 0xA17C0A47, 0x47A17C0A, 0x0A47A17C, 0x7C0A47A1, }, /* x=90 */
-    { 0xAF75074C, 0x4CAF7507, 0x074CAF75, 0x75074CAF, }, /* x=91 */
-    { 0xBD6E1051, 0x51BD6E10, 0x1051BD6E, 0x6E1051BD, }, /* x=92 */
-    { 0xB3671D5A, 0x5AB3671D, 0x1D5AB367, 0x671D5AB3, }, /* x=93 */
-    { 0x99583E6B, 0x6B99583E, 0x3E6B9958, 0x583E6B99, }, /* x=94 */
-    { 0x97513360, 0x60975133, 0x33609751, 0x51336097, }, /* x=95 */
-    { 0x854A247D, 0x7D854A24, 0x247D854A, 0x4A247D85, }, /* x=96 */
-    { 0x8B432976, 0x768B4329, 0x29768B43, 0x4329768B, }, /* x=97 */
-    { 0xD134621F, 0x1FD13462, 0x621FD134, 0x34621FD1, }, /* x=98 */
-    { 0xDF3D6F14, 0x14DF3D6F, 0x6F14DF3D, 0x3D6F14DF, }, /* x=99 */
-    { 0xCD267809, 0x09CD2678, 0x7809CD26, 0x267809CD, }, /* x=9A */
-    { 0xC32F7502, 0x02C32F75, 0x7502C32F, 0x2F7502C3, }, /* x=9B */
-    { 0xE9105633, 0x33E91056, 0x5633E910, 0x105633E9, }, /* x=9C */
-    { 0xE7195B38, 0x38E7195B, 0x5B38E719, 0x195B38E7, }, /* x=9D */
-    { 0xF5024C25, 0x25F5024C, 0x4C25F502, 0x024C25F5, }, /* x=9E */
-    { 0xFB0B412E, 0x2EFB0B41, 0x412EFB0B, 0x0B412EFB, }, /* x=9F */
-    { 0x9AD7618C, 0x8C9AD761, 0x618C9AD7, 0xD7618C9A, }, /* x=A0 */
-    { 0x94DE6C87, 0x8794DE6C, 0x6C8794DE, 0xDE6C8794, }, /* x=A1 */
-    { 0x86C57B9A, 0x9A86C57B, 0x7B9A86C5, 0xC57B9A86, }, /* x=A2 */
-    { 0x88CC7691, 0x9188CC76, 0x769188CC, 0xCC769188, }, /* x=A3 */
-    { 0xA2F355A0, 0xA0A2F355, 0x55A0A2F3, 0xF355A0A2, }, /* x=A4 */
-    { 0xACFA58AB, 0xABACFA58, 0x58ABACFA, 0xFA58ABAC, }, /* x=A5 */
-    { 0xBEE14FB6, 0xB6BEE14F, 0x4FB6BEE1, 0xE14FB6BE, }, /* x=A6 */
-    { 0xB0E842BD, 0xBDB0E842, 0x42BDB0E8, 0xE842BDB0, }, /* x=A7 */
-    { 0xEA9F09D4, 0xD4EA9F09, 0x09D4EA9F, 0x9F09D4EA, }, /* x=A8 */
-    { 0xE49604DF, 0xDFE49604, 0x04DFE496, 0x9604DFE4, }, /* x=A9 */
-    { 0xF68D13C2, 0xC2F68D13, 0x13C2F68D, 0x8D13C2F6, }, /* x=AA */
-    { 0xF8841EC9, 0xC9F8841E, 0x1EC9F884, 0x841EC9F8, }, /* x=AB */
-    { 0xD2BB3DF8, 0xF8D2BB3D, 0x3DF8D2BB, 0xBB3DF8D2, }, /* x=AC */
-    { 0xDCB230F3, 0xF3DCB230, 0x30F3DCB2, 0xB230F3DC, }, /* x=AD */
-    { 0xCEA927EE, 0xEECEA927, 0x27EECEA9, 0xA927EECE, }, /* x=AE */
-    { 0xC0A02AE5, 0xE5C0A02A, 0x2AE5C0A0, 0xA02AE5C0, }, /* x=AF */
-    { 0x7A47B13C, 0x3C7A47B1, 0xB13C7A47, 0x47B13C7A, }, /* x=B0 */
-    { 0x744EBC37, 0x37744EBC, 0xBC37744E, 0x4EBC3774, }, /* x=B1 */
-    { 0x6655AB2A, 0x2A6655AB, 0xAB2A6655, 0x55AB2A66, }, /* x=B2 */
-    { 0x685CA621, 0x21685CA6, 0xA621685C, 0x5CA62168, }, /* x=B3 */
-    { 0x42638510, 0x10426385, 0x85104263, 0x63851042, }, /* x=B4 */
-    { 0x4C6A881B, 0x1B4C6A88, 0x881B4C6A, 0x6A881B4C, }, /* x=B5 */
-    { 0x5E719F06, 0x065E719F, 0x9F065E71, 0x719F065E, }, /* x=B6 */
-    { 0x5078920D, 0x0D507892, 0x920D5078, 0x78920D50, }, /* x=B7 */
-    { 0x0A0FD964, 0x640A0FD9, 0xD9640A0F, 0x0FD9640A, }, /* x=B8 */
-    { 0x0406D46F, 0x6F0406D4, 0xD46F0406, 0x06D46F04, }, /* x=B9 */
-    { 0x161DC372, 0x72161DC3, 0xC372161D, 0x1DC37216, }, /* x=BA */
-    { 0x1814CE79, 0x791814CE, 0xCE791814, 0x14CE7918, }, /* x=BB */
-    { 0x322BED48, 0x48322BED, 0xED48322B, 0x2BED4832, }, /* x=BC */
-    { 0x3C22E043, 0x433C22E0, 0xE0433C22, 0x22E0433C, }, /* x=BD */
-    { 0x2E39F75E, 0x5E2E39F7, 0xF75E2E39, 0x39F75E2E, }, /* x=BE */
-    { 0x2030FA55, 0x552030FA, 0xFA552030, 0x30FA5520, }, /* x=BF */
-    { 0xEC9AB701, 0x01EC9AB7, 0xB701EC9A, 0x9AB701EC, }, /* x=C0 */
-    { 0xE293BA0A, 0x0AE293BA, 0xBA0AE293, 0x93BA0AE2, }, /* x=C1 */
-    { 0xF088AD17, 0x17F088AD, 0xAD17F088, 0x88AD17F0, }, /* x=C2 */
-    { 0xFE81A01C, 0x1CFE81A0, 0xA01CFE81, 0x81A01CFE, }, /* x=C3 */
-    { 0xD4BE832D, 0x2DD4BE83, 0x832DD4BE, 0xBE832DD4, }, /* x=C4 */
-    { 0xDAB78E26, 0x26DAB78E, 0x8E26DAB7, 0xB78E26DA, }, /* x=C5 */
-    { 0xC8AC993B, 0x3BC8AC99, 0x993BC8AC, 0xAC993BC8, }, /* x=C6 */
-    { 0xC6A59430, 0x30C6A594, 0x9430C6A5, 0xA59430C6, }, /* x=C7 */
-    { 0x9CD2DF59, 0x599CD2DF, 0xDF599CD2, 0xD2DF599C, }, /* x=C8 */
-    { 0x92DBD252, 0x5292DBD2, 0xD25292DB, 0xDBD25292, }, /* x=C9 */
-    { 0x80C0C54F, 0x4F80C0C5, 0xC54F80C0, 0xC0C54F80, }, /* x=CA */
-    { 0x8EC9C844, 0x448EC9C8, 0xC8448EC9, 0xC9C8448E, }, /* x=CB */
-    { 0xA4F6EB75, 0x75A4F6EB, 0xEB75A4F6, 0xF6EB75A4, }, /* x=CC */
-    { 0xAAFFE67E, 0x7EAAFFE6, 0xE67EAAFF, 0xFFE67EAA, }, /* x=CD */
-    { 0xB8E4F163, 0x63B8E4F1, 0xF163B8E4, 0xE4F163B8, }, /* x=CE */
-    { 0xB6EDFC68, 0x68B6EDFC, 0xFC68B6ED, 0xEDFC68B6, }, /* x=CF */
-    { 0x0C0A67B1, 0xB10C0A67, 0x67B10C0A, 0x0A67B10C, }, /* x=D0 */
-    { 0x02036ABA, 0xBA02036A, 0x6ABA0203, 0x036ABA02, }, /* x=D1 */
-    { 0x10187DA7, 0xA710187D, 0x7DA71018, 0x187DA710, }, /* x=D2 */
-    { 0x1E1170AC, 0xAC1E1170, 0x70AC1E11, 0x1170AC1E, }, /* x=D3 */
-    { 0x342E539D, 0x9D342E53, 0x539D342E, 0x2E539D34, }, /* x=D4 */
-    { 0x3A275E96, 0x963A275E, 0x5E963A27, 0x275E963A, }, /* x=D5 */
-    { 0x283C498B, 0x8B283C49, 0x498B283C, 0x3C498B28, }, /* x=D6 */
-    { 0x26354480, 0x80263544, 0x44802635, 0x35448026, }, /* x=D7 */
-    { 0x7C420FE9, 0xE97C420F, 0x0FE97C42, 0x420FE97C, }, /* x=D8 */
-    { 0x724B02E2, 0xE2724B02, 0x02E2724B, 0x4B02E272, }, /* x=D9 */
-    { 0x605015FF, 0xFF605015, 0x15FF6050, 0x5015FF60, }, /* x=DA */
-    { 0x6E5918F4, 0xF46E5918, 0x18F46E59, 0x5918F46E, }, /* x=DB */
-    { 0x44663BC5, 0xC544663B, 0x3BC54466, 0x663BC544, }, /* x=DC */
-    { 0x4A6F36CE, 0xCE4A6F36, 0x36CE4A6F, 0x6F36CE4A, }, /* x=DD */
-    { 0x587421D3, 0xD3587421, 0x21D35874, 0x7421D358, }, /* x=DE */
-    { 0x567D2CD8, 0xD8567D2C, 0x2CD8567D, 0x7D2CD856, }, /* x=DF */
-    { 0x37A10C7A, 0x7A37A10C, 0x0C7A37A1, 0xA10C7A37, }, /* x=E0 */
-    { 0x39A80171, 0x7139A801, 0x017139A8, 0xA8017139, }, /* x=E1 */
-    { 0x2BB3166C, 0x6C2BB316, 0x166C2BB3, 0xB3166C2B, }, /* x=E2 */
-    { 0x25BA1B67, 0x6725BA1B, 0x1B6725BA, 0xBA1B6725, }, /* x=E3 */
-    { 0x0F853856, 0x560F8538, 0x38560F85, 0x8538560F, }, /* x=E4 */
-    { 0x018C355D, 0x5D018C35, 0x355D018C, 0x8C355D01, }, /* x=E5 */
-    { 0x13972240, 0x40139722, 0x22401397, 0x97224013, }, /* x=E6 */
-    { 0x1D9E2F4B, 0x4B1D9E2F, 0x2F4B1D9E, 0x9E2F4B1D, }, /* x=E7 */
-    { 0x47E96422, 0x2247E964, 0x642247E9, 0xE9642247, }, /* x=E8 */
-    { 0x49E06929, 0x2949E069, 0x692949E0, 0xE0692949, }, /* x=E9 */
-    { 0x5BFB7E34, 0x345BFB7E, 0x7E345BFB, 0xFB7E345B, }, /* x=EA */
-    { 0x55F2733F, 0x3F55F273, 0x733F55F2, 0xF2733F55, }, /* x=EB */
-    { 0x7FCD500E, 0x0E7FCD50, 0x500E7FCD, 0xCD500E7F, }, /* x=EC */
-    { 0x71C45D05, 0x0571C45D, 0x5D0571C4, 0xC45D0571, }, /* x=ED */
-    { 0x63DF4A18, 0x1863DF4A, 0x4A1863DF, 0xDF4A1863, }, /* x=EE */
-    { 0x6DD64713, 0x136DD647, 0x47136DD6, 0xD647136D, }, /* x=EF */
-    { 0xD731DCCA, 0xCAD731DC, 0xDCCAD731, 0x31DCCAD7, }, /* x=F0 */
-    { 0xD938D1C1, 0xC1D938D1, 0xD1C1D938, 0x38D1C1D9, }, /* x=F1 */
-    { 0xCB23C6DC, 0xDCCB23C6, 0xC6DCCB23, 0x23C6DCCB, }, /* x=F2 */
-    { 0xC52ACBD7, 0xD7C52ACB, 0xCBD7C52A, 0x2ACBD7C5, }, /* x=F3 */
-    { 0xEF15E8E6, 0xE6EF15E8, 0xE8E6EF15, 0x15E8E6EF, }, /* x=F4 */
-    { 0xE11CE5ED, 0xEDE11CE5, 0xE5EDE11C, 0x1CE5EDE1, }, /* x=F5 */
-    { 0xF307F2F0, 0xF0F307F2, 0xF2F0F307, 0x07F2F0F3, }, /* x=F6 */
-    { 0xFD0EFFFB, 0xFBFD0EFF, 0xFFFBFD0E, 0x0EFFFBFD, }, /* x=F7 */
-    { 0xA779B492, 0x92A779B4, 0xB492A779, 0x79B492A7, }, /* x=F8 */
-    { 0xA970B999, 0x99A970B9, 0xB999A970, 0x70B999A9, }, /* x=F9 */
-    { 0xBB6BAE84, 0x84BB6BAE, 0xAE84BB6B, 0x6BAE84BB, }, /* x=FA */
-    { 0xB562A38F, 0x8FB562A3, 0xA38FB562, 0x62A38FB5, }, /* x=FB */
-    { 0x9F5D80BE, 0xBE9F5D80, 0x80BE9F5D, 0x5D80BE9F, }, /* x=FC */
-    { 0x91548DB5, 0xB591548D, 0x8DB59154, 0x548DB591, }, /* x=FD */
-    { 0x834F9AA8, 0xA8834F9A, 0x9AA8834F, 0x4F9AA883, }, /* x=FE */
-    { 0x8D4697A3, 0xA38D4697, 0x97A38D46, 0x4697A38D, }, /* x=FF */
-};
-
-
 
 /*
 AES_Te0[x] = S [x].[02, 01, 01, 03];
-- 
2.34.1




* [PATCH 35/35] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (33 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 34/35] crypto: Remove AES_imc Richard Henderson
@ 2023-06-03  2:34 ` Richard Henderson
  2023-06-03 13:23 ` [PATCH 00/35] crypto: Provide aes-round.h and host accel Ard Biesheuvel
  35 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03  2:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

These arrays are no longer used outside of aes.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h | 25 -------------------------
 crypto/aes.c         | 33 +++++++++++++++++++++------------
 2 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 99209f51b9..709d4d226b 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -30,29 +30,4 @@ void AES_decrypt(const unsigned char *in, unsigned char *out,
 extern const uint8_t AES_sbox[256];
 extern const uint8_t AES_isbox[256];
 
-/* AES MixColumns, for use with rot32. */
-extern const uint32_t AES_mc_rot[256];
-
-/* AES InvMixColumns, for use with rot32. */
-extern const uint32_t AES_imc_rot[256];
-
-/*
-AES_Te0[x] = S [x].[02, 01, 01, 03];
-AES_Te1[x] = S [x].[03, 02, 01, 01];
-AES_Te2[x] = S [x].[01, 03, 02, 01];
-AES_Te3[x] = S [x].[01, 01, 03, 02];
-AES_Te4[x] = S [x].[01, 01, 01, 01];
-
-AES_Td0[x] = Si[x].[0e, 09, 0d, 0b];
-AES_Td1[x] = Si[x].[0b, 0e, 09, 0d];
-AES_Td2[x] = Si[x].[0d, 0b, 0e, 09];
-AES_Td3[x] = Si[x].[09, 0d, 0b, 0e];
-AES_Td4[x] = Si[x].[01, 01, 01, 01];
-*/
-
-extern const uint32_t AES_Te0[256], AES_Te1[256], AES_Te2[256],
-                      AES_Te3[256], AES_Te4[256];
-extern const uint32_t AES_Td0[256], AES_Td1[256], AES_Td2[256],
-                      AES_Td3[256], AES_Td4[256];
-
 #endif
diff --git a/crypto/aes.c b/crypto/aes.c
index 4d84bef520..c51b1c1d5e 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -155,7 +155,7 @@ enum {
  * MixColumns lookup table, for use with rot32.
  * From Arm ARM pseudocode.
  */
-const uint32_t AES_mc_rot[256] = {
+static const uint32_t AES_mc_rot[256] = {
     0x00000000, 0x03010102, 0x06020204, 0x05030306,
     0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
     0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
@@ -226,7 +226,7 @@ const uint32_t AES_mc_rot[256] = {
  * Inverse MixColumns lookup table, for use with rot32.
  * From Arm ARM pseudocode.
  */
-const uint32_t AES_imc_rot[256] = {
+static const uint32_t AES_imc_rot[256] = {
     0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
     0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
     0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
@@ -308,7 +308,7 @@ AES_Td3[x] = Si[x].[09, 0d, 0b, 0e];
 AES_Td4[x] = Si[x].[01, 01, 01, 01];
 */
 
-const uint32_t AES_Te0[256] = {
+static const uint32_t AES_Te0[256] = {
     0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,
     0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,
     0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,
@@ -374,7 +374,8 @@ const uint32_t AES_Te0[256] = {
     0x824141c3U, 0x299999b0U, 0x5a2d2d77U, 0x1e0f0f11U,
     0x7bb0b0cbU, 0xa85454fcU, 0x6dbbbbd6U, 0x2c16163aU,
 };
-const uint32_t AES_Te1[256] = {
+
+static const uint32_t AES_Te1[256] = {
     0xa5c66363U, 0x84f87c7cU, 0x99ee7777U, 0x8df67b7bU,
     0x0dfff2f2U, 0xbdd66b6bU, 0xb1de6f6fU, 0x5491c5c5U,
     0x50603030U, 0x03020101U, 0xa9ce6767U, 0x7d562b2bU,
@@ -440,7 +441,8 @@ const uint32_t AES_Te1[256] = {
     0xc3824141U, 0xb0299999U, 0x775a2d2dU, 0x111e0f0fU,
     0xcb7bb0b0U, 0xfca85454U, 0xd66dbbbbU, 0x3a2c1616U,
 };
-const uint32_t AES_Te2[256] = {
+
+static const uint32_t AES_Te2[256] = {
     0x63a5c663U, 0x7c84f87cU, 0x7799ee77U, 0x7b8df67bU,
     0xf20dfff2U, 0x6bbdd66bU, 0x6fb1de6fU, 0xc55491c5U,
     0x30506030U, 0x01030201U, 0x67a9ce67U, 0x2b7d562bU,
@@ -506,8 +508,8 @@ const uint32_t AES_Te2[256] = {
     0x41c38241U, 0x99b02999U, 0x2d775a2dU, 0x0f111e0fU,
     0xb0cb7bb0U, 0x54fca854U, 0xbbd66dbbU, 0x163a2c16U,
 };
-const uint32_t AES_Te3[256] = {
 
+static const uint32_t AES_Te3[256] = {
     0x6363a5c6U, 0x7c7c84f8U, 0x777799eeU, 0x7b7b8df6U,
     0xf2f20dffU, 0x6b6bbdd6U, 0x6f6fb1deU, 0xc5c55491U,
     0x30305060U, 0x01010302U, 0x6767a9ceU, 0x2b2b7d56U,
@@ -573,7 +575,8 @@ const uint32_t AES_Te3[256] = {
     0x4141c382U, 0x9999b029U, 0x2d2d775aU, 0x0f0f111eU,
     0xb0b0cb7bU, 0x5454fca8U, 0xbbbbd66dU, 0x16163a2cU,
 };
-const uint32_t AES_Te4[256] = {
+
+static const uint32_t AES_Te4[256] = {
     0x63636363U, 0x7c7c7c7cU, 0x77777777U, 0x7b7b7b7bU,
     0xf2f2f2f2U, 0x6b6b6b6bU, 0x6f6f6f6fU, 0xc5c5c5c5U,
     0x30303030U, 0x01010101U, 0x67676767U, 0x2b2b2b2bU,
@@ -639,7 +642,8 @@ const uint32_t AES_Te4[256] = {
     0x41414141U, 0x99999999U, 0x2d2d2d2dU, 0x0f0f0f0fU,
     0xb0b0b0b0U, 0x54545454U, 0xbbbbbbbbU, 0x16161616U,
 };
-const uint32_t AES_Td0[256] = {
+
+static const uint32_t AES_Td0[256] = {
     0x51f4a750U, 0x7e416553U, 0x1a17a4c3U, 0x3a275e96U,
     0x3bab6bcbU, 0x1f9d45f1U, 0xacfa58abU, 0x4be30393U,
     0x2030fa55U, 0xad766df6U, 0x88cc7691U, 0xf5024c25U,
@@ -705,7 +709,8 @@ const uint32_t AES_Td0[256] = {
     0x39a80171U, 0x080cb3deU, 0xd8b4e49cU, 0x6456c190U,
     0x7bcb8461U, 0xd532b670U, 0x486c5c74U, 0xd0b85742U,
 };
-const uint32_t AES_Td1[256] = {
+
+static const uint32_t AES_Td1[256] = {
     0x5051f4a7U, 0x537e4165U, 0xc31a17a4U, 0x963a275eU,
     0xcb3bab6bU, 0xf11f9d45U, 0xabacfa58U, 0x934be303U,
     0x552030faU, 0xf6ad766dU, 0x9188cc76U, 0x25f5024cU,
@@ -771,7 +776,8 @@ const uint32_t AES_Td1[256] = {
     0x7139a801U, 0xde080cb3U, 0x9cd8b4e4U, 0x906456c1U,
     0x617bcb84U, 0x70d532b6U, 0x74486c5cU, 0x42d0b857U,
 };
-const uint32_t AES_Td2[256] = {
+
+static const uint32_t AES_Td2[256] = {
     0xa75051f4U, 0x65537e41U, 0xa4c31a17U, 0x5e963a27U,
     0x6bcb3babU, 0x45f11f9dU, 0x58abacfaU, 0x03934be3U,
     0xfa552030U, 0x6df6ad76U, 0x769188ccU, 0x4c25f502U,
@@ -838,7 +844,8 @@ const uint32_t AES_Td2[256] = {
     0x017139a8U, 0xb3de080cU, 0xe49cd8b4U, 0xc1906456U,
     0x84617bcbU, 0xb670d532U, 0x5c74486cU, 0x5742d0b8U,
 };
-const uint32_t AES_Td3[256] = {
+
+static const uint32_t AES_Td3[256] = {
     0xf4a75051U, 0x4165537eU, 0x17a4c31aU, 0x275e963aU,
     0xab6bcb3bU, 0x9d45f11fU, 0xfa58abacU, 0xe303934bU,
     0x30fa5520U, 0x766df6adU, 0xcc769188U, 0x024c25f5U,
@@ -904,7 +911,8 @@ const uint32_t AES_Td3[256] = {
     0xa8017139U, 0x0cb3de08U, 0xb4e49cd8U, 0x56c19064U,
     0xcb84617bU, 0x32b670d5U, 0x6c5c7448U, 0xb85742d0U,
 };
-const uint32_t AES_Td4[256] = {
+
+static const uint32_t AES_Td4[256] = {
     0x52525252U, 0x09090909U, 0x6a6a6a6aU, 0xd5d5d5d5U,
     0x30303030U, 0x36363636U, 0xa5a5a5a5U, 0x38383838U,
     0xbfbfbfbfU, 0x40404040U, 0xa3a3a3a3U, 0x9e9e9e9eU,
@@ -970,6 +978,7 @@ const uint32_t AES_Td4[256] = {
     0xe1e1e1e1U, 0x69696969U, 0x14141414U, 0x63636363U,
     0x55555555U, 0x21212121U, 0x0c0c0c0cU, 0x7d7d7d7dU,
 };
+
 static const u32 rcon[] = {
         0x01000000, 0x02000000, 0x04000000, 0x08000000,
         0x10000000, 0x20000000, 0x40000000, 0x80000000,
-- 
2.34.1




* Re: [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-03  2:33 ` [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
@ 2023-06-03 12:45   ` Ard Biesheuvel
  2023-06-03 15:21     ` Richard Henderson
  2023-06-05 10:45   ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2023-06-03 12:45 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We do not currently have a table in crypto/ for
> just MixColumns.  Move both tables for consistency.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/crypto/aes.h           |   6 ++
>  crypto/aes.c                   | 142 ++++++++++++++++++++++++++++++++
>  target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
>  3 files changed, 153 insertions(+), 138 deletions(-)
>
> diff --git a/include/crypto/aes.h b/include/crypto/aes.h
> index 822d64588c..24b073d569 100644
> --- a/include/crypto/aes.h
> +++ b/include/crypto/aes.h
> @@ -34,6 +34,12 @@ extern const uint8_t AES_isbox[256];
>  extern const uint8_t AES_shifts[16];
>  extern const uint8_t AES_ishifts[16];
>
> +/* AES MixColumns, for use with rot32. */
> +extern const uint32_t AES_mc_rot[256];
> +
> +/* AES InvMixColumns, for use with rot32. */
> +extern const uint32_t AES_imc_rot[256];
> +
>  /* AES InvMixColumns */
>  /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
>  /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
> diff --git a/crypto/aes.c b/crypto/aes.c
> index af72ff7779..72c95c38fb 100644
> --- a/crypto/aes.c
> +++ b/crypto/aes.c
> @@ -116,6 +116,148 @@ const uint8_t AES_ishifts[16] = {
>      0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
>  };
>
> +/*
> + * MixColumns lookup table, for use with rot32.
> + * From Arm ARM pseudocode.

I remember writing the code to generate these tables, and my copy of
the ARM ARM doesn't appear to have them, so this comment seems
inaccurate to me.

> + */
> +const uint32_t AES_mc_rot[256] = {
> +    0x00000000, 0x03010102, 0x06020204, 0x05030306,
> +    0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
> +    0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
> +    0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
> +    0x30101020, 0x33111122, 0x36121224, 0x35131326,
> +    0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
> +    0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
> +    0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
> +    0x60202040, 0x63212142, 0x66222244, 0x65232346,
> +    0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
> +    0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
> +    0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
> +    0x50303060, 0x53313162, 0x56323264, 0x55333366,
> +    0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
> +    0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
> +    0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
> +    0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
> +    0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
> +    0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
> +    0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
> +    0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
> +    0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
> +    0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
> +    0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
> +    0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
> +    0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
> +    0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
> +    0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
> +    0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
> +    0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
> +    0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
> +    0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
> +    0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
> +    0x97848413, 0x94858511, 0x91868617, 0x92878715,
> +    0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
> +    0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
> +    0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
> +    0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
> +    0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
> +    0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
> +    0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
> +    0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
> +    0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
> +    0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
> +    0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
> +    0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
> +    0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
> +    0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
> +    0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
> +    0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
> +    0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
> +    0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
> +    0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
> +    0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
> +    0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
> +    0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
> +    0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
> +    0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
> +    0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
> +    0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
> +    0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
> +    0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
> +    0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
> +    0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
> +};
> +
> +/*
> + * Inverse MixColumns lookup table, for use with rot32.
> + * From Arm ARM pseudocode.
> + */
> +const uint32_t AES_imc_rot[256] = {
> +    0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
> +    0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
> +    0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
> +    0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
> +    0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
> +    0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
> +    0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
> +    0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
> +    0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
> +    0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
> +    0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
> +    0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
> +    0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
> +    0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
> +    0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
> +    0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
> +    0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
> +    0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
> +    0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
> +    0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
> +    0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
> +    0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
> +    0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
> +    0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
> +    0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
> +    0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
> +    0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
> +    0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
> +    0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
> +    0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
> +    0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
> +    0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
> +    0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
> +    0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
> +    0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
> +    0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
> +    0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
> +    0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
> +    0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
> +    0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
> +    0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
> +    0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
> +    0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
> +    0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
> +    0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
> +    0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
> +    0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
> +    0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
> +    0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
> +    0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
> +    0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
> +    0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
> +    0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
> +    0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
> +    0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
> +    0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
> +    0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
> +    0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
> +    0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
> +    0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
> +    0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
> +    0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
> +    0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
> +    0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
> +};
> +
>  /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
>  /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
>  /* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
> diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
> index d28690321f..06254939d2 100644
> --- a/target/arm/tcg/crypto_helper.c
> +++ b/target/arm/tcg/crypto_helper.c
> @@ -80,149 +80,16 @@ void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
>
>  static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
>  {
> -    static uint32_t const mc[][256] = { {
> -        /* MixColumns lookup table */
> -        0x00000000, 0x03010102, 0x06020204, 0x05030306,
> -        0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
> -        0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
> -        0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
> -        0x30101020, 0x33111122, 0x36121224, 0x35131326,
> -        0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
> -        0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
> -        0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
> -        0x60202040, 0x63212142, 0x66222244, 0x65232346,
> -        0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
> -        0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
> -        0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
> -        0x50303060, 0x53313162, 0x56323264, 0x55333366,
> -        0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
> -        0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
> -        0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
> -        0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
> -        0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
> -        0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
> -        0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
> -        0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
> -        0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
> -        0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
> -        0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
> -        0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
> -        0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
> -        0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
> -        0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
> -        0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
> -        0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
> -        0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
> -        0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
> -        0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
> -        0x97848413, 0x94858511, 0x91868617, 0x92878715,
> -        0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
> -        0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
> -        0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
> -        0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
> -        0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
> -        0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
> -        0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
> -        0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
> -        0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
> -        0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
> -        0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
> -        0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
> -        0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
> -        0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
> -        0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
> -        0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
> -        0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
> -        0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
> -        0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
> -        0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
> -        0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
> -        0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
> -        0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
> -        0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
> -        0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
> -        0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
> -        0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
> -        0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
> -        0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
> -        0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
> -    }, {
> -        /* Inverse MixColumns lookup table */
> -        0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
> -        0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
> -        0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
> -        0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
> -        0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
> -        0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
> -        0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
> -        0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
> -        0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
> -        0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
> -        0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
> -        0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
> -        0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
> -        0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
> -        0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
> -        0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
> -        0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
> -        0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
> -        0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
> -        0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
> -        0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
> -        0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
> -        0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
> -        0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
> -        0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
> -        0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
> -        0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
> -        0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
> -        0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
> -        0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
> -        0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
> -        0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
> -        0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
> -        0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
> -        0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
> -        0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
> -        0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
> -        0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
> -        0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
> -        0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
> -        0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
> -        0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
> -        0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
> -        0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
> -        0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
> -        0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
> -        0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
> -        0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
> -        0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
> -        0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
> -        0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
> -        0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
> -        0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
> -        0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
> -        0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
> -        0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
> -        0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
> -        0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
> -        0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
> -        0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
> -        0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
> -        0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
> -        0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
> -        0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
> -    } };
> -
>      union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
> +    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
>      int i;
>
>      for (i = 0; i < 16; i += 4) {
>          CR_ST_WORD(st, i >> 2) =
> -            mc[decrypt][CR_ST_BYTE(st, i)] ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
> +            mc[CR_ST_BYTE(st, i)] ^
> +            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
> +            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
> +            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
>      }
>
>      rd[0] = st.l[0];
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 31/35] host/include/aarch64: Implement aes-round.h
  2023-06-03  2:34 ` [PATCH 31/35] host/include/aarch64: " Richard Henderson
@ 2023-06-03 12:50   ` Ard Biesheuvel
  2023-06-03 16:01     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2023-06-03 12:50 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Detect AES in cpuinfo; implement the accel hooks.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  host/include/aarch64/host/aes-round.h | 204 ++++++++++++++++++++++++++
>  host/include/aarch64/host/cpuinfo.h   |   1 +
>  util/cpuinfo-aarch64.c                |   2 +
>  3 files changed, 207 insertions(+)
>  create mode 100644 host/include/aarch64/host/aes-round.h
>
> diff --git a/host/include/aarch64/host/aes-round.h b/host/include/aarch64/host/aes-round.h
> new file mode 100644
> index 0000000000..27ca823db6
> --- /dev/null
> +++ b/host/include/aarch64/host/aes-round.h
> @@ -0,0 +1,204 @@
> +/*
> + * AArch64 specific aes acceleration.
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HOST_AES_ROUND_H
> +#define HOST_AES_ROUND_H
> +
> +#include "host/cpuinfo.h"
> +#include <arm_neon.h>
> +
> +#ifdef __ARM_FEATURE_AES
> +# define HAVE_AES_ACCEL  true
> +# define ATTR_AES_ACCEL
> +#else
> +# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
> +# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
> +#endif
> +
> +static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
> +{
> +    /* No arm_neon.h primitive, and the compilers don't share builtins. */

vqtbl1q_u8() perhaps?

> +#ifdef __clang__
> +    return __builtin_shufflevector(x, x, 15, 14, 13, 12, 11, 10, 9, 8,
> +                                   7, 6, 5, 4, 3, 2, 1, 0);
> +#else
> +    return __builtin_shuffle(x, (uint8x16_t)
> +                             { 15, 14, 13, 12, 11, 10, 9, 8,
> +                               7,  6,  5,  4,  3,   2, 1, 0, });
> +#endif
> +}
> +
> +/*
> + * Through clang 15, the aes inlines are only defined if __ARM_FEATURE_AES;
> + * one cannot use __attribute__((target)) to make them appear after the fact.
> + * Therefore we must fallback to inline asm.
> + */
> +#ifdef __ARM_FEATURE_AES
> +# define aes_accel_aesd   vaesdq_u8
> +# define aes_accel_aese   vaeseq_u8
> +# define aes_accel_aesmc  vaesmcq_u8
> +# define aes_accel_aesimc vaesimcq_u8
> +#else
> +static inline uint8x16_t aes_accel_aesd(uint8x16_t d, uint8x16_t k)
> +{
> +    asm(".arch_extension aes\n\t"
> +        "aesd %0.16b, %1.16b" : "+w"(d) : "w"(k));
> +    return d;
> +}
> +
> +static inline uint8x16_t aes_accel_aese(uint8x16_t d, uint8x16_t k)
> +{
> +    asm(".arch_extension aes\n\t"
> +        "aese %0.16b, %1.16b" : "+w"(d) : "w"(k));
> +    return d;
> +}
> +
> +static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
> +{
> +    asm(".arch_extension aes\n\t"
> +        "aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));


Most ARM cores fuse aese/aesmc into a single uop (with the associated
performance boost) if the pattern is

aese x, y
aesmc x,x

aesd x, y
aesimc x,x

So it might make sense to use +w here at least, and use only a single
register (which the compiler will likely do in any case, but still)

I would assume that the compiler cannot schedule these apart given the
sequences below, but if it might, it may be worth emitting the
aese/aesmc pair together in a single asm() block

> +    return d;
> +}
> +
> +static inline uint8x16_t aes_accel_aesimc(uint8x16_t d)
> +{
> +    asm(".arch_extension aes\n\t"
> +        "aesimc %0.16b, %1.16b" : "=w"(d) : "w"(d));
> +    return d;
> +}
> +#endif /* __ARM_FEATURE_AES */
> +
> +static inline void ATTR_AES_ACCEL
> +aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        t = aes_accel_aesmc(t);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aesmc(t);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesenc_SB_SR_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +    uint8x16_t z = { };
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        t = aes_accel_aese(t, z);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aese(t, z);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
> +                         const AESState *rk, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +    uint8x16_t k = (uint8x16_t)rk->v;
> +    uint8x16_t z = { };
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        k = aes_accel_bswap(k);
> +        t = aes_accel_aese(t, z);
> +        t = aes_accel_aesmc(t);
> +        t = veorq_u8(t, k);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aese(t, z);
> +        t = aes_accel_aesmc(t);
> +        t = veorq_u8(t, k);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        t = aes_accel_aesimc(t);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aesimc(t);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesdec_ISB_ISR_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +    uint8x16_t z = { };
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        t = aes_accel_aesd(t, z);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aesd(t, z);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
> +                            const AESState *rk, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +    uint8x16_t k = (uint8x16_t)rk->v;
> +    uint8x16_t z = { };
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        k = aes_accel_bswap(k);
> +        t = aes_accel_aesd(t, z);
> +        t = veorq_u8(t, k);
> +        t = aes_accel_aesimc(t);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aesd(t, z);
> +        t = veorq_u8(t, k);
> +        t = aes_accel_aesimc(t);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +static inline void ATTR_AES_ACCEL
> +aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
> +                            const AESState *rk, bool be)
> +{
> +    uint8x16_t t = (uint8x16_t)st->v;
> +    uint8x16_t k = (uint8x16_t)rk->v;
> +    uint8x16_t z = { };
> +
> +    if (be) {
> +        t = aes_accel_bswap(t);
> +        k = aes_accel_bswap(k);
> +        t = aes_accel_aesd(t, z);
> +        t = aes_accel_aesimc(t);
> +        t = veorq_u8(t, k);
> +        t = aes_accel_bswap(t);
> +    } else {
> +        t = aes_accel_aesd(t, z);
> +        t = aes_accel_aesimc(t);
> +        t = veorq_u8(t, k);
> +    }
> +    ret->v = (AESStateVec)t;
> +}
> +
> +#endif
> diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
> index 82227890b4..05feeb4f43 100644
> --- a/host/include/aarch64/host/cpuinfo.h
> +++ b/host/include/aarch64/host/cpuinfo.h
> @@ -9,6 +9,7 @@
>  #define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
>  #define CPUINFO_LSE             (1u << 1)
>  #define CPUINFO_LSE2            (1u << 2)
> +#define CPUINFO_AES             (1u << 3)
>
>  /* Initialized with a constructor. */
>  extern unsigned cpuinfo;
> diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
> index f99acb7884..ababc39550 100644
> --- a/util/cpuinfo-aarch64.c
> +++ b/util/cpuinfo-aarch64.c
> @@ -56,10 +56,12 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
>      unsigned long hwcap = qemu_getauxval(AT_HWCAP);
>      info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
>      info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
> +    info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
>  #endif
>  #ifdef CONFIG_DARWIN
>      info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
>      info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2;
> +    info |= sysctl_for_bool("hw.optional.arm.FEAT_AES") * CPUINFO_AES;
>  #endif
>
>      cpuinfo = info;
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/35] crypto: Add aesenc_SB_SR
  2023-06-03  2:33 ` [PATCH 04/35] crypto: Add aesenc_SB_SR Richard Henderson
@ 2023-06-03 13:15   ` Ard Biesheuvel
  2023-06-03 15:24     ` Richard Henderson
  0 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2023-06-03 13:15 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Start adding infrastructure for accelerating guest AES.
> Begin with a SubBytes + ShiftRows primitive.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  host/include/generic/host/aes-round.h | 15 +++++++++
>  include/crypto/aes-round.h            | 41 +++++++++++++++++++++++
>  crypto/aes.c                          | 47 +++++++++++++++++++++++++++
>  3 files changed, 103 insertions(+)
>  create mode 100644 host/include/generic/host/aes-round.h
>  create mode 100644 include/crypto/aes-round.h
>
> diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
> new file mode 100644
> index 0000000000..598242c603
> --- /dev/null
> +++ b/host/include/generic/host/aes-round.h
> @@ -0,0 +1,15 @@
> +/*
> + * No host specific aes acceleration.
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HOST_AES_ROUND_H
> +#define HOST_AES_ROUND_H
> +
> +#define HAVE_AES_ACCEL  false
> +#define ATTR_AES_ACCEL
> +
> +void aesenc_SB_SR_accel(AESState *, const AESState *, bool)
> +    QEMU_ERROR("unsupported accel");
> +
> +#endif
> diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
> new file mode 100644
> index 0000000000..784e1daee6
> --- /dev/null
> +++ b/include/crypto/aes-round.h
> @@ -0,0 +1,41 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * AES round fragments, generic version
> + *
> + * Copyright (C) 2023 Linaro, Ltd.
> + */
> +
> +#ifndef CRYPTO_AES_ROUND_H
> +#define CRYPTO_AES_ROUND_H
> +
> +/* Hosts with acceleration will usually need a 16-byte vector type. */
> +typedef uint8_t AESStateVec __attribute__((vector_size(16)));
> +
> +typedef union {
> +    uint8_t b[16];
> +    uint32_t w[4];
> +    uint64_t d[2];
> +    AESStateVec v;
> +} AESState;
> +
> +#include "host/aes-round.h"
> +
> +/*
> + * Perform SubBytes + ShiftRows.
> + */
> +
> +void aesenc_SB_SR_gen(AESState *ret, const AESState *st);
> +void aesenc_SB_SR_genrev(AESState *ret, const AESState *st);
> +
> +static inline void aesenc_SB_SR(AESState *r, const AESState *st, bool be)
> +{
> +    if (HAVE_AES_ACCEL) {
> +        aesenc_SB_SR_accel(r, st, be);
> +    } else if (HOST_BIG_ENDIAN == be) {
> +        aesenc_SB_SR_gen(r, st);
> +    } else {
> +        aesenc_SB_SR_genrev(r, st);
> +    }
> +}
> +
> +#endif /* CRYPTO_AES_ROUND_H */
> diff --git a/crypto/aes.c b/crypto/aes.c
> index 1309a13e91..708838315a 100644
> --- a/crypto/aes.c
> +++ b/crypto/aes.c
> @@ -29,6 +29,7 @@
>   */
>  #include "qemu/osdep.h"
>  #include "crypto/aes.h"
> +#include "crypto/aes-round.h"
>
>  typedef uint32_t u32;
>  typedef uint8_t u8;
> @@ -1251,6 +1252,52 @@ static const u32 rcon[] = {
>          0x1B000000, 0x36000000, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
>  };
>
> +/* Perform SubBytes + ShiftRows. */
> +static inline void
> +aesenc_SB_SR_swap(AESState *r, const AESState *st, bool swap)
> +{
> +    const int swap_b = swap ? 15 : 0;
> +    uint8_t t;
> +
> +    /* These four indexes are not swizzled. */
> +    r->b[swap_b ^ 0x0] = AES_sbox[st->b[swap_b ^ AES_SH_0]];
> +    r->b[swap_b ^ 0x4] = AES_sbox[st->b[swap_b ^ AES_SH_4]];
> +    r->b[swap_b ^ 0x8] = AES_sbox[st->b[swap_b ^ AES_SH_8]];
> +    r->b[swap_b ^ 0xc] = AES_sbox[st->b[swap_b ^ AES_SH_C]];
> +
> +    /* Otherwise, break cycles. */
> +

This is only needed if r == st, right?

> +    t = AES_sbox[st->b[swap_b ^ AES_SH_D]];
> +    r->b[swap_b ^ 0x1] = AES_sbox[st->b[swap_b ^ AES_SH_1]];
> +    r->b[swap_b ^ 0x5] = AES_sbox[st->b[swap_b ^ AES_SH_5]];
> +    r->b[swap_b ^ 0x9] = AES_sbox[st->b[swap_b ^ AES_SH_9]];
> +    r->b[swap_b ^ 0xd] = t;
> +
> +    t = AES_sbox[st->b[swap_b ^ AES_SH_A]];
> +    r->b[swap_b ^ 0x2] = AES_sbox[st->b[swap_b ^ AES_SH_2]];
> +    r->b[swap_b ^ 0xa] = t;
> +
> +    t = AES_sbox[st->b[swap_b ^ AES_SH_E]];
> +    r->b[swap_b ^ 0x6] = AES_sbox[st->b[swap_b ^ AES_SH_6]];
> +    r->b[swap_b ^ 0xe] = t;
> +
> +    t = AES_sbox[st->b[swap_b ^ AES_SH_7]];
> +    r->b[swap_b ^ 0x3] = AES_sbox[st->b[swap_b ^ AES_SH_3]];
> +    r->b[swap_b ^ 0xf] = AES_sbox[st->b[swap_b ^ AES_SH_F]];
> +    r->b[swap_b ^ 0xb] = AES_sbox[st->b[swap_b ^ AES_SH_B]];
> +    r->b[swap_b ^ 0x7] = t;
> +}
> +
> +void aesenc_SB_SR_gen(AESState *r, const AESState *st)
> +{
> +    aesenc_SB_SR_swap(r, st, false);
> +}
> +
> +void aesenc_SB_SR_genrev(AESState *r, const AESState *st)
> +{
> +    aesenc_SB_SR_swap(r, st, true);
> +}
> +
>  /**
>   * Expand the cipher key into the encryption key schedule.
>   */
> --
> 2.34.1
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/35] crypto: Provide aes-round.h and host accel
  2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (34 preceding siblings ...)
  2023-06-03  2:34 ` [PATCH 35/35] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
@ 2023-06-03 13:23 ` Ard Biesheuvel
  2023-06-04 10:47   ` Ard Biesheuvel
  35 siblings, 1 reply; 48+ messages in thread
From: Ard Biesheuvel @ 2023-06-03 13:23 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On Sat, 3 Jun 2023 at 04:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> under emulation, provide a set of primitives that maps between
> the guest and host fragments.
>
> There is a small guest correctness test case.
>
> I think the end result is quite a bit cleaner, since the logic
> is now centralized, rather than spread across 4 different guests.
>
> Further work could clean up crypto/aes.c itself to use these
> instead of the tables directly.  I'm sure that's just an ultimate
> fallback when an appropriate system library is not available, and
> so not terribly important, but it could still significantly reduce
> the amount of code we carry.
>
> I would imagine structuring a polynomial multiplication header
> in a similar way.  There are 4 or 5 versions of those spread across
> the different guests.
>
> Anyway, please review.
>
>
> r~
>
>
> Richard Henderson (35):
>   tests/multiarch: Add test-aes
>   target/arm: Move aesmc and aesimc tables to crypto/aes.c
>   crypto/aes: Add constants for ShiftRows, InvShiftRows
>   crypto: Add aesenc_SB_SR
>   target/i386: Use aesenc_SB_SR
>   target/arm: Demultiplex AESE and AESMC
>   target/arm: Use aesenc_SB_SR
>   target/ppc: Use aesenc_SB_SR
>   target/riscv: Use aesenc_SB_SR
>   crypto: Add aesdec_ISB_ISR
>   target/i386: Use aesdec_ISB_ISR
>   target/arm: Use aesdec_ISB_ISR
>   target/ppc: Use aesdec_ISB_ISR
>   target/riscv: Use aesdec_ISB_ISR
>   crypto: Add aesenc_MC
>   target/arm: Use aesenc_MC
>   crypto: Add aesdec_IMC
>   target/i386: Use aesdec_IMC
>   target/arm: Use aesdec_IMC
>   target/riscv: Use aesdec_IMC
>   crypto: Add aesenc_SB_SR_MC_AK
>   target/i386: Use aesenc_SB_SR_MC_AK
>   target/ppc: Use aesenc_SB_SR_MC_AK
>   target/riscv: Use aesenc_SB_SR_MC_AK
>   crypto: Add aesdec_ISB_ISR_IMC_AK
>   target/i386: Use aesdec_ISB_ISR_IMC_AK
>   target/riscv: Use aesdec_ISB_ISR_IMC_AK
>   crypto: Add aesdec_ISB_ISR_AK_IMC
>   target/ppc: Use aesdec_ISB_ISR_AK_IMC
>   host/include/i386: Implement aes-round.h
>   host/include/aarch64: Implement aes-round.h
>   crypto: Remove AES_shifts, AES_ishifts
>   crypto: Implement aesdec_IMC with AES_imc_rot
>   crypto: Remove AES_imc
>   crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
>

This is looking very good - it is clearly a much better abstraction
than what I proposed, and I'd expect the performance boost to be the
same.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-03 12:45   ` Ard Biesheuvel
@ 2023-06-03 15:21     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03 15:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 6/3/23 05:45, Ard Biesheuvel wrote:
> On Sat, 3 Jun 2023 at 04:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We do not currently have a table in crypto/ for
>> just MixColumns.  Move both tables for consistency.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   include/crypto/aes.h           |   6 ++
>>   crypto/aes.c                   | 142 ++++++++++++++++++++++++++++++++
>>   target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
>>   3 files changed, 153 insertions(+), 138 deletions(-)
>>
>> diff --git a/include/crypto/aes.h b/include/crypto/aes.h
>> index 822d64588c..24b073d569 100644
>> --- a/include/crypto/aes.h
>> +++ b/include/crypto/aes.h
>> @@ -34,6 +34,12 @@ extern const uint8_t AES_isbox[256];
>>   extern const uint8_t AES_shifts[16];
>>   extern const uint8_t AES_ishifts[16];
>>
>> +/* AES MixColumns, for use with rot32. */
>> +extern const uint32_t AES_mc_rot[256];
>> +
>> +/* AES InvMixColumns, for use with rot32. */
>> +extern const uint32_t AES_imc_rot[256];
>> +
>>   /* AES InvMixColumns */
>>   /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
>>   /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
>> diff --git a/crypto/aes.c b/crypto/aes.c
>> index af72ff7779..72c95c38fb 100644
>> --- a/crypto/aes.c
>> +++ b/crypto/aes.c
>> @@ -116,6 +116,148 @@ const uint8_t AES_ishifts[16] = {
>>       0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
>>   };
>>
>> +/*
>> + * MixColumns lookup table, for use with rot32.
>> + * From Arm ARM pseudocode.
> 
> I remember writing the code to generate these tables, and my copy of
> the ARM ARM doesn't appear to have them, so this comment seems
> inaccurate to me.

Quite right.  I remember having copied *some* table from the Arm ARM, but it wasn't this one.
I went back to issue A.a to double-check that it simply hadn't been removed from a recent edition.
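
For the record, these tables can be regenerated from first principles rather
than copied; a minimal generator sketch (not part of the series, assuming the
standard AES reduction polynomial 0x11b and the rol32-friendly packing used
in the patch):

    #include <stdint.h>
    #include <stdio.h>

    /* GF(2^8) doubling modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
    static uint8_t xtime(uint8_t x)
    {
        return (x << 1) ^ (x & 0x80 ? 0x1b : 0);
    }

    int main(void)
    {
        /*
         * AES_mc_rot[x] packs MixColumns of the column {x, 0, 0, 0} as
         * (3.x << 24) | (x << 16) | (x << 8) | 2.x, so that rol32 of the
         * same entry yields the other three column positions.
         */
        for (unsigned x = 0; x < 256; x++) {
            uint8_t x2 = xtime(x);
            uint8_t x3 = x2 ^ x;
            uint32_t e = (uint32_t)x3 << 24 | x << 16 | x << 8 | x2;
            printf("0x%08x,%c", e, x % 4 == 3 ? '\n' : ' ');
        }
        return 0;
    }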


r~


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/35] crypto: Add aesenc_SB_SR
  2023-06-03 13:15   ` Ard Biesheuvel
@ 2023-06-03 15:24     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03 15:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 6/3/23 06:15, Ard Biesheuvel wrote:
>> diff --git a/crypto/aes.c b/crypto/aes.c
>> index 1309a13e91..708838315a 100644
>> --- a/crypto/aes.c
>> +++ b/crypto/aes.c
>> @@ -29,6 +29,7 @@
>>    */
>>   #include "qemu/osdep.h"
>>   #include "crypto/aes.h"
>> +#include "crypto/aes-round.h"
>>
>>   typedef uint32_t u32;
>>   typedef uint8_t u8;
>> @@ -1251,6 +1252,52 @@ static const u32 rcon[] = {
>>           0x1B000000, 0x36000000, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
>>   };
>>
>> +/* Perform SubBytes + ShiftRows. */
>> +static inline void
>> +aesenc_SB_SR_swap(AESState *r, const AESState *st, bool swap)
>> +{
>> +    const int swap_b = swap ? 15 : 0;
>> +    uint8_t t;
>> +
>> +    /* These four indexes are not swizzled. */
>> +    r->b[swap_b ^ 0x0] = AES_sbox[st->b[swap_b ^ AES_SH_0]];
>> +    r->b[swap_b ^ 0x4] = AES_sbox[st->b[swap_b ^ AES_SH_4]];
>> +    r->b[swap_b ^ 0x8] = AES_sbox[st->b[swap_b ^ AES_SH_8]];
>> +    r->b[swap_b ^ 0xc] = AES_sbox[st->b[swap_b ^ AES_SH_C]];
>> +
>> +    /* Otherwise, break cycles. */
>> +
> 
> This is only needed it r == st, right?

Yes.  This is, perhaps, where using the symbolic AES_SH_X names while assuming knowledge of
their values does not aid understanding.
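
For readers following along: the in-place hazard comes from the cycle
structure of the ShiftRows permutation.  A sketch of the cycles implied by
the AES_SH_* values (assuming the usual r[i] = sbox[st[AES_SH_i]] reading):

    /*
     * Fixed points:  0, 4, 8, 12              (no temporary needed)
     * 4-cycle:       1 <- 5 <- 9 <- 13 <- 1
     * 2-cycles:      2 <- 10 <- 2  and  6 <- 14 <- 6
     * 4-cycle:       3 <- 15 <- 11 <- 7 <- 3
     *
     * When r == st, each cycle must save one byte before it is
     * overwritten, which is what the single temporary 't' does.
     */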


r~

> 
>> +    t = AES_sbox[st->b[swap_b ^ AES_SH_D]];
>> +    r->b[swap_b ^ 0x1] = AES_sbox[st->b[swap_b ^ AES_SH_1]];
>> +    r->b[swap_b ^ 0x5] = AES_sbox[st->b[swap_b ^ AES_SH_5]];
>> +    r->b[swap_b ^ 0x9] = AES_sbox[st->b[swap_b ^ AES_SH_9]];
>> +    r->b[swap_b ^ 0xd] = t;
>> +
>> +    t = AES_sbox[st->b[swap_b ^ AES_SH_A]];
>> +    r->b[swap_b ^ 0x2] = AES_sbox[st->b[swap_b ^ AES_SH_2]];
>> +    r->b[swap_b ^ 0xa] = t;
>> +
>> +    t = AES_sbox[st->b[swap_b ^ AES_SH_E]];
>> +    r->b[swap_b ^ 0x6] = AES_sbox[st->b[swap_b ^ AES_SH_6]];
>> +    r->b[swap_b ^ 0xe] = t;
>> +
>> +    t = AES_sbox[st->b[swap_b ^ AES_SH_7]];
>> +    r->b[swap_b ^ 0x3] = AES_sbox[st->b[swap_b ^ AES_SH_3]];
>> +    r->b[swap_b ^ 0xf] = AES_sbox[st->b[swap_b ^ AES_SH_F]];
>> +    r->b[swap_b ^ 0xb] = AES_sbox[st->b[swap_b ^ AES_SH_B]];
>> +    r->b[swap_b ^ 0x7] = t;
>> +}
>> +
>> +void aesenc_SB_SR_gen(AESState *r, const AESState *st)
>> +{
>> +    aesenc_SB_SR_swap(r, st, false);
>> +}
>> +
>> +void aesenc_SB_SR_genrev(AESState *r, const AESState *st)
>> +{
>> +    aesenc_SB_SR_swap(r, st, true);
>> +}
>> +
>>   /**
>>    * Expand the cipher key into the encryption key schedule.
>>    */
>> --
>> 2.34.1
>>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 31/35] host/include/aarch64: Implement aes-round.h
  2023-06-03 12:50   ` Ard Biesheuvel
@ 2023-06-03 16:01     ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2023-06-03 16:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 6/3/23 05:50, Ard Biesheuvel wrote:
> On Sat, 3 Jun 2023 at 04:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Detect AES in cpuinfo; implement the accel hooks.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   host/include/aarch64/host/aes-round.h | 204 ++++++++++++++++++++++++++
>>   host/include/aarch64/host/cpuinfo.h   |   1 +
>>   util/cpuinfo-aarch64.c                |   2 +
>>   3 files changed, 207 insertions(+)
>>   create mode 100644 host/include/aarch64/host/aes-round.h
>>
>> diff --git a/host/include/aarch64/host/aes-round.h b/host/include/aarch64/host/aes-round.h
>> new file mode 100644
>> index 0000000000..27ca823db6
>> --- /dev/null
>> +++ b/host/include/aarch64/host/aes-round.h
>> @@ -0,0 +1,204 @@
>> +/*
>> + * AArch64 specific aes acceleration.
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#ifndef HOST_AES_ROUND_H
>> +#define HOST_AES_ROUND_H
>> +
>> +#include "host/cpuinfo.h"
>> +#include <arm_neon.h>
>> +
>> +#ifdef __ARM_FEATURE_AES
>> +# define HAVE_AES_ACCEL  true
>> +# define ATTR_AES_ACCEL
>> +#else
>> +# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
>> +# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
>> +#endif
>> +
>> +static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
>> +{
>> +    /* No arm_neon.h primitive, and the compilers don't share builtins. */
> 
> vqtbl1q_u8() perhaps?

Ah, yes, thanks.
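
Something like this should work on both compilers, since vqtbl1q_u8 is an
ordinary TBL lookup exposed by arm_neon.h without needing a target attribute
(a sketch, not yet what the series does):

    static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
    {
        /* Reverse the 16 bytes with a single table lookup. */
        return vqtbl1q_u8(x, (uint8x16_t){ 15, 14, 13, 12, 11, 10, 9, 8,
                                            7,  6,  5,  4,  3,  2, 1, 0 });
    }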


>> +static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
>> +{
>> +    asm(".arch_extension aes\n\t"
>> +        "aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));
> 
> 
> Most ARM cores fuse aese/aesmc into a single uop (with the associated
> performance boost) if the pattern is
> 
> aese x, y
> aesmc x,x
> 
> aesd x, y
> aesimc x,x
> 
> So it might make sense to use +w here at least, and use only a single
> register (which the compiler will likely do in any case, but still)
> 
> I would assume that the compiler cannot issue these separately based
> on the sequences below, but if it might, it may be worth it to emit
> the aese/aesmc together in a single asm() block

There could be shuffling.  It's low probability, but possible.
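
If it ever proves necessary, the pair could be emitted in one asm() block so
the scheduler cannot separate them; a sketch along the lines of the existing
helpers (hypothetical name, not in this series):

    static inline uint8x16_t aes_accel_aese_aesmc(uint8x16_t d, uint8x16_t k)
    {
        asm(".arch_extension aes\n\t"
            "aese %0.16b, %1.16b\n\t"
            "aesmc %0.16b, %0.16b"
            : "+w"(d) : "w"(k));
        return d;
    }

The "+w" constraint also keeps the destination in a single register,
matching the fusible aese x,y / aesmc x,x pattern.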

I really should move the builtin test to meson, as clang-16 fixes the builtin visibility 
issue.  I can see that gcc knows about fusing these pairs; I assume clang does as well, but I 
don't know the code base well enough to check.

I suppose it's going to be years until clang-16 can be assumed, as Debian bookworm is to 
be released this month with clang-14.  So it's probably worth spending a few more minutes 
on this now.


r~


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/35] crypto: Provide aes-round.h and host accel
  2023-06-03 13:23 ` [PATCH 00/35] crypto: Provide aes-round.h and host accel Ard Biesheuvel
@ 2023-06-04 10:47   ` Ard Biesheuvel
  0 siblings, 0 replies; 48+ messages in thread
From: Ard Biesheuvel @ 2023-06-04 10:47 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On Sat, 3 Jun 2023 at 15:23, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Sat, 3 Jun 2023 at 04:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > Inspired by Ard Biesheuvel's RFC patches for accelerating AES
> > under emulation, provide a set of primitives that maps between
> > the guest and host fragments.
> >
> > There is a small guest correctness test case.
> >
> > I think the end result is quite a bit cleaner, since the logic
> > is now centralized, rather than spread across 4 different guests.
> >
> > Further work could clean up crypto/aes.c itself to use these
> > instead of the tables directly.  I'm sure that's just an ultimate
> > fallback when an appropriate system library is not available, and
> > so not terribly important, but it could still significantly reduce
> > the amount of code we carry.
> >
> > I would imagine structuring a polynomial multiplication header
> > in a similar way.  There are 4 or 5 versions of those spread across
> > the different guests.
> >
> > Anyway, please review.
> >
> >
> > r~
> >
> >
> > Richard Henderson (35):
> >   tests/multiarch: Add test-aes
> >   target/arm: Move aesmc and aesimc tables to crypto/aes.c
> >   crypto/aes: Add constants for ShiftRows, InvShiftRows
> >   crypto: Add aesenc_SB_SR
> >   target/i386: Use aesenc_SB_SR
> >   target/arm: Demultiplex AESE and AESMC
> >   target/arm: Use aesenc_SB_SR
> >   target/ppc: Use aesenc_SB_SR
> >   target/riscv: Use aesenc_SB_SR
> >   crypto: Add aesdec_ISB_ISR
> >   target/i386: Use aesdec_ISB_ISR
> >   target/arm: Use aesdec_ISB_ISR
> >   target/ppc: Use aesdec_ISB_ISR
> >   target/riscv: Use aesdec_ISB_ISR
> >   crypto: Add aesenc_MC
> >   target/arm: Use aesenc_MC
> >   crypto: Add aesdec_IMC
> >   target/i386: Use aesdec_IMC
> >   target/arm: Use aesdec_IMC
> >   target/riscv: Use aesdec_IMC
> >   crypto: Add aesenc_SB_SR_MC_AK
> >   target/i386: Use aesenc_SB_SR_MC_AK
> >   target/ppc: Use aesenc_SB_SR_MC_AK
> >   target/riscv: Use aesenc_SB_SR_MC_AK
> >   crypto: Add aesdec_ISB_ISR_IMC_AK
> >   target/i386: Use aesdec_ISB_ISR_IMC_AK
> >   target/riscv: Use aesdec_ISB_ISR_IMC_AK
> >   crypto: Add aesdec_ISB_ISR_AK_IMC
> >   target/ppc: Use aesdec_ISB_ISR_AK_IMC
> >   host/include/i386: Implement aes-round.h
> >   host/include/aarch64: Implement aes-round.h
> >   crypto: Remove AES_shifts, AES_ishifts
> >   crypto: Implement aesdec_IMC with AES_imc_rot
> >   crypto: Remove AES_imc
> >   crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
> >
>
> This is looking very good - it is clearly a much better abstraction
> than what I proposed, and I'd expect the performance boost to be the
> same.

Benchmark results for OpenSSL running in emulation on TX2:

Without acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun  1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR      25146.07k    50482.19k    69373.44k    76236.80k    78391.98k    78381.06k


With acceleration:

$ ../qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-ctr
version: 3.2.0-dev
built on: Thu Jun  1 17:06:09 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CTR      28774.46k    81173.59k   162346.24k   206301.53k   224214.22k   225600.56k


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-03  2:33 ` [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
  2023-06-03 12:45   ` Ard Biesheuvel
@ 2023-06-05 10:45   ` Philippe Mathieu-Daudé
  2023-06-05 11:01     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 48+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-05 10:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 3/6/23 04:33, Richard Henderson wrote:
> We do not currently have a table in crypto/ for
> just MixColumns.  Move both tables for consistency.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/crypto/aes.h           |   6 ++
>   crypto/aes.c                   | 142 ++++++++++++++++++++++++++++++++
>   target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
>   3 files changed, 153 insertions(+), 138 deletions(-)


>       union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
> +    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
>       int i;
>   
>       for (i = 0; i < 16; i += 4) {
>           CR_ST_WORD(st, i >> 2) =
> -            mc[decrypt][CR_ST_BYTE(st, i)] ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
> +            mc[CR_ST_BYTE(st, i)] ^
> +            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
> +            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
> +            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);

As a matter of style (since you are changing these lines), I find starting
the lines with the ^ operator clearer to review:

             mc[CR_ST_BYTE(st, i)]
             ^ rol32(mc[CR_ST_BYTE(st, i + 1)], 8)
             ^ rol32(mc[CR_ST_BYTE(st, i + 2)], 16)
             ^ rol32(mc[CR_ST_BYTE(st, i + 3)], 24);

Anyhow,
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

>       }
>   
>       rd[0] = st.l[0];



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-03  2:33 ` [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
@ 2023-06-05 10:46   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 48+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-05 10:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 3/6/23 04:33, Richard Henderson wrote:
> These symbols will avoid the indirection through memory
> when fully unrolling some new primitives.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   crypto/aes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 48 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/35] target/arm: Demultiplex AESE and AESMC
  2023-06-03  2:33 ` [PATCH 06/35] target/arm: Demultiplex AESE and AESMC Richard Henderson
@ 2023-06-05 10:56   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 48+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-05 10:56 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 3/6/23 04:33, Richard Henderson wrote:
> Split these helpers so that we are not passing 'decrypt'
> within the simd descriptor.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/arm/helper.h             |  2 ++
>   target/arm/tcg/sve.decode       |  4 ++--
>   target/arm/tcg/crypto_helper.c  | 37 +++++++++++++++++++++++----------
>   target/arm/tcg/translate-a64.c  | 13 ++++--------
>   target/arm/tcg/translate-neon.c |  4 ++--
>   target/arm/tcg/translate-sve.c  |  8 ++++---
>   6 files changed, 41 insertions(+), 27 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-05 10:45   ` Philippe Mathieu-Daudé
@ 2023-06-05 11:01     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 48+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-05 11:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-arm, qemu-ppc, qemu-riscv, pbonzini

On 5/6/23 12:45, Philippe Mathieu-Daudé wrote:
> On 3/6/23 04:33, Richard Henderson wrote:
>> We do not currently have a table in crypto/ for
>> just MixColumns.  Move both tables for consistency.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   include/crypto/aes.h           |   6 ++
>>   crypto/aes.c                   | 142 ++++++++++++++++++++++++++++++++
>>   target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
>>   3 files changed, 153 insertions(+), 138 deletions(-)
> 
> 
>>       union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
>> +    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
>>       int i;
>>       for (i = 0; i < 16; i += 4) {
>>           CR_ST_WORD(st, i >> 2) =
>> -            mc[decrypt][CR_ST_BYTE(st, i)] ^
>> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
>> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
>> -            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
>> +            mc[CR_ST_BYTE(st, i)] ^
>> +            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
>> +            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
>> +            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
> 
> Matter of style, (since you are changing these lines), I find starting
> the lines with the ^ operator clearer to review:
> 
>              mc[CR_ST_BYTE(st, i)]
>              ^ rol32(mc[CR_ST_BYTE(st, i + 1)], 8)
>              ^ rol32(mc[CR_ST_BYTE(st, i + 2)], 16)
>              ^ rol32(mc[CR_ST_BYTE(st, i + 3)], 24);

Aesthetically nicer:

                        mc[CR_ST_BYTE(st, i)]
                ^ rol32(mc[CR_ST_BYTE(st, i + 1)], 8)
                ^ rol32(mc[CR_ST_BYTE(st, i + 2)], 16)
                ^ rol32(mc[CR_ST_BYTE(st, i + 3)], 24);

:)

> Anyhow,
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> 
>>       }
>>       rd[0] = st.l[0];
> 



^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2023-06-05 11:01 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-03  2:33 [PATCH 00/35] crypto: Provide aes-round.h and host accel Richard Henderson
2023-06-03  2:33 ` [PATCH 01/35] tests/multiarch: Add test-aes Richard Henderson
2023-06-03  2:33 ` [PATCH 02/35] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
2023-06-03 12:45   ` Ard Biesheuvel
2023-06-03 15:21     ` Richard Henderson
2023-06-05 10:45   ` Philippe Mathieu-Daudé
2023-06-05 11:01     ` Philippe Mathieu-Daudé
2023-06-03  2:33 ` [PATCH 03/35] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
2023-06-05 10:46   ` Philippe Mathieu-Daudé
2023-06-03  2:33 ` [PATCH 04/35] crypto: Add aesenc_SB_SR Richard Henderson
2023-06-03 13:15   ` Ard Biesheuvel
2023-06-03 15:24     ` Richard Henderson
2023-06-03  2:33 ` [PATCH 05/35] target/i386: Use aesenc_SB_SR Richard Henderson
2023-06-03  2:33 ` [PATCH 06/35] target/arm: Demultiplex AESE and AESMC Richard Henderson
2023-06-05 10:56   ` Philippe Mathieu-Daudé
2023-06-03  2:33 ` [PATCH 07/35] target/arm: Use aesenc_SB_SR Richard Henderson
2023-06-03  2:33 ` [PATCH 08/35] target/ppc: " Richard Henderson
2023-06-03  2:34 ` [PATCH 09/35] target/riscv: " Richard Henderson
2023-06-03  2:34 ` [PATCH 10/35] crypto: Add aesdec_ISB_ISR Richard Henderson
2023-06-03  2:34 ` [PATCH 11/35] target/i386: Use aesdec_ISB_ISR Richard Henderson
2023-06-03  2:34 ` [PATCH 12/35] target/arm: " Richard Henderson
2023-06-03  2:34 ` [PATCH 13/35] target/ppc: " Richard Henderson
2023-06-03  2:34 ` [PATCH 14/35] target/riscv: " Richard Henderson
2023-06-03  2:34 ` [PATCH 15/35] crypto: Add aesenc_MC Richard Henderson
2023-06-03  2:34 ` [PATCH 16/35] target/arm: Use aesenc_MC Richard Henderson
2023-06-03  2:34 ` [PATCH 17/35] crypto: Add aesdec_IMC Richard Henderson
2023-06-03  2:34 ` [PATCH 18/35] target/i386: Use aesdec_IMC Richard Henderson
2023-06-03  2:34 ` [PATCH 19/35] target/arm: " Richard Henderson
2023-06-03  2:34 ` [PATCH 20/35] target/riscv: " Richard Henderson
2023-06-03  2:34 ` [PATCH 21/35] crypto: Add aesenc_SB_SR_MC_AK Richard Henderson
2023-06-03  2:34 ` [PATCH 22/35] target/i386: Use aesenc_SB_SR_MC_AK Richard Henderson
2023-06-03  2:34 ` [PATCH 23/35] target/ppc: " Richard Henderson
2023-06-03  2:34 ` [PATCH 24/35] target/riscv: " Richard Henderson
2023-06-03  2:34 ` [PATCH 25/35] crypto: Add aesdec_ISB_ISR_IMC_AK Richard Henderson
2023-06-03  2:34 ` [PATCH 26/35] target/i386: Use aesdec_ISB_ISR_IMC_AK Richard Henderson
2023-06-03  2:34 ` [PATCH 27/35] target/riscv: " Richard Henderson
2023-06-03  2:34 ` [PATCH 28/35] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
2023-06-03  2:34 ` [PATCH 29/35] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
2023-06-03  2:34 ` [PATCH 30/35] host/include/i386: Implement aes-round.h Richard Henderson
2023-06-03  2:34 ` [PATCH 31/35] host/include/aarch64: " Richard Henderson
2023-06-03 12:50   ` Ard Biesheuvel
2023-06-03 16:01     ` Richard Henderson
2023-06-03  2:34 ` [PATCH 32/35] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
2023-06-03  2:34 ` [PATCH 33/35] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
2023-06-03  2:34 ` [PATCH 34/35] crypto: Remove AES_imc Richard Henderson
2023-06-03  2:34 ` [PATCH 35/35] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
2023-06-03 13:23 ` [PATCH 00/35] crypto: Provide aes-round.h and host accel Ard Biesheuvel
2023-06-04 10:47   ` Ard Biesheuvel
