* [PATCH 0/3] Add Zhaoxin hardware engine driver support for SHA
@ 2024-01-16 6:35 Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Tony W Wang-oc @ 2024-01-16 6:35 UTC (permalink / raw)
To: 675146817, story_19872006, herbert, davem, linux-crypto,
linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
kim.phillips, kirill.shutemov, jmattson, babu.moger, kai.huang,
TonyWWang-oc, acme, aik, namhyung
Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions,
covering SHA1, SHA256, SHA384 and SHA512, which conform to the Secure
Hash Standard specified by FIPS 180-3.
Implementing SHA in hardware instead of software lets applications
achieve higher performance, better security and more flexibility.
The table below summarizes tcrypt results for the different crypto
algorithm drivers on the Zhaoxin KH-40000 platform:
-----------------------------------------------------------------------------
tcrypt     driver         16*      64     256    1024    2048    4096    8192
-----------------------------------------------------------------------------
403:SHA1   zhaoxin**   442.80 1309.21 3257.53 5221.56 5813.45 6136.39 6264.50***
403:SHA1   generic**   341.44  813.27 1458.98 1818.03 1896.60 1940.71 1939.06
403:SHA1   ratio         1.30    1.61    2.23    2.87    3.07    3.16    3.23
-----------------------------------------------------------------------------
404:SHA256 zhaoxin     451.70 1313.65 2958.71 4658.55 5109.16 5359.08 5459.13
404:SHA256 generic     202.62  463.55  845.01 1070.50 1117.51 1144.79 1155.68
404:SHA256 ratio         2.23    2.83    3.50    4.35    4.57    4.68    4.72
-----------------------------------------------------------------------------
405:SHA384 zhaoxin     350.90 1406.42 3166.16 5736.39 6627.77 7182.01 7429.18
405:SHA384 generic     161.76  654.88  979.06 1350.56 1423.08 1496.57 1513.12
405:SHA384 ratio         2.17    2.15    3.23    4.25    4.66    4.80    4.91
-----------------------------------------------------------------------------
406:SHA512 zhaoxin     334.49 1394.71 3159.93 5728.86 6625.33 7169.23 7407.80
406:SHA512 generic     161.80  653.84  979.42 1351.41 1444.14 1495.35 1518.43
406:SHA512 ratio         2.07    2.13    3.23    4.24    4.59    4.79    4.88
-----------------------------------------------------------------------------
*: The length of each data block processed by one complete SHA
sequence, namely one INIT, multiple UPDATEs and one FINAL.
**: The crypto algorithm driver used by tcrypt: "zhaoxin" is the
zhaoxin-sha driver and "generic" is the generic software SHA driver.
***: The throughput of each crypto algorithm driver for the given data
block length, in Mb/s.
The ratios in the table show that the zhaoxin-sha driver is roughly 1.3x
to 4.9x faster than the generic software drivers for
sha1/sha256/sha384/sha512, with the largest gains at large block sizes.
To support the zhaoxin-sha driver, make the padlock-sha driver match
only CENTAUR CPUs with Family == 6, and add two Zhaoxin Hash Engine
cpufeatures.
Tony W Wang-oc (3):
crypto: padlock-sha: Matches CPU with Family with 6 explicitly
x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
arch/x86/include/asm/cpufeatures.h | 4 +-
drivers/crypto/Kconfig | 15 +
drivers/crypto/Makefile | 1 +
drivers/crypto/padlock-sha.c | 2 +-
drivers/crypto/zhaoxin-sha.c | 500 +++++++++++++++++++++++
drivers/crypto/zhaoxin-sha.h | 16 +
tools/arch/x86/include/asm/cpufeatures.h | 4 +-
7 files changed, 539 insertions(+), 3 deletions(-)
create mode 100644 drivers/crypto/zhaoxin-sha.c
create mode 100644 drivers/crypto/zhaoxin-sha.h
--
2.25.1
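
The ratio rows in the table above are plain quotients of the two throughput rows. A minimal sketch of that arithmetic follows; the helper names `speedup` and `round2` are illustrative and do not come from the patch series:

```c
#include <assert.h>

/* Speedup of the hardware driver over the generic one (illustrative helper). */
static double speedup(double hw_throughput, double sw_throughput)
{
	assert(sw_throughput > 0.0);
	return hw_throughput / sw_throughput;
}

/* Round a positive value to two decimals, as the ratio rows are printed. */
static double round2(double x)
{
	return (double)(long)(x * 100.0 + 0.5) / 100.0;
}
```

For the SHA1 row at a block length of 16, `round2(speedup(442.80, 341.44))` gives the 1.30 shown in the table.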
* [PATCH 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly
2024-01-16 6:35 [PATCH 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
@ 2024-01-16 6:35 ` Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
2 siblings, 0 replies; 6+ messages in thread
From: Tony W Wang-oc @ 2024-01-16 6:35 UTC (permalink / raw)
To: 675146817, story_19872006, herbert, davem, linux-crypto,
linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
kim.phillips, kirill.shutemov, jmattson, babu.moger, kai.huang,
TonyWWang-oc, acme, aik, namhyung
Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Update the CPU match table of the padlock-sha driver so that it binds
only to CPUs whose vendor ID is Centaur and whose family is 6.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
drivers/crypto/padlock-sha.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
index 6865c7f1fc1a..2e82c5e77f7a 100644
--- a/drivers/crypto/padlock-sha.c
+++ b/drivers/crypto/padlock-sha.c
@@ -491,7 +491,7 @@ static struct shash_alg sha256_alg_nano = {
};
static const struct x86_cpu_id padlock_sha_ids[] = {
- X86_MATCH_FEATURE(X86_FEATURE_PHE, NULL),
+ X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 6, X86_FEATURE_PHE, NULL),
{}
};
MODULE_DEVICE_TABLE(x86cpu, padlock_sha_ids);
--
2.25.1
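
The effect of the one-line change in this patch can be pictured with a small userspace model of table-driven CPU matching. This is only a sketch: the types, constants and `match_cpu()` below are hypothetical stand-ins, not the kernel's `x86_match_cpu()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace model; every name here is illustrative, not kernel API. */
enum vendor { VENDOR_ANY, VENDOR_CENTAUR, VENDOR_ZHAOXIN };
#define FAMILY_ANY 0

struct cpu_match {
	enum vendor vendor;	/* VENDOR_ANY acts as a wildcard */
	unsigned int family;	/* FAMILY_ANY acts as a wildcard */
	bool needs_phe;		/* entry requires the PHE feature flag */
};

struct cpu_info {
	enum vendor vendor;
	unsigned int family;
	bool has_phe;
};

/* Return true if the CPU satisfies all constraints of any of the n entries. */
static bool match_cpu(const struct cpu_match *table, size_t n,
		      const struct cpu_info *cpu)
{
	assert(table != NULL && cpu != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct cpu_match *m = &table[i];

		if (m->vendor != VENDOR_ANY && m->vendor != cpu->vendor)
			continue;
		if (m->family != FAMILY_ANY && m->family != cpu->family)
			continue;
		if (m->needs_phe && !cpu->has_phe)
			continue;
		return true;
	}
	return false;
}

/* Before the patch: feature-only match. After: vendor + family + feature. */
static const struct cpu_match old_ids[] = { { VENDOR_ANY, FAMILY_ANY, true } };
static const struct cpu_match new_ids[] = { { VENDOR_CENTAUR, 6, true } };
```

In this model a Zhaoxin family 7 CPU with PHE matches `old_ids` but not `new_ids`, which leaves such CPUs free to be claimed by a separate driver.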
* [PATCH 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine
2024-01-16 6:35 [PATCH 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
@ 2024-01-16 6:35 ` Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
2 siblings, 0 replies; 6+ messages in thread
From: Tony W Wang-oc @ 2024-01-16 6:35 UTC (permalink / raw)
To: 675146817, story_19872006, herbert, davem, linux-crypto,
linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
kim.phillips, kirill.shutemov, jmattson, babu.moger, kai.huang,
TonyWWang-oc, acme, aik, namhyung
Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions.
Add two CPU feature flags, reported in CPUID.(EAX=C0000001,ECX=0):EDX
bits 25 and 26, which will be used by the Zhaoxin SHA driver.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
arch/x86/include/asm/cpufeatures.h | 4 +++-
tools/arch/x86/include/asm/cpufeatures.h | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 29cb275a219d..28b0e62dbdf5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -145,7 +145,7 @@
#define X86_FEATURE_RDRAND ( 4*32+30) /* RDRAND instruction */
#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
+/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */
#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */
#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
@@ -156,6 +156,8 @@
#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */
#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */
#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */
+#define X86_FEATURE_PHE2 ( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
+#define X86_FEATURE_PHE2_EN ( 5*32+26) /* "phe2_en" PHE2 enabled */
/* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */
#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index f4542d2718f4..21caba9d070b 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -145,7 +145,7 @@
#define X86_FEATURE_RDRAND ( 4*32+30) /* RDRAND instruction */
#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */
-/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
+/* VIA/Cyrix/Centaur/Zhaoxin-defined CPU features, CPUID level 0xC0000001, word 5 */
#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */
#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */
#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */
@@ -156,6 +156,8 @@
#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */
#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */
#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */
+#define X86_FEATURE_PHE2 ( 5*32+25) /* "phe2" Zhaoxin Hash Engine */
+#define X86_FEATURE_PHE2_EN ( 5*32+26) /* "phe2_en" PHE2 enabled */
/* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */
#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */
--
2.25.1
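
The new defines use the same `word*32 + bit` encoding as the neighbouring entries, so `( 5*32+25)` means bit 25 of capability word 5, which this header ties to CPUID level 0xC0000001. The sketch below decodes that encoding and tests a raw EDX value; the macro and function names are assumptions made for illustration and are not kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the cpufeatures.h defines: word * 32 + bit. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define FEATURE_PHE2		FEATURE(5, 25)	/* mirrors X86_FEATURE_PHE2 */
#define FEATURE_PHE2_EN		FEATURE(5, 26)	/* mirrors X86_FEATURE_PHE2_EN */

static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a word-5 feature against a raw EDX value from CPUID leaf 0xC0000001. */
static bool edx_has_feature(uint32_t edx, unsigned int feature)
{
	assert(feature_word(feature) == 5);
	return (edx >> feature_bit(feature)) & 1u;
}
```

An EDX value with only bit 25 set would report the engine as present (`FEATURE_PHE2`) but not enabled (`FEATURE_PHE2_EN`).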
* [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
2024-01-16 6:35 [PATCH 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
@ 2024-01-16 6:35 ` Tony W Wang-oc
2024-01-17 0:45 ` kernel test robot
2024-01-19 17:04 ` kernel test robot
2 siblings, 2 replies; 6+ messages in thread
From: Tony W Wang-oc @ 2024-01-16 6:35 UTC (permalink / raw)
To: 675146817, story_19872006, herbert, davem, linux-crypto,
linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
kim.phillips, kirill.shutemov, jmattson, babu.moger, kai.huang,
TonyWWang-oc, acme, aik, namhyung
Cc: CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions,
covering SHA1, SHA256, SHA384 and SHA512, which conform to the Secure
Hash Standard specified by FIPS 180-3.
Implementing SHA in hardware instead of software lets applications
achieve higher performance, better security and more flexibility.
The table below summarizes tcrypt results for the different crypto
algorithm drivers on the Zhaoxin KH-40000 platform:
-----------------------------------------------------------------------------
tcrypt     driver         16*      64     256    1024    2048    4096    8192
-----------------------------------------------------------------------------
403:SHA1   zhaoxin**   442.80 1309.21 3257.53 5221.56 5813.45 6136.39 6264.50***
403:SHA1   generic**   341.44  813.27 1458.98 1818.03 1896.60 1940.71 1939.06
403:SHA1   ratio         1.30    1.61    2.23    2.87    3.07    3.16    3.23
-----------------------------------------------------------------------------
404:SHA256 zhaoxin     451.70 1313.65 2958.71 4658.55 5109.16 5359.08 5459.13
404:SHA256 generic     202.62  463.55  845.01 1070.50 1117.51 1144.79 1155.68
404:SHA256 ratio         2.23    2.83    3.50    4.35    4.57    4.68    4.72
-----------------------------------------------------------------------------
405:SHA384 zhaoxin     350.90 1406.42 3166.16 5736.39 6627.77 7182.01 7429.18
405:SHA384 generic     161.76  654.88  979.06 1350.56 1423.08 1496.57 1513.12
405:SHA384 ratio         2.17    2.15    3.23    4.25    4.66    4.80    4.91
-----------------------------------------------------------------------------
406:SHA512 zhaoxin     334.49 1394.71 3159.93 5728.86 6625.33 7169.23 7407.80
406:SHA512 generic     161.80  653.84  979.42 1351.41 1444.14 1495.35 1518.43
406:SHA512 ratio         2.07    2.13    3.23    4.24    4.59    4.79    4.88
-----------------------------------------------------------------------------
*: The length of each data block processed by one complete SHA
sequence, namely one INIT, multiple UPDATEs and one FINAL.
**: The crypto algorithm driver used by tcrypt: "zhaoxin" is the
zhaoxin-sha driver and "generic" is the generic software SHA driver.
***: The throughput of each crypto algorithm driver for the given data
block length, in Mb/s.
The ratios in the table show that the zhaoxin-sha driver is roughly 1.3x
to 4.9x faster than the generic software drivers for
sha1/sha256/sha384/sha512, with the largest gains at large block sizes.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
drivers/crypto/Kconfig | 15 ++
drivers/crypto/Makefile | 1 +
drivers/crypto/zhaoxin-sha.c | 500 +++++++++++++++++++++++++++++++++++
drivers/crypto/zhaoxin-sha.h | 16 ++
4 files changed, 532 insertions(+)
create mode 100644 drivers/crypto/zhaoxin-sha.c
create mode 100644 drivers/crypto/zhaoxin-sha.h
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 79c3bb9c99c3..2698e8fcf06d 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -798,4 +798,19 @@ config CRYPTO_DEV_SA2UL
source "drivers/crypto/aspeed/Kconfig"
source "drivers/crypto/starfive/Kconfig"
+config CRYPTO_DEV_ZHAOXIN_SHA
+ tristate "Support for Zhaoxin SHA1/SHA256/SHA384/SHA512 algorithms"
+ select CRYPTO_HASH
+ select CRYPTO_SHA1
+ select CRYPTO_SHA256
+ select CRYPTO_SHA384
+ select CRYPTO_SHA512
+ help
+ Use Zhaoxin HW engine for SHA1/SHA256/SHA384/SHA512 algorithms.
+
+ Available in ZX-C+ and newer processors.
+
+ If unsure say M. The compiled module will be
+ called zhaoxin-sha.
+
endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index d859d6a5f3a4..b77c02d6dab7 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -51,3 +51,4 @@ obj-y += hisilicon/
obj-$(CONFIG_CRYPTO_DEV_AMLOGIC_GXL) += amlogic/
obj-y += intel/
obj-y += starfive/
+obj-$(CONFIG_CRYPTO_DEV_ZHAOXIN_SHA) += zhaoxin-sha.o
diff --git a/drivers/crypto/zhaoxin-sha.c b/drivers/crypto/zhaoxin-sha.c
new file mode 100644
index 000000000000..bcfc77440890
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.c
@@ -0,0 +1,500 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Cryptographic API.
+ *
+ * Support for Zhaoxin hardware crypto engine.
+ *
+ * Copyright (c) 2023 George Xue <georgexue@zhaoxin.com>
+ */
+
+#include <crypto/internal/hash.h>
+#include <crypto/sha1.h>
+#include <crypto/sha2.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/scatterlist.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include "zhaoxin-sha.h"
+
+static inline void zhaoxin_output_block(uint32_t *src, uint32_t *dst, size_t count)
+{
+ while (count--)
+ *dst++ = swab32(*src++);
+}
+
+static int zhaoxin_sha1_init(struct shash_desc *desc)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha1_state){
+ .state = { SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4 },
+ };
+
+ return 0;
+}
+
+static int zhaoxin_sha1_update(struct shash_desc *desc, const u8 *data, unsigned int len)
+{
+ struct sha1_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial, done;
+ const u8 *src;
+ u8 buf[SHA1_BLOCK_SIZE * 2];
+ u8 *dst = &buf[0];
+
+ partial = sctx->count & (SHA1_BLOCK_SIZE - 1);
+ sctx->count += len;
+ done = 0;
+ src = data;
+ memcpy(dst, sctx->state, SHA1_DIGEST_SIZE);
+
+ if ((partial + len) >= SHA1_BLOCK_SIZE) {
+
+ /* Append the bytes in state's buffer to a block to handle */
+ if (partial) {
+ done = -partial;
+ memcpy(sctx->buffer + partial, data, done + SHA1_BLOCK_SIZE);
+ src = sctx->buffer;
+
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+ : "+S"(src), "+D"(dst)
+ : "a"(-1L), "c"(1UL));
+
+ done += SHA1_BLOCK_SIZE;
+ src = data + done;
+ }
+
+ /* Process the left bytes from the input data */
+ if (len - done >= SHA1_BLOCK_SIZE) {
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xc8"
+ : "+S"(src), "+D"(dst)
+ : "a"(-1L),
+ "c"((unsigned long)((len - done) / SHA1_BLOCK_SIZE)));
+
+ done += ((len - done) - (len - done) % SHA1_BLOCK_SIZE);
+ src = data + done;
+ }
+ partial = 0;
+ }
+ memcpy(sctx->state, dst, SHA1_DIGEST_SIZE);
+ memcpy(sctx->buffer + partial, src, len - done);
+
+ return 0;
+}
+
+static int zhaoxin_sha1_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha1_state *state = shash_desc_ctx(desc);
+ unsigned int partial, padlen;
+ __be64 bits;
+ static const u8 padding[SHA1_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+ const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+
+ bits = cpu_to_be64(state->count << 3);
+
+ /* Padding */
+ partial = state->count & (SHA1_BLOCK_SIZE - 1);
+ padlen = (partial < bit_offset) ? (bit_offset - partial) :
+ ((SHA1_BLOCK_SIZE + bit_offset) - partial);
+ zhaoxin_sha1_update(desc, padding, padlen);
+
+ /* Append length field bytes */
+ zhaoxin_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+
+ /* Swap to output */
+ zhaoxin_output_block(state->state, (uint32_t *)out, SHA1_DIGEST_SIZE/sizeof(uint32_t));
+
+ return 0;
+}
+
+static int zhaoxin_sha256_init(struct shash_desc *desc)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha256_state){
+ .state = { SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+ SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7},
+ };
+
+ return 0;
+}
+
+static int zhaoxin_sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha256_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial, done;
+ const u8 *src;
+ u8 buf[SHA256_BLOCK_SIZE*2];
+ u8 *dst = &buf[0];
+
+ partial = sctx->count & (SHA256_BLOCK_SIZE - 1);
+ sctx->count += len;
+ done = 0;
+ src = data;
+ memcpy(dst, sctx->state, SHA256_DIGEST_SIZE);
+
+ if ((partial + len) >= SHA256_BLOCK_SIZE) {
+
+ /* Append the bytes in state's buffer to a block to handle */
+ if (partial) {
+ done = -partial;
+ memcpy(sctx->buf + partial, data, done + SHA256_BLOCK_SIZE);
+ src = sctx->buf;
+
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+ : "+S"(src), "+D"(dst)
+ : "a"(-1L), "c"(1UL));
+
+ done += SHA256_BLOCK_SIZE;
+ src = data + done;
+ }
+
+ /* Process the left bytes from input data*/
+ if (len - done >= SHA256_BLOCK_SIZE) {
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xd0"
+ : "+S"(src), "+D"(dst)
+ : "a"(-1L),
+ "c"((unsigned long)((len - done) / SHA256_BLOCK_SIZE)));
+
+ done += ((len - done) - (len - done) % SHA256_BLOCK_SIZE);
+ src = data + done;
+ }
+ partial = 0;
+ }
+ memcpy(sctx->state, dst, SHA256_DIGEST_SIZE);
+ memcpy(sctx->buf + partial, src, len - done);
+
+ return 0;
+}
+
+static int zhaoxin_sha256_final(struct shash_desc *desc, u8 *out)
+{
+ struct sha256_state *state = shash_desc_ctx(desc);
+ unsigned int partial, padlen;
+ __be64 bits;
+ static const u8 padding[SHA256_BLOCK_SIZE] = {SHA_PADDING_BYTE, };
+ const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+
+ bits = cpu_to_be64(state->count << 3);
+
+ /* Padding */
+ partial = state->count & (SHA256_BLOCK_SIZE - 1);
+ padlen = (partial < bit_offset) ? (bit_offset - partial) :
+ ((SHA256_BLOCK_SIZE + bit_offset) - partial);
+ zhaoxin_sha256_update(desc, padding, padlen);
+
+ /* Append length field bytes */
+ zhaoxin_sha256_update(desc, (const u8 *)&bits, sizeof(bits));
+
+ /* Swap to output */
+ zhaoxin_output_block(state->state, (uint32_t *)out, SHA256_DIGEST_SIZE/sizeof(uint32_t));
+
+ return 0;
+}
+
+static inline void zhaoxin_output_block_512(uint64_t *src,
+ uint64_t *dst, size_t count)
+{
+ while (count--)
+ *dst++ = swab64(*src++);
+}
+
+static int zhaoxin_sha384_init(struct shash_desc *desc)
+{
+ struct sha512_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha512_state){
+ .state = { SHA384_H0, SHA384_H1, SHA384_H2, SHA384_H3,
+ SHA384_H4, SHA384_H5, SHA384_H6, SHA384_H7},
+ .count = {0, 0},
+ };
+
+ return 0;
+}
+
+static int zhaoxin_sha512_init(struct shash_desc *desc)
+{
+ struct sha512_state *sctx = shash_desc_ctx(desc);
+
+ *sctx = (struct sha512_state){
+ .state = { SHA512_H0, SHA512_H1, SHA512_H2, SHA512_H3,
+ SHA512_H4, SHA512_H5, SHA512_H6, SHA512_H7},
+ .count = {0, 0},
+ };
+
+ return 0;
+}
+
+static int zhaoxin_sha512_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len)
+{
+ struct sha512_state *sctx = shash_desc_ctx(desc);
+ unsigned int partial, done;
+ const u8 *src;
+ u8 buf[SHA512_BLOCK_SIZE];
+ u8 *dst = &buf[0];
+
+ partial = sctx->count[0] % SHA512_BLOCK_SIZE;
+
+ sctx->count[0] += len;
+ if (sctx->count[0] < len)
+ sctx->count[1]++;
+
+ done = 0;
+ src = data;
+ memcpy(dst, sctx->state, SHA512_DIGEST_SIZE);
+
+ if ((partial + len) >= SHA512_BLOCK_SIZE) {
+ /* Append the bytes in state's buffer to a block to handle */
+ if (partial) {
+
+ done = -partial;
+ memcpy(sctx->buf + partial, data, done + SHA512_BLOCK_SIZE);
+
+ src = sctx->buf;
+
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+ : "+S"(src), "+D"(dst)
+ : "c"(1UL));
+
+ done += SHA512_BLOCK_SIZE;
+ src = data + done;
+ }
+
+ /* Process the left bytes from input data*/
+ if (len - done >= SHA512_BLOCK_SIZE) {
+ asm volatile (".byte 0xf3,0x0f,0xa6,0xe0"
+ : "+S"(src), "+D"(dst)
+ : "c"((unsigned long)((len - done) / SHA512_BLOCK_SIZE)));
+
+ done += ((len - done) - (len - done) % SHA512_BLOCK_SIZE);
+ src = data + done;
+ }
+ partial = 0;
+ }
+
+ memcpy(sctx->state, dst, SHA512_DIGEST_SIZE);
+ memcpy(sctx->buf + partial, src, len - done);
+
+ return 0;
+}
+
+static int zhaoxin_sha512_final(struct shash_desc *desc, u8 *out)
+{
+ const int bit_offset = SHA512_BLOCK_SIZE - sizeof(__be64[2]);
+ struct sha512_state *state = shash_desc_ctx(desc);
+ unsigned int partial = state->count[0] % SHA512_BLOCK_SIZE, padlen;
+ __be64 bits2[2];
+
+ // Both SHA384 and SHA512 may be supported.
+ int dgst_size = crypto_shash_digestsize(desc->tfm);
+
+ static u8 padding[SHA512_BLOCK_SIZE];
+
+ memset(padding, 0, SHA512_BLOCK_SIZE);
+ padding[0] = SHA_PADDING_BYTE;
+
+ // Convert byte count in little endian to bit count in big endian.
+ bits2[0] = cpu_to_be64(state->count[1] << 3 | state->count[0] >> 61);
+ bits2[1] = cpu_to_be64(state->count[0] << 3);
+
+ padlen = (partial < bit_offset) ? (bit_offset - partial) :
+ ((SHA512_BLOCK_SIZE + bit_offset) - partial);
+
+ zhaoxin_sha512_update(desc, padding, padlen);
+
+ /* Append length field bytes */
+ zhaoxin_sha512_update(desc, (const u8 *)bits2, sizeof(__be64[2]));
+
+ /* Swap to output */
+ zhaoxin_output_block_512(state->state, (uint64_t *)out, dgst_size/sizeof(uint64_t));
+
+ return 0;
+}
+
+static int zhaoxin_sha_export(struct shash_desc *desc,
+ void *out)
+{
+ int statesize = crypto_shash_statesize(desc->tfm);
+ void *sctx = shash_desc_ctx(desc);
+
+ memcpy(out, sctx, statesize);
+ return 0;
+}
+
+static int zhaoxin_sha_import(struct shash_desc *desc,
+ const void *in)
+{
+ int statesize = crypto_shash_statesize(desc->tfm);
+ void *sctx = shash_desc_ctx(desc);
+
+ memcpy(sctx, in, statesize);
+ return 0;
+}
+
+static struct shash_alg sha1_alg = {
+ .digestsize = SHA1_DIGEST_SIZE,
+ .init = zhaoxin_sha1_init,
+ .update = zhaoxin_sha1_update,
+ .final = zhaoxin_sha1_final,
+ .export = zhaoxin_sha_export,
+ .import = zhaoxin_sha_import,
+ .descsize = sizeof(struct sha1_state),
+ .statesize = sizeof(struct sha1_state),
+ .base = {
+ .cra_name = "sha1",
+ .cra_driver_name = "sha1-zhaoxin",
+ .cra_priority = ZHAOXIN_SHA_CRA_PRIORITY,
+ .cra_blocksize = SHA1_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static struct shash_alg sha256_alg = {
+ .digestsize = SHA256_DIGEST_SIZE,
+ .init = zhaoxin_sha256_init,
+ .update = zhaoxin_sha256_update,
+ .final = zhaoxin_sha256_final,
+ .export = zhaoxin_sha_export,
+ .import = zhaoxin_sha_import,
+ .descsize = sizeof(struct sha256_state),
+ .statesize = sizeof(struct sha256_state),
+ .base = {
+ .cra_name = "sha256",
+ .cra_driver_name = "sha256-zhaoxin",
+ .cra_priority = ZHAOXIN_SHA_CRA_PRIORITY,
+ .cra_blocksize = SHA256_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static struct shash_alg sha384_alg = {
+ .digestsize = SHA384_DIGEST_SIZE,
+ .init = zhaoxin_sha384_init,
+ .update = zhaoxin_sha512_update,
+ .final = zhaoxin_sha512_final,
+ .export = zhaoxin_sha_export,
+ .import = zhaoxin_sha_import,
+ .descsize = sizeof(struct sha512_state),
+ .statesize = sizeof(struct sha512_state),
+ .base = {
+ .cra_name = "sha384",
+ .cra_driver_name = "sha384-zhaoxin",
+ .cra_priority = ZHAOXIN_SHA_CRA_PRIORITY,
+ .cra_blocksize = SHA384_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static struct shash_alg sha512_alg = {
+ .digestsize = SHA512_DIGEST_SIZE,
+ .init = zhaoxin_sha512_init,
+ .update = zhaoxin_sha512_update,
+ .final = zhaoxin_sha512_final,
+ .export = zhaoxin_sha_export,
+ .import = zhaoxin_sha_import,
+ .descsize = sizeof(struct sha512_state),
+ .statesize = sizeof(struct sha512_state),
+ .base = {
+ .cra_name = "sha512",
+ .cra_driver_name = "sha512-zhaoxin",
+ .cra_priority = ZHAOXIN_SHA_CRA_PRIORITY,
+ .cra_blocksize = SHA512_BLOCK_SIZE,
+ .cra_module = THIS_MODULE,
+ }
+};
+
+
+static const struct x86_cpu_id zhaoxin_sha_ids[] = {
+ X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 6, X86_FEATURE_PHE, NULL),
+ X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 7, X86_FEATURE_PHE, NULL),
+ X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 7, X86_FEATURE_PHE, NULL),
+ {}
+};
+MODULE_DEVICE_TABLE(x86cpu, zhaoxin_sha_ids);
+
+static int __init zhaoxin_sha_init(void)
+{
+ int rc = -ENODEV;
+
+ struct shash_alg *sha1;
+ struct shash_alg *sha256;
+ struct shash_alg *sha384;
+ struct shash_alg *sha512;
+
+ if (!x86_match_cpu(zhaoxin_sha_ids) || !boot_cpu_has(X86_FEATURE_PHE_EN))
+ return -ENODEV;
+
+ sha1 = &sha1_alg;
+ sha256 = &sha256_alg;
+
+ rc = crypto_register_shash(sha1);
+ if (rc)
+ goto out;
+
+ rc = crypto_register_shash(sha256);
+ if (rc)
+ goto out_unreg1;
+
+ if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+
+ sha384 = &sha384_alg;
+ sha512 = &sha512_alg;
+
+ rc = crypto_register_shash(sha384);
+ if (rc)
+ goto out_unreg2;
+
+ rc = crypto_register_shash(sha512);
+ if (rc)
+ goto out_unreg3;
+
+ pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 algorithms.\n");
+ } else
+ pr_notice("Using Zhaoxin Hardware Engine for SHA1/SHA256 algorithms.\n");
+
+
+ return 0;
+
+out_unreg3:
+ if (boot_cpu_has(X86_FEATURE_PHE2_EN))
+ crypto_unregister_shash(sha384);
+
+out_unreg2:
+ crypto_unregister_shash(sha256);
+out_unreg1:
+ crypto_unregister_shash(sha1);
+
+out:
+ pr_err("Zhaoxin Hardware Engine for SHA1/SHA256/SHA384/SHA512 initialization failed.\n");
+ return rc;
+}
+
+static void __exit zhaoxin_sha_fini(void)
+{
+ crypto_unregister_shash(&sha1_alg);
+ crypto_unregister_shash(&sha256_alg);
+
+ if (boot_cpu_has(X86_FEATURE_PHE2_EN)) {
+ crypto_unregister_shash(&sha384_alg);
+ crypto_unregister_shash(&sha512_alg);
+ }
+
+}
+
+module_init(zhaoxin_sha_init);
+module_exit(zhaoxin_sha_fini);
+
+MODULE_DESCRIPTION("Zhaoxin Hardware SHA1/SHA256/SHA384/SHA512 algorithms support.");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("George Xue");
+
+MODULE_ALIAS_CRYPTO("sha1-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha256-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha384-zhaoxin");
+MODULE_ALIAS_CRYPTO("sha512-zhaoxin");
diff --git a/drivers/crypto/zhaoxin-sha.h b/drivers/crypto/zhaoxin-sha.h
new file mode 100644
index 000000000000..373484384f6e
--- /dev/null
+++ b/drivers/crypto/zhaoxin-sha.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Driver for Zhaoxin Sha
+ *
+ * Copyright (c) 2023 George Xue<georgexue@zhaoxin.com>
+ */
+
+#ifndef _ZHAOXIN_SHA_H
+#define _ZHAOXIN_SHA_H
+
+#define ZHAOXIN_SHA_CRA_PRIORITY 300
+#define ZHAOXIN_SHA_COMPOSITE_PRIORITY 400
+
+#define SHA_PADDING_BYTE 0x80
+
+#endif /* _ZHAOXIN_SHA_H */
--
2.25.1
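
The `*_final()` functions in this patch all compute a padding length so that the 0x80 byte, the zero fill and the trailing length field end exactly on a block boundary, and the SHA-512 variant also turns a 128-bit byte count into a bit count. The standalone sketch below restates that arithmetic in userspace C so it can be checked in isolation; `sha_padlen()` and `bytes_to_bits128()` are illustrative names and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of padding bytes (0x80 followed by zeros) to append before the
 * length field so that the padded message ends on a block boundary.
 * Use block_size 64 and len_field 8 for SHA-1/SHA-256, and block_size 128
 * and len_field 16 for SHA-384/SHA-512.
 */
static unsigned int sha_padlen(uint64_t count, unsigned int block_size,
			       unsigned int len_field)
{
	unsigned int partial, bit_offset;

	assert(block_size > 0 && len_field < block_size);
	partial = (unsigned int)(count % block_size);
	bit_offset = block_size - len_field;

	return (partial < bit_offset) ? (bit_offset - partial)
				      : (block_size + bit_offset - partial);
}

/* Convert a 128-bit byte count (lo, hi) into a 128-bit bit count. */
static void bytes_to_bits128(uint64_t lo, uint64_t hi,
			     uint64_t *bits_lo, uint64_t *bits_hi)
{
	*bits_hi = (hi << 3) | (lo >> 61);	/* top 3 bits of lo carry over */
	*bits_lo = lo << 3;
}
```

When the buffered tail is already at or past the length-field offset (for example 56 bytes with a 64-byte block), the result is a full extra block of padding, which matches the second branch of the expression used in the driver.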
* Re: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
2024-01-16 6:35 ` [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
@ 2024-01-17 0:45 ` kernel test robot
2024-01-19 17:04 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-01-17 0:45 UTC (permalink / raw)
To: Tony W Wang-oc, 675146817, story_19872006, herbert, davem,
linux-crypto, linux-kernel, tglx, mingo, bp, dave.hansen, x86,
hpa, seanjc, kim.phillips, kirill.shutemov, jmattson, babu.moger,
kai.huang, acme, aik, namhyung
Cc: oe-kbuild-all, CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Hi Tony,
kernel test robot noticed the following build errors:
[auto build test ERROR on herbert-cryptodev-2.6/master]
[also build test ERROR on tip/x86/core linus/master v6.7 next-20240112]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tony-W-Wang-oc/crypto-padlock-sha-Matches-CPU-with-Family-with-6-explicitly/20240116-144827
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/r/20240116063549.3016-4-TonyWWang-oc%40zhaoxin.com
patch subject: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
config: hexagon-randconfig-r071-20240117 (https://download.01.org/0day-ci/archive/20240117/202401170833.HWvPThMS-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240117/202401170833.HWvPThMS-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401170833.HWvPThMS-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/crypto/zhaoxin-sha.c:17:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from drivers/crypto/zhaoxin-sha.c:17:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from drivers/crypto/zhaoxin-sha.c:17:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> drivers/crypto/zhaoxin-sha.c:20:10: fatal error: 'asm/cpu_device_id.h' file not found
#include <asm/cpu_device_id.h>
^~~~~~~~~~~~~~~~~~~~~
6 warnings and 1 error generated.
vim +20 drivers/crypto/zhaoxin-sha.c
> 20 #include <asm/cpu_device_id.h>
21 #include <asm/fpu/api.h>
22 #include "zhaoxin-sha.h"
23
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
2024-01-16 6:35 ` [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
2024-01-17 0:45 ` kernel test robot
@ 2024-01-19 17:04 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-01-19 17:04 UTC (permalink / raw)
To: Tony W Wang-oc, 675146817, story_19872006, herbert, davem,
linux-crypto, linux-kernel, tglx, mingo, bp, dave.hansen, x86,
hpa, seanjc, kim.phillips, kirill.shutemov, jmattson, babu.moger,
kai.huang, acme, aik, namhyung
Cc: oe-kbuild-all, CobeChen, TimGuo, LeoLiu-oc, GeorgeXue
Hi Tony,
kernel test robot noticed the following build errors:
[auto build test ERROR on herbert-cryptodev-2.6/master]
[also build test ERROR on tip/x86/core linus/master v6.7 next-20240119]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tony-W-Wang-oc/crypto-padlock-sha-Matches-CPU-with-Family-with-6-explicitly/20240116-144827
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/r/20240116063549.3016-4-TonyWWang-oc%40zhaoxin.com
patch subject: [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20240120/202401200020.g4KDJOTm-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240120/202401200020.g4KDJOTm-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401200020.g4KDJOTm-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/crypto/zhaoxin-sha.c:20:10: fatal error: asm/cpu_device_id.h: No such file or directory
20 | #include <asm/cpu_device_id.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
vim +20 drivers/crypto/zhaoxin-sha.c
> 20 #include <asm/cpu_device_id.h>
21 #include <asm/fpu/api.h>
22 #include "zhaoxin-sha.h"
23
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 6+ messages
2024-01-16 6:35 [PATCH 0/3] Add Zhaoxin hardware engine driver support for SHA Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 1/3] crypto: padlock-sha: Matches CPU with Family with 6 explicitly Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 2/3] x86/cpufeatures: Add CPU feature flags for Zhaoxin Hash Engine Tony W Wang-oc
2024-01-16 6:35 ` [PATCH 3/3] crypto: Zhaoxin: Hardware Engine Driver for SHA1/256/384/512 Tony W Wang-oc
2024-01-17 0:45 ` kernel test robot
2024-01-19 17:04 ` kernel test robot