* [PATCH v4 0/2] lib/crypto: x86/sha: Add PHE Extensions support
@ 2026-03-13 8:01 AlanSong-oc
2026-03-13 8:01 ` [PATCH v4 1/2] crypto: padlock-sha - Disable for Zhaoxin processor AlanSong-oc
2026-03-13 8:01 ` [PATCH v4 2/2] lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform function AlanSong-oc
0 siblings, 2 replies; 5+ messages in thread
From: AlanSong-oc @ 2026-03-13 8:01 UTC
To: herbert, davem, ebiggers, Jason, ardb, linux-crypto, linux-kernel,
x86
Cc: CobeChen, TonyWWang-oc, YunShen, GeorgeXue, LeoLiu, HansHu,
AlanSong-oc
This series adds support for PHE Extensions optimized SHA256 transform
functions for Zhaoxin processors in lib/crypto, and disables
the padlock-sha driver on Zhaoxin platforms due to self-test failures.
After applying this patch series, the data block processing throughput
increases by approximately 2 to 5 times on the Zhaoxin KX-7000 platform,
depending on block size and hash algorithm, as measured by
CRYPTO_LIB_BENCHMARK. The KUnit test suites also pass successfully.
Changes in v4:
- Include benchmark results, test results, and the specification link
directly in the commit message instead of the cover letter.
- Check CONFIG_CPU_SUP_ZHAOXIN directly in the condition rather than
using #if/#endif for conditional compilation.
- Combine the CPU family check and the X86_FEATURE_PHE_EN feature check
into a single condition.
- Correct the comment describing the instruction register requirements
in both 32-bit and 64-bit operation modes.
- Fix the inline assembly constraints to match the instruction behavior
for input and output registers.
- Only include XSHA256 support for SHA-256 and drop XSHA1 support.
Changes in v3:
- Implement PHE Extensions optimized SHA1 and SHA256 transform functions
using inline assembly instead of separate assembly files
- Eliminate unnecessary casts
- Add CONFIG_CPU_SUP_ZHAOXIN check to compile out the code when disabled
- Use 'boot_cpu_data.x86' to identify the CPU family instead of
'cpu_data(0).x86'
- Only check X86_FEATURE_PHE_EN for CPU support, consistent with other
CPU feature checks.
- Disable the padlock-sha driver on Zhaoxin processors with CPU family
0x07 and newer.
Changes in v2:
- Add Zhaoxin support to lib/crypto instead of extending the existing
padlock-sha driver
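The v4 change from #if/#endif to a direct CONFIG check combined with the
feature and family checks can be illustrated with a small user-space
sketch. This is only an illustration under stated assumptions:
IS_ENABLED_DEMO, CONFIG_CPU_SUP_ZHAOXIN_DEMO, struct cpu_demo and
use_phe_sha256() are hypothetical stand-ins and are not kernel names; the
kernel's real IS_ENABLED() macro lives in <linux/kconfig.h> and is more
involved.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a Kconfig option set to "y". */
#define CONFIG_CPU_SUP_ZHAOXIN_DEMO 1

/*
 * Simplified stand-in for IS_ENABLED(): evaluates to a compile-time
 * constant, so the compiler can discard the dead branch without any
 * #if/#endif around the code.
 */
#define IS_ENABLED_DEMO(option) ((option) != 0)

/* Hypothetical subset of the CPU data consulted by the check. */
struct cpu_demo {
	unsigned int family;
	bool has_phe_en;
};

/* Config, feature flag and CPU family folded into one condition. */
static bool use_phe_sha256(const struct cpu_demo *cpu)
{
	return IS_ENABLED_DEMO(CONFIG_CPU_SUP_ZHAOXIN_DEMO) &&
	       cpu->has_phe_en && cpu->family >= 0x07;
}
```

With the option defined as 0, the whole expression folds to false at
compile time, which is roughly the effect the #if/#endif version had.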
AlanSong-oc (2):
crypto: padlock-sha - Disable for Zhaoxin processor
lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform
function
drivers/crypto/padlock-sha.c | 7 +++++++
lib/crypto/x86/sha256.h | 25 +++++++++++++++++++++++++
2 files changed, 32 insertions(+)
--
2.34.1
* [PATCH v4 1/2] crypto: padlock-sha - Disable for Zhaoxin processor
From: AlanSong-oc @ 2026-03-13 8:01 UTC
To: herbert, davem, ebiggers, Jason, ardb, linux-crypto, linux-kernel,
x86
Cc: CobeChen, TonyWWang-oc, YunShen, GeorgeXue, LeoLiu, HansHu,
AlanSong-oc, stable
On Zhaoxin processors, the XSHA1 instruction requires the memory region
pointed to by %rdi to be 32 bytes in size, and neither XSHA1 nor
XSHA256 performs any operation when %ecx is zero.
Because of these requirements, the current padlock-sha driver does not
work correctly on Zhaoxin processors. The driver fails its self-tests
and is therefore never activated on these processors. This issue
has been reported in Debian [1]. The self-tests fail with the
following messages [2]:
alg: shash: sha1-padlock-nano test failed (wrong result) on test vector 0, cfg="init+update+final aligned buffer"
alg: self-tests for sha1 using sha1-padlock-nano failed (rc=-22)
------------[ cut here ]------------
alg: shash: sha256-padlock-nano test failed (wrong result) on test vector 0, cfg="init+update+final aligned buffer"
alg: self-tests for sha256 using sha256-padlock-nano failed (rc=-22)
------------[ cut here ]------------
Disable the padlock-sha driver on Zhaoxin processors with the CPU family
0x07 and newer. Following the suggestion in [3], add support for the PHE
extensions to lib/crypto. Only XSHA256 support for SHA-256 is included,
since SHA-1 has been cryptographically broken, as recommended in [4].
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1103397
[2] https://linux-hardware.org/?probe=271fabb7a4&log=dmesg
[3] https://lore.kernel.org/linux-crypto/aUI4CGp6kK7mxgEr@gondor.apana.org.au/
[4] https://lore.kernel.org/linux-crypto/20260116071513.12134-1-AlanSong-oc@zhaoxin.com/T/#m49436c4849dd64454b3554c105197ef9c61db23e
Fixes: 63dc06cd12f9 ("crypto: padlock-sha - Use API partial block handling")
Cc: stable@vger.kernel.org
Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com>
---
drivers/crypto/padlock-sha.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/crypto/padlock-sha.c b/drivers/crypto/padlock-sha.c
index 329f60ad4..9214bbfc8 100644
--- a/drivers/crypto/padlock-sha.c
+++ b/drivers/crypto/padlock-sha.c
@@ -332,6 +332,13 @@ static int __init padlock_init(void)
 	if (!x86_match_cpu(padlock_sha_ids) || !boot_cpu_has(X86_FEATURE_PHE_EN))
 		return -ENODEV;
 
+	/*
+	 * Skip family 0x07 and newer used by Zhaoxin processors,
+	 * as the driver's self-tests fail on these CPUs.
+	 */
+	if (c->x86 >= 0x07)
+		return -ENODEV;
+
 	/* Register the newly added algorithm module if on *
 	 * VIA Nano processor, or else just do as before */
 	if (c->x86_model < 0x0f) {
--
2.34.1
* [PATCH v4 2/2] lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform function
From: AlanSong-oc @ 2026-03-13 8:01 UTC
To: herbert, davem, ebiggers, Jason, ardb, linux-crypto, linux-kernel,
x86
Cc: CobeChen, TonyWWang-oc, YunShen, GeorgeXue, LeoLiu, HansHu,
AlanSong-oc
Zhaoxin CPUs implement SHA (Secure Hash Algorithm) as CPU instructions
through the PHE (Padlock Hash Engine) Extensions, which include the
XSHA1, XSHA256, XSHA384 and XSHA512 instructions. The instruction
specification is available at the following link.
(https://gitee.com/openzhaoxin/zhaoxin_specifications/blob/20260227/ZX_Padlock_Reference.pdf)
Implementing SHA in hardware rather than in software allows applications
to achieve higher performance, better security and more flexibility.
This patch adds an XSHA256-optimized implementation of the SHA-256
transform function.
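The diff in this message copies the hash state into a 32-byte scratch
buffer that is aligned to 16 bytes before issuing the instruction. The
alignment step can be sketched in user space as follows. This is a
minimal illustration, not the kernel code: DEMO_ALIGNMENT and align_up()
are hypothetical names standing in for PHE_ALIGNMENT and the kernel's
PTR_ALIGN(), and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGNMENT 16	/* hypothetical stand-in for PHE_ALIGNMENT */

/*
 * Round a pointer up to the next multiple of align (a power of two).
 * Over-allocating the buffer by align - 1 bytes guarantees that the
 * aligned pointer still has the full payload size available behind it.
 */
static unsigned char *align_up(unsigned char *p, size_t align)
{
	uintptr_t addr = (uintptr_t)p;
	size_t skip = (align - (addr & (align - 1))) & (align - 1);

	return p + skip;
}
```

A caller would declare something like
unsigned char buf[32 + DEMO_ALIGNMENT - 1] and pass
align_up(buf, DEMO_ALIGNMENT) as the destination, mirroring the buffer
sizing used in the patch.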
The table below shows the benchmark results before and after applying
this patch, measured with CRYPTO_LIB_BENCHMARK on the Zhaoxin KX-7000
platform, highlighting the achieved speedups.
+---------+--------------------------+
|         |         SHA256**         |
+---------+--------+-----------------+
|  Len*   | Before |      After      |
+---------+--------+-----------------+
|       1 |      2 |     7 (3.50x)   |
|      16 |     35 |   119 (3.40x)   |
|      64 |     74 |   280 (3.78x)   |
|     127 |     99 |   387 (3.91x)   |
|     128 |    103 |   427 (4.15x)   |
|     200 |    123 |   537 (4.37x)   |
|     256 |    128 |   582 (4.55x)   |
|     511 |    144 |   679 (4.72x)   |
|     512 |    146 |   714 (4.89x)   |
|    1024 |    157 |   796 (5.07x)   |
|    3173 |    167 |   883 (5.28x)   |
|    4096 |    166 |   876 (5.28x)   |
|   16384 |    169 |   899 (5.32x)   |
+---------+--------+-----------------+
*: The length, in bytes, of each data block processed by one complete
   SHA sequence.
**: The throughput of processing data blocks, in MB/s.
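The speedup figures in parentheses are the ratio of the "After"
throughput to the "Before" throughput. A minimal sketch of that
arithmetic, using a hypothetical helper name that does not appear in the
patch:

```c
/*
 * Hypothetical helper: ratio of patched to unpatched throughput.
 * Both arguments are throughputs in the same unit (MB/s in the table).
 */
static double speedup(unsigned int before_mbps, unsigned int after_mbps)
{
	return (double)after_mbps / (double)before_mbps;
}
```

For example, speedup(2, 7) evaluates to 3.5, matching the len=1 row.
Because the table lists throughputs rounded to whole MB/s, ratios
recomputed from it may differ from the listed ones in the last digit.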
After applying this patch, the SHA256 KUnit test suite passes on Zhaoxin
platforms. Detailed test logs are shown below.
[ 7.767257] # Subtest: sha256
[ 7.770542] # module: sha256_kunit
[ 7.770544] 1..15
[ 7.777383] ok 1 test_hash_test_vectors
[ 7.788563] ok 2 test_hash_all_lens_up_to_4096
[ 7.806090] ok 3 test_hash_incremental_updates
[ 7.813553] ok 4 test_hash_buffer_overruns
[ 7.822384] ok 5 test_hash_overlaps
[ 7.829388] ok 6 test_hash_alignment_consistency
[ 7.833843] ok 7 test_hash_ctx_zeroization
[ 7.915191] ok 8 test_hash_interrupt_context_1
[ 8.362312] ok 9 test_hash_interrupt_context_2
[ 8.401607] ok 10 test_hmac
[ 8.415458] ok 11 test_sha256_finup_2x
[ 8.419397] ok 12 test_sha256_finup_2x_defaultctx
[ 8.424107] ok 13 test_sha256_finup_2x_hugelen
[ 8.451289] # benchmark_hash: len=1: 7 MB/s
[ 8.465372] # benchmark_hash: len=16: 119 MB/s
[ 8.481760] # benchmark_hash: len=64: 280 MB/s
[ 8.499344] # benchmark_hash: len=127: 387 MB/s
[ 8.515800] # benchmark_hash: len=128: 427 MB/s
[ 8.531970] # benchmark_hash: len=200: 537 MB/s
[ 8.548241] # benchmark_hash: len=256: 582 MB/s
[ 8.564838] # benchmark_hash: len=511: 679 MB/s
[ 8.580872] # benchmark_hash: len=512: 714 MB/s
[ 8.596858] # benchmark_hash: len=1024: 796 MB/s
[ 8.612567] # benchmark_hash: len=3173: 883 MB/s
[ 8.628546] # benchmark_hash: len=4096: 876 MB/s
[ 8.644482] # benchmark_hash: len=16384: 899 MB/s
[ 8.649773] ok 14 benchmark_hash
[ 8.655505] ok 15 benchmark_sha256_finup_2x # SKIP not relevant
[ 8.659065] # sha256: pass:14 fail:0 skip:1 total:15
[ 8.665276] # Totals: pass:14 fail:0 skip:1 total:15
[ 8.670195] ok 7 sha256
Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com>
---
lib/crypto/x86/sha256.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/lib/crypto/x86/sha256.h b/lib/crypto/x86/sha256.h
index 38e33b22a..5816b8928 100644
--- a/lib/crypto/x86/sha256.h
+++ b/lib/crypto/x86/sha256.h
@@ -31,6 +31,27 @@ DEFINE_X86_SHA256_FN(sha256_blocks_avx, sha256_transform_avx);
 DEFINE_X86_SHA256_FN(sha256_blocks_avx2, sha256_transform_rorx);
 DEFINE_X86_SHA256_FN(sha256_blocks_ni, sha256_ni_transform);
 
+#define PHE_ALIGNMENT 16
+static void sha256_blocks_phe(struct sha256_block_state *state,
+			      const u8 *data, size_t nblocks)
+{
+	/*
+	 * On Zhaoxin processors, XSHA256 requires the %rdi register
+	 * in 64-bit mode (or %edi in 32-bit mode) to point to
+	 * a 32-byte, 16-byte-aligned buffer.
+	 */
+	u8 buf[32 + PHE_ALIGNMENT - 1];
+	u8 *dst = PTR_ALIGN(&buf[0], PHE_ALIGNMENT);
+	size_t padding = -1;
+
+	memcpy(dst, state, SHA256_DIGEST_SIZE);
+	asm volatile(".byte 0xf3,0x0f,0xa6,0xd0" /* REP XSHA256 */
+		     : "+a"(padding), "+c"(nblocks), "+S"(data)
+		     : "D"(dst)
+		     : "memory");
+	memcpy(state, dst, SHA256_DIGEST_SIZE);
+}
+
 static void sha256_blocks(struct sha256_block_state *state,
 			  const u8 *data, size_t nblocks)
 {
@@ -79,6 +100,10 @@ static void sha256_mod_init_arch(void)
 	if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
 		static_call_update(sha256_blocks_x86, sha256_blocks_ni);
 		static_branch_enable(&have_sha_ni);
+	} else if (IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) &&
+		   boot_cpu_has(X86_FEATURE_PHE_EN) &&
+		   boot_cpu_data.x86 >= 0x07) {
+		static_call_update(sha256_blocks_x86, sha256_blocks_phe);
 	} else if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
 				     NULL) &&
 		   boot_cpu_has(X86_FEATURE_AVX)) {
--
2.34.1
* Re: [PATCH v4 1/2] crypto: padlock-sha - Disable for Zhaoxin processor
From: Eric Biggers @ 2026-03-14 18:40 UTC
To: AlanSong-oc
Cc: herbert, davem, Jason, ardb, linux-crypto, linux-kernel, x86,
CobeChen, TonyWWang-oc, YunShen, GeorgeXue, LeoLiu, HansHu,
stable
On Fri, Mar 13, 2026 at 04:01:49PM +0800, AlanSong-oc wrote:
> For Zhaoxin processors, the XSHA1 instruction requires the total memory
> allocated at %rdi register must be 32 bytes, while the XSHA1 and
> XSHA256 instruction doesn't perform any operation when %ecx is zero.
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-fixes
I made a few tweaks to your commit message, as noted below:
> ------------[ cut here ]------------
>
> alg: shash: sha256-padlock-nano test failed (wrong result) on test vector 0, cfg="init+update+final aligned buffer"
> alg: self-tests for sha256 using sha256-padlock-nano failed (rc=-22)
> ------------[ cut here ]------------
Removed the "cut here" lines because they caused checkpatch errors
> Disable the padlock-sha driver on Zhaoxin processors with the CPU family
> 0x07 and newer. Following the suggestion in [3], add support for the PHE
> extensions to lib/crypto. Only XSHA256 support for SHA-256 is included,
> since SHA-1 has been cryptographically broken, as recommended in [4].
Changed to clarify that the lib/crypto/ support is in a different patch:
Disable the padlock-sha driver on Zhaoxin processors with the CPU
family 0x07 and newer. Following the suggestion in [3], support for
PHE will be added to lib/crypto/ instead.
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1103397
Changed to correct link https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1113996
- Eric
* Re: [PATCH v4 2/2] lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform function
From: Eric Biggers @ 2026-03-14 18:50 UTC
To: AlanSong-oc
Cc: herbert, davem, Jason, ardb, linux-crypto, linux-kernel, x86,
CobeChen, TonyWWang-oc, YunShen, GeorgeXue, LeoLiu, HansHu
On Fri, Mar 13, 2026 at 04:01:50PM +0800, AlanSong-oc wrote:
> Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU
> instructions by PHE(Padlock Hash Engine) Extensions, including XSHA1,
> XSHA256, XSHA384 and XSHA512 instructions. The instruction specification
> is available at the following link.
> (https://gitee.com/openzhaoxin/zhaoxin_specifications/blob/20260227/ZX_Padlock_Reference.pdf)
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=libcrypto-next
- Eric