public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES
@ 2026-01-15 18:38 Holger Dengler
  2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Dengler @ 2026-01-15 18:38 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, Holger Dengler, linux-kernel, linux-crypto

The following patch adds a kunit tests for the aes library. It does a very
minimal verification of the aes operation for all key-sizes. The benchmarks,
which are also part of the test-suite, can be used to get some rough
performance measurements of the aes encrypt and decrypt functions. The
aes_prepare*key() APIs are not covered by the benchmarks.

Changes since RFC [1]:
- reorder entries in Kconfig/Makefile alphabetically
- fail (instead of skip) in case of failures in key preparation
- replace local constant definitions
- replace macros with helper functions

[1] https://lore.kernel.org/linux-crypto/20260114153138.4896-1-dengler@linux.ibm.com

Example output of the aes_kunit test-suite:

[  178.640161] KTAP version 1
[  178.640167] 1..1
[  178.642268]     KTAP version 1
[  178.642269]     # Subtest: aes
[  178.642270]     # module: aes_kunit
[  178.642272]     1..9
[  178.642330]     ok 1 test_aes128_encrypt
[  178.642378]     ok 2 test_aes128_decrypt
[  178.642427]     ok 3 test_aes192_encrypt
[  178.642477]     ok 4 test_aes192_decrypt
[  178.642531]     ok 5 test_aes256_encrypt
[  178.642584]     ok 6 test_aes256_decrypt
[  179.345355]     # benchmark_aes128: enc (iter. 10000000, duration 351128093ns)
[  179.345359]     # benchmark_aes128: enc (len=16): 434 MB/s
[  179.345361]     # benchmark_aes128: dec (iter. 10000000, duration 351541596ns)
[  179.345363]     # benchmark_aes128: dec (len=16): 434 MB/s
[  179.345398]     ok 7 benchmark_aes128
[  180.082939]     # benchmark_aes192: enc (iter. 10000000, duration 370559483ns)
[  180.082942]     # benchmark_aes192: enc (len=16): 411 MB/s
[  180.082944]     # benchmark_aes192: dec (iter. 10000000, duration 366888529ns)
[  180.082946]     # benchmark_aes192: dec (len=16): 415 MB/s
[  180.082982]     ok 8 benchmark_aes192
[  180.810447]     # benchmark_aes256: enc (iter. 10000000, duration 363703684ns)
[  180.810450]     # benchmark_aes256: enc (len=16): 419 MB/s
[  180.810452]     # benchmark_aes256: dec (iter. 10000000, duration 363671689ns)
[  180.810454]     # benchmark_aes256: dec (len=16): 419 MB/s
[  180.810490]     ok 9 benchmark_aes256
[  180.810495] # aes: pass:9 fail:0 skip:0 total:9
[  180.810498] # Totals: pass:9 fail:0 skip:0 total:9
[  180.810500] ok 1 aes

Holger Dengler (1):
  lib/crypto: tests: Add KUnit tests for AES

 lib/crypto/tests/Kconfig        |  12 +++
 lib/crypto/tests/Makefile       |   1 +
 lib/crypto/tests/aes-testvecs.h |  78 +++++++++++++++++
 lib/crypto/tests/aes_kunit.c    | 149 ++++++++++++++++++++++++++++++++
 4 files changed, 240 insertions(+)
 create mode 100644 lib/crypto/tests/aes-testvecs.h
 create mode 100644 lib/crypto/tests/aes_kunit.c


base-commit: 47753e09a15d9fd7cdf114550510f4f2af9333ec
-- 
2.51.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler
@ 2026-01-15 18:38 ` Holger Dengler
  2026-01-15 20:43   ` Eric Biggers
  2026-01-16  0:25   ` kernel test robot
  0 siblings, 2 replies; 14+ messages in thread
From: Holger Dengler @ 2026-01-15 18:38 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, Holger Dengler, linux-kernel, linux-crypto

Add a KUnit test suite for AES library functions, including KAT and
benchmarks.

Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
---
 lib/crypto/tests/Kconfig        |  12 +++
 lib/crypto/tests/Makefile       |   1 +
 lib/crypto/tests/aes-testvecs.h |  78 +++++++++++++++++
 lib/crypto/tests/aes_kunit.c    | 149 ++++++++++++++++++++++++++++++++
 4 files changed, 240 insertions(+)
 create mode 100644 lib/crypto/tests/aes-testvecs.h
 create mode 100644 lib/crypto/tests/aes_kunit.c

diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig
index 4970463ea0aa..8ac06b6e2f12 100644
--- a/lib/crypto/tests/Kconfig
+++ b/lib/crypto/tests/Kconfig
@@ -1,5 +1,17 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
+config CRYPTO_LIB_AES_KUNIT_TEST
+	tristate "KUnit tests for AES" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS || CRYPTO_SELFTESTS
+	select CRYPTO_LIB_BENCHMARK_VISIBLE
+	select CRYPTO_LIB_AES
+	help
+	  KUnit tests for the AES library functions, including known answer
+	  tests and benchmarks for encrypt/decrypt with all key sizes. The
+	  test suite does not contain any key generation test, nor any error
+	  cases.
+
 config CRYPTO_LIB_BLAKE2B_KUNIT_TEST
 	tristate "KUnit tests for BLAKE2b" if !KUNIT_ALL_TESTS
 	depends on KUNIT
diff --git a/lib/crypto/tests/Makefile b/lib/crypto/tests/Makefile
index f4262379f56c..e0a30bdc02ac 100644
--- a/lib/crypto/tests/Makefile
+++ b/lib/crypto/tests/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-or-later
 
+obj-$(CONFIG_CRYPTO_LIB_AES_KUNIT_TEST) += aes_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_BLAKE2B_KUNIT_TEST) += blake2b_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_BLAKE2S_KUNIT_TEST) += blake2s_kunit.o
 obj-$(CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST) += curve25519_kunit.o
diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h
new file mode 100644
index 000000000000..dfa528db7f02
--- /dev/null
+++ b/lib/crypto/tests/aes-testvecs.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _AES_TESTVECS_H
+#define _AES_TESTVECS_H
+
+#include <crypto/aes.h>
+
+struct buf {
+	size_t blen;
+	u8 b[];
+};
+
+struct aes_testvector {
+	u8 plain[AES_BLOCK_SIZE];
+	u8 cipher[AES_BLOCK_SIZE];
+	struct {
+		size_t len;
+		u8 b[32];
+	} key;
+};
+
+static const struct aes_testvector aes128_kat = {
+	.plain = {
+		0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
+		0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a,
+	},
+	.cipher = {
+		0x3a, 0xd7, 0x7b, 0xb4, 0x0d, 0x7a, 0x36, 0x60,
+		0xa8, 0x9e, 0xca, 0xf3, 0x24, 0x66, 0xef, 0x97,
+	},
+	.key = {
+		.len = 16,
+		.b = {
+			0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
+			0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c,
+		},
+	},
+};
+
+static const struct aes_testvector aes192_kat = {
+	.plain = {
+		0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
+		0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a,
+	},
+	.cipher = {
+		0xbd, 0x33, 0x4f, 0x1d, 0x6e, 0x45, 0xf2, 0x5f,
+		0xf7, 0x12, 0xa2, 0x14, 0x57, 0x1f, 0xa5, 0xcc,
+	},
+	.key = {
+		.len = 24,
+		.b = {
+			0x8e, 0x73, 0xb0, 0xf7, 0xda, 0x0e, 0x64, 0x52,
+			0xc8, 0x10, 0xf3, 0x2b, 0x80, 0x90, 0x79, 0xe5,
+			0x62, 0xf8, 0xea, 0xd2, 0x52, 0x2c, 0x6b, 0x7b,
+		},
+	},
+};
+
+static const struct aes_testvector aes256_kat = {
+	.plain = {
+		0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
+		0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a,
+	},
+	.cipher = {
+		0xf3, 0xee, 0xd1, 0xbd, 0xb5, 0xd2, 0xa0, 0x3c,
+		0x06, 0x4b, 0x5a, 0x7e, 0x3d, 0xb1, 0x81, 0xf8,
+	},
+	.key = {
+		.len = 32,
+		.b = {
+			0x60, 0x3d, 0xeb, 0x10, 0x15, 0xca, 0x71, 0xbe,
+			0x2b, 0x73, 0xae, 0xf0, 0x85, 0x7d, 0x77, 0x81,
+			0x1f, 0x35, 0x2c, 0x07, 0x3b, 0x61, 0x08, 0xd7,
+			0x2d, 0x98, 0x10, 0xa3, 0x09, 0x14, 0xdf, 0xf4,
+		},
+	},
+};
+
+#endif /* _AES_TESTVECS_H */
diff --git a/lib/crypto/tests/aes_kunit.c b/lib/crypto/tests/aes_kunit.c
new file mode 100644
index 000000000000..1f4a4f53f553
--- /dev/null
+++ b/lib/crypto/tests/aes_kunit.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <kunit/test.h>
+#include "aes-testvecs.h"
+
+static void test_aes(struct kunit *test, const struct aes_testvector *tv,
+		     bool enc)
+{
+	struct aes_key aes_key;
+	u8 out[AES_BLOCK_SIZE];
+	const u8 *input, *expect;
+	int rc;
+
+	rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len);
+	KUNIT_ASSERT_EQ(test, 0, rc);
+
+	if (enc) {
+		input = tv->plain;
+		expect = tv->cipher;
+		aes_encrypt(&aes_key, out, input);
+	} else {
+		input = tv->cipher;
+		expect = tv->plain;
+		aes_decrypt(&aes_key, out, input);
+	}
+	KUNIT_ASSERT_MEMEQ(test, out, expect, sizeof(out));
+}
+
+static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
+{
+	const size_t num_iters = 10000000;
+	u8 out[AES_BLOCK_SIZE];
+	struct aes_key aes_key;
+	u64 t_enc, t_dec;
+	int rc;
+
+	if (!IS_ENABLED(CONFIG_CRYPTO_LIB_BENCHMARK))
+		kunit_skip(test, "not enabled");
+
+	rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len);
+	KUNIT_ASSERT_EQ(test, 0, rc);
+
+	/* warm-up enc */
+	for (size_t i = 0; i < 1000; i++)
+		aes_encrypt(&aes_key, out, tv->plain);
+
+	preempt_disable();
+	t_enc = ktime_get_ns();
+
+	for (size_t i = 0; i < num_iters; i++)
+		aes_encrypt(&aes_key, out, tv->plain);
+
+	t_enc = ktime_get_ns() - t_enc;
+	preempt_enable();
+
+	/* warm-up dec */
+	for (size_t i = 0; i < 1000; i++)
+		aes_decrypt(&aes_key, out, tv->cipher);
+
+	preempt_disable();
+	t_dec = ktime_get_ns();
+
+	for (size_t i = 0; i < num_iters; i++)
+		aes_decrypt(&aes_key, out, tv->cipher);
+
+	t_dec = ktime_get_ns() - t_dec;
+	preempt_enable();
+
+	kunit_info(test, "enc (iter. %zu, duration %lluns)",
+		   num_iters, t_enc);
+	kunit_info(test, "enc (len=%zu): %llu MB/s",
+		   (size_t)AES_BLOCK_SIZE,
+		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
+			     (t_enc ?: 1) * SZ_1M));
+
+	kunit_info(test, "dec (iter. %zu, duration %lluns)",
+		   num_iters, t_dec);
+	kunit_info(test, "dec (len=%zu): %llu MB/s",
+		   (size_t)AES_BLOCK_SIZE,
+		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
+			     (t_dec ?: 1) * SZ_1M));
+}
+
+static void test_aes128_encrypt(struct kunit *test)
+{
+	test_aes(test, &aes128_kat, true);
+}
+
+static void test_aes128_decrypt(struct kunit *test)
+{
+	test_aes(test, &aes128_kat, false);
+}
+
+static void test_aes192_encrypt(struct kunit *test)
+{
+	test_aes(test, &aes192_kat, true);
+}
+
+static void test_aes192_decrypt(struct kunit *test)
+{
+	test_aes(test, &aes192_kat, false);
+}
+
+static void test_aes256_encrypt(struct kunit *test)
+{
+	test_aes(test, &aes256_kat, true);
+}
+
+static void test_aes256_decrypt(struct kunit *test)
+{
+	test_aes(test, &aes256_kat, false);
+}
+
+static void benchmark_aes128(struct kunit *test)
+{
+	benchmark_aes(test, &aes128_kat);
+}
+
+static void benchmark_aes192(struct kunit *test)
+{
+	benchmark_aes(test, &aes192_kat);
+}
+
+static void benchmark_aes256(struct kunit *test)
+{
+	benchmark_aes(test, &aes256_kat);
+}
+
+static struct kunit_case aes_test_cases[] = {
+	KUNIT_CASE(test_aes128_encrypt),
+	KUNIT_CASE(test_aes128_decrypt),
+	KUNIT_CASE(test_aes192_encrypt),
+	KUNIT_CASE(test_aes192_decrypt),
+	KUNIT_CASE(test_aes256_encrypt),
+	KUNIT_CASE(test_aes256_decrypt),
+	KUNIT_CASE(benchmark_aes128),
+	KUNIT_CASE(benchmark_aes192),
+	KUNIT_CASE(benchmark_aes256),
+	{},
+};
+
+static struct kunit_suite aes_test_suite = {
+	.name = "aes",
+	.test_cases = aes_test_cases,
+};
+
+kunit_test_suite(aes_test_suite);
+
+MODULE_DESCRIPTION("KUnit tests and benchmark aes library");
+MODULE_LICENSE("GPL");
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
@ 2026-01-15 20:43   ` Eric Biggers
  2026-01-15 21:51     ` Holger Dengler
  2026-01-15 22:05     ` David Laight
  2026-01-16  0:25   ` kernel test robot
  1 sibling, 2 replies; 14+ messages in thread
From: Eric Biggers @ 2026-01-15 20:43 UTC (permalink / raw)
  To: Holger Dengler
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote:
> Add a KUnit test suite for AES library functions, including KAT and
> benchmarks.
> 
> Signed-off-by: Holger Dengler <dengler@linux.ibm.com>

The cover letter had some more information.  Could you put it in the
commit message directly?  Normally cover letters aren't used for a
single patch: the explanation should just be in the patch itself.

> diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h
> new file mode 100644
> index 000000000000..dfa528db7f02
> --- /dev/null
> +++ b/lib/crypto/tests/aes-testvecs.h
> @@ -0,0 +1,78 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _AES_TESTVECS_H
> +#define _AES_TESTVECS_H
> +
> +#include <crypto/aes.h>
> +
> +struct buf {
> +	size_t blen;
> +	u8 b[];
> +};

'struct buf' is never used.

> +static const struct aes_testvector aes128_kat = {

Where do these test vectors come from?  All test vectors should have a
documented source.

> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
> +{
> +	const size_t num_iters = 10000000;

10000000 iterations is too many.  That's 160 MB of data in each
direction per AES key length.  Some CPUs without AES instructions can do
only ~20 MB AES per second.  In that case, this benchmark would take 16
seconds to run per AES key length, for 48 seconds total.

hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
iterations.  That would be 69444 in this case (considering len=16),
which is less than 1% of the iterations you've used.  Choosing a number
similar to that would seem more appropriate.

Ultimately these are just made-up numbers.  But I think we should aim
for the benchmark test in each KUnit test suite to take less than a
second or so.  The existing tests roughly achieve that, whereas it seems
this one can go over it by quite a bit due to the 10000000 iterations.

> +	kunit_info(test, "enc (iter. %zu, duration %lluns)",
> +		   num_iters, t_enc);
> +	kunit_info(test, "enc (len=%zu): %llu MB/s",
> +		   (size_t)AES_BLOCK_SIZE,
> +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
> +			     (t_enc ?: 1) * SZ_1M));
> +
> +	kunit_info(test, "dec (iter. %zu, duration %lluns)",
> +		   num_iters, t_dec);
> +	kunit_info(test, "dec (len=%zu): %llu MB/s",
> +		   (size_t)AES_BLOCK_SIZE,
> +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
> +			     (t_dec ?: 1) * SZ_1M));

Maybe delete the first line of each pair, and switch from power-of-2
megabytes to power-of-10?  That would be consistent with how the other
crypto and CRC benchmarks print their output.

> +MODULE_DESCRIPTION("KUnit tests and benchmark aes library");

"aes library" => "for the AES library"

- Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 20:43   ` Eric Biggers
@ 2026-01-15 21:51     ` Holger Dengler
  2026-01-15 21:58       ` Eric Biggers
  2026-01-15 22:05     ` David Laight
  1 sibling, 1 reply; 14+ messages in thread
From: Holger Dengler @ 2026-01-15 21:51 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On 15/01/2026 21:43, Eric Biggers wrote:
> On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote:
>> Add a KUnit test suite for AES library functions, including KAT and
>> benchmarks.
>>
>> Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
> 
> The cover letter had some more information.  Could you put it in the
> commit message directly?  Normally cover letters aren't used for a
> single patch: the explanation should just be in the patch itself.

Ok, I'll move the explanation to the commit message. I assume that the example
output of the test can be dropped?

>> diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h
>> new file mode 100644
>> index 000000000000..dfa528db7f02
>> --- /dev/null
>> +++ b/lib/crypto/tests/aes-testvecs.h
>> @@ -0,0 +1,78 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _AES_TESTVECS_H
>> +#define _AES_TESTVECS_H
>> +
>> +#include <crypto/aes.h>
>> +
>> +struct buf {
>> +	size_t blen;
>> +	u8 b[];
>> +};
> 
> 'struct buf' is never used.

This is a left-over, will be removed in the next series.

> 
>> +static const struct aes_testvector aes128_kat = {
> 
> Where do these test vectors come from?  All test vectors should have a
> documented source.

ok, I will add this information as well.

>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
>> +{
>> +	const size_t num_iters = 10000000;
> 
> 10000000 iterations is too many.  That's 160 MB of data in each
> direction per AES key length.  Some CPUs without AES instructions can do
> only ~20 MB AES per second.  In that case, this benchmark would take 16
> seconds to run per AES key length, for 48 seconds total.
> 
> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
> iterations.  That would be 69444 in this case (considering len=16),
> which is less than 1% of the iterations you've used.  Choosing a number
> similar to that would seem more appropriate.
> 
> Ultimately these are just made-up numbers.  But I think we should aim
> for the benchmark test in each KUnit test suite to take less than a
> second or so.  The existing tests roughly achieve that, whereas it seems
> this one can go over it by quite a bit due to the 10000000 iterations.

As we have a fixed length, I would go stay with a fix value for the iterations
(instead of calculating it based on len).

The benchmark has a separate loop for encrypt and decrypt, so I will do the
half iterations on encrypt and the other half on decrypt. I will also reduce
the iterations for the warm-ups.

What about 100 iterations for each warm-up and 500.000 iterations for each
real measurement? Means processing 2x 8MiB with preemption disabled.

>> +	kunit_info(test, "enc (iter. %zu, duration %lluns)",
>> +		   num_iters, t_enc);
>> +	kunit_info(test, "enc (len=%zu): %llu MB/s",
>> +		   (size_t)AES_BLOCK_SIZE,
>> +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
>> +			     (t_enc ?: 1) * SZ_1M));
>> +
>> +	kunit_info(test, "dec (iter. %zu, duration %lluns)",
>> +		   num_iters, t_dec);
>> +	kunit_info(test, "dec (len=%zu): %llu MB/s",
>> +		   (size_t)AES_BLOCK_SIZE,
>> +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
>> +			     (t_dec ?: 1) * SZ_1M));
> 
> Maybe delete the first line of each pair, and switch from power-of-2
> megabytes to power-of-10?  That would be consistent with how the other
> crypto and CRC benchmarks print their output.
> 
>> +MODULE_DESCRIPTION("KUnit tests and benchmark aes library");
> 
> "aes library" => "for the AES library"

ok

-- 
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@linux.ibm.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 21:51     ` Holger Dengler
@ 2026-01-15 21:58       ` Eric Biggers
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Biggers @ 2026-01-15 21:58 UTC (permalink / raw)
  To: Holger Dengler
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Thu, Jan 15, 2026 at 10:51:25PM +0100, Holger Dengler wrote:
> On 15/01/2026 21:43, Eric Biggers wrote:
> > On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote:
> >> Add a KUnit test suite for AES library functions, including KAT and
> >> benchmarks.
> >>
> >> Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
> > 
> > The cover letter had some more information.  Could you put it in the
> > commit message directly?  Normally cover letters aren't used for a
> > single patch: the explanation should just be in the patch itself.
> 
> Ok, I'll move the explanation to the commit message. I assume that the example
> output of the test can be dropped?

Yes, that's fine.

> > 10000000 iterations is too many.  That's 160 MB of data in each
> > direction per AES key length.  Some CPUs without AES instructions can do
> > only ~20 MB AES per second.  In that case, this benchmark would take 16
> > seconds to run per AES key length, for 48 seconds total.
> > 
> > hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
> > iterations.  That would be 69444 in this case (considering len=16),
> > which is less than 1% of the iterations you've used.  Choosing a number
> > similar to that would seem more appropriate.
> > 
> > Ultimately these are just made-up numbers.  But I think we should aim
> > for the benchmark test in each KUnit test suite to take less than a
> > second or so.  The existing tests roughly achieve that, whereas it seems
> > this one can go over it by quite a bit due to the 10000000 iterations.
> 
> As we have a fixed length, I would go stay with a fix value for the iterations
> (instead of calculating it based on len).
> 
> The benchmark has a separate loop for encrypt and decrypt, so I will do the
> half iterations on encrypt and the other half on decrypt. I will also reduce
> the iterations for the warm-ups.
> 
> What about 100 iterations for each warm-up and 500.000 iterations for each
> real measurement? Means processing 2x 8MiB with preemption disabled.

I'd suggest 50000 for each direction as well as the warm-up loop.

- Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 20:43   ` Eric Biggers
  2026-01-15 21:51     ` Holger Dengler
@ 2026-01-15 22:05     ` David Laight
  2026-01-16 17:31       ` Holger Dengler
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2026-01-15 22:05 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Holger Dengler, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Thu, 15 Jan 2026 12:43:32 -0800
Eric Biggers <ebiggers@kernel.org> wrote:

> On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote:
> > Add a KUnit test suite for AES library functions, including KAT and
> > benchmarks.
> > 
> > Signed-off-by: Holger Dengler <dengler@linux.ibm.com>  
> 
> The cover letter had some more information.  Could you put it in the
> commit message directly?  Normally cover letters aren't used for a
> single patch: the explanation should just be in the patch itself.
> 
> > diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h
> > new file mode 100644
> > index 000000000000..dfa528db7f02
> > --- /dev/null
> > +++ b/lib/crypto/tests/aes-testvecs.h
> > @@ -0,0 +1,78 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _AES_TESTVECS_H
> > +#define _AES_TESTVECS_H
> > +
> > +#include <crypto/aes.h>
> > +
> > +struct buf {
> > +	size_t blen;
> > +	u8 b[];
> > +};  
> 
> 'struct buf' is never used.
> 
> > +static const struct aes_testvector aes128_kat = {  
> 
> Where do these test vectors come from?  All test vectors should have a
> documented source.
> 
> > +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
> > +{
> > +	const size_t num_iters = 10000000;  
> 
> 10000000 iterations is too many.  That's 160 MB of data in each
> direction per AES key length.  Some CPUs without AES instructions can do
> only ~20 MB AES per second.  In that case, this benchmark would take 16
> seconds to run per AES key length, for 48 seconds total.

Probably best to first do a test that would take a 'reasonable' time
on a cpu without AES. If that is 'very fast' then do a longer test
to get more accuracy on a faster implementation.

> 
> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
> iterations.  That would be 69444 in this case (considering len=16),
> which is less than 1% of the iterations you've used.  Choosing a number
> similar to that would seem more appropriate.
> 
> Ultimately these are just made-up numbers.  But I think we should aim
> for the benchmark test in each KUnit test suite to take less than a
> second or so.  The existing tests roughly achieve that, whereas it seems
> this one can go over it by quite a bit due to the 10000000 iterations.

Even 1 second is a long time, you end up getting multiple interrupts included.
I think a lot of these benchmarks are far too long.
Timing differences less that 1% can be created by scheduling noise.
Running a test that takes 200 'quanta' of the timer used has an
error margin of under 1% (100 quanta might be enough).
While the kernel timestamps have a resolution of 1ns the accuracy is worse.
If you run a test for even just 10us you ought to get reasonable accuracy
with a reasonable hope of not getting an interrupt.
Run the test 10 times and report the fastest value.

You'll then find the results are entirely unstable because the cpu clock
frequency keeps changing.
And long enough buffers can get limited by the d-cache loads.

For something as slow as AES you can count the number of cpu cycles for
a single call and get a reasonably consistent figure.
That will tell you whether the loop is running at the speed you might
expect it to run at.
(You need to use data dependencies between the start/end 'times' and
start/end of the code being timed, x86 lfence/mfence are too slow and
can hide the 'setup' cost of some instructions.)

	David


> 
> > +	kunit_info(test, "enc (iter. %zu, duration %lluns)",
> > +		   num_iters, t_enc);
> > +	kunit_info(test, "enc (len=%zu): %llu MB/s",
> > +		   (size_t)AES_BLOCK_SIZE,
> > +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
> > +			     (t_enc ?: 1) * SZ_1M));
> > +
> > +	kunit_info(test, "dec (iter. %zu, duration %lluns)",
> > +		   num_iters, t_dec);
> > +	kunit_info(test, "dec (len=%zu): %llu MB/s",
> > +		   (size_t)AES_BLOCK_SIZE,
> > +		   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
> > +			     (t_dec ?: 1) * SZ_1M));  
> 
> Maybe delete the first line of each pair, and switch from power-of-2
> megabytes to power-of-10?  That would be consistent with how the other
> crypto and CRC benchmarks print their output.
> 
> > +MODULE_DESCRIPTION("KUnit tests and benchmark aes library");  
> 
> "aes library" => "for the AES library"
> 
> - Eric
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
  2026-01-15 20:43   ` Eric Biggers
@ 2026-01-16  0:25   ` kernel test robot
  1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2026-01-16  0:25 UTC (permalink / raw)
  To: Holger Dengler, Eric Biggers
  Cc: llvm, oe-kbuild-all, Ard Biesheuvel, Jason A . Donenfeld,
	Herbert Xu, Harald Freudenberger, Holger Dengler, linux-kernel,
	linux-crypto

Hi Holger,

kernel test robot noticed the following build errors:

[auto build test ERROR on 47753e09a15d9fd7cdf114550510f4f2af9333ec]

url:    https://github.com/intel-lab-lkp/linux/commits/Holger-Dengler/lib-crypto-tests-Add-KUnit-tests-for-AES/20260116-030041
base:   47753e09a15d9fd7cdf114550510f4f2af9333ec
patch link:    https://lore.kernel.org/r/20260115183831.72010-2-dengler%40linux.ibm.com
patch subject: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
config: hexagon-randconfig-002-20260116 (https://download.01.org/0day-ci/archive/20260116/202601160822.yLaBf86R-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260116/202601160822.yLaBf86R-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601160822.yLaBf86R-lkp@intel.com/

All errors (new ones prefixed by >>):

>> lib/crypto/tests/aes_kunit.c:73:24: error: use of undeclared identifier 'SZ_1M'
      73 |                              (t_enc ?: 1) * SZ_1M));
         |                                             ^
>> lib/crypto/tests/aes_kunit.c:73:24: error: use of undeclared identifier 'SZ_1M'
   lib/crypto/tests/aes_kunit.c:80:24: error: use of undeclared identifier 'SZ_1M'
      80 |                              (t_dec ?: 1) * SZ_1M));
         |                                             ^
   lib/crypto/tests/aes_kunit.c:80:24: error: use of undeclared identifier 'SZ_1M'
   4 errors generated.


vim +/SZ_1M +73 lib/crypto/tests/aes_kunit.c

    27	
    28	static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
    29	{
    30		const size_t num_iters = 10000000;
    31		u8 out[AES_BLOCK_SIZE];
    32		struct aes_key aes_key;
    33		u64 t_enc, t_dec;
    34		int rc;
    35	
    36		if (!IS_ENABLED(CONFIG_CRYPTO_LIB_BENCHMARK))
    37			kunit_skip(test, "not enabled");
    38	
    39		rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len);
    40		KUNIT_ASSERT_EQ(test, 0, rc);
    41	
    42		/* warm-up enc */
    43		for (size_t i = 0; i < 1000; i++)
    44			aes_encrypt(&aes_key, out, tv->plain);
    45	
    46		preempt_disable();
    47		t_enc = ktime_get_ns();
    48	
    49		for (size_t i = 0; i < num_iters; i++)
    50			aes_encrypt(&aes_key, out, tv->plain);
    51	
    52		t_enc = ktime_get_ns() - t_enc;
    53		preempt_enable();
    54	
    55		/* warm-up dec */
    56		for (size_t i = 0; i < 1000; i++)
    57			aes_decrypt(&aes_key, out, tv->cipher);
    58	
    59		preempt_disable();
    60		t_dec = ktime_get_ns();
    61	
    62		for (size_t i = 0; i < num_iters; i++)
    63			aes_decrypt(&aes_key, out, tv->cipher);
    64	
    65		t_dec = ktime_get_ns() - t_dec;
    66		preempt_enable();
    67	
    68		kunit_info(test, "enc (iter. %zu, duration %lluns)",
    69			   num_iters, t_enc);
    70		kunit_info(test, "enc (len=%zu): %llu MB/s",
    71			   (size_t)AES_BLOCK_SIZE,
    72			   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
  > 73				     (t_enc ?: 1) * SZ_1M));
    74	
    75		kunit_info(test, "dec (iter. %zu, duration %lluns)",
    76			   num_iters, t_dec);
    77		kunit_info(test, "dec (len=%zu): %llu MB/s",
    78			   (size_t)AES_BLOCK_SIZE,
    79			   div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC,
    80				     (t_dec ?: 1) * SZ_1M));
    81	}
    82	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-15 22:05     ` David Laight
@ 2026-01-16 17:31       ` Holger Dengler
  2026-01-16 18:37         ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Dengler @ 2026-01-16 17:31 UTC (permalink / raw)
  To: David Laight, Eric Biggers
  Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

Hi David,

On 15/01/2026 23:05, David Laight wrote:
> On Thu, 15 Jan 2026 12:43:32 -0800
> Eric Biggers <ebiggers@kernel.org> wrote:
>>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
>>> +{
>>> +	const size_t num_iters = 10000000;  
>>
>> 10000000 iterations is too many.  That's 160 MB of data in each
>> direction per AES key length.  Some CPUs without AES instructions can do
>> only ~20 MB AES per second.  In that case, this benchmark would take 16
>> seconds to run per AES key length, for 48 seconds total.
> 
> Probably best to first do a test that would take a 'reasonable' time
> on a cpu without AES. If that is 'very fast' then do a longer test
> to get more accuracy on a faster implementation.
> 
>>
>> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
>> iterations.  That would be 69444 in this case (considering len=16),
>> which is less than 1% of the iterations you've used.  Choosing a number
>> similar to that would seem more appropriate.
>>
>> Ultimately these are just made-up numbers.  But I think we should aim
>> for the benchmark test in each KUnit test suite to take less than a
>> second or so.  The existing tests roughly achieve that, whereas it seems
>> this one can go over it by quite a bit due to the 10000000 iterations.
> 
> Even 1 second is a long time, you end up getting multiple interrupts included.
> I think a lot of these benchmarks are far too long.
> Timing differences less that 1% can be created by scheduling noise.
> Running a test that takes 200 'quanta' of the timer used has an
> error margin of under 1% (100 quanta might be enough).
> While the kernel timestamps have a resolution of 1ns the accuracy is worse.
> If you run a test for even just 10us you ought to get reasonable accuracy
> with a reasonable hope of not getting an interrupt.
> Run the test 10 times and report the fastest value.
> 
> You'll then find the results are entirely unstable because the cpu clock
> frequency keeps changing.
> And long enough buffers can get limited by the d-cache loads.
> 
> For something as slow as AES you can count the number of cpu cycles for
> a single call and get a reasonably consistent figure.
> That will tell you whether the loop is running at the speed you might
> expect it to run at.
> (You need to use data dependencies between the start/end 'times' and
> start/end of the code being timed, x86 lfence/mfence are too slow and
> can hide the 'setup' cost of some instructions.)

Thanks a lot for your feedback. I tried a few of your ideas and it turns out,
that they work quite well. First of all, with a single-block aes
encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of
our CPU clock.

Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s).
These numbers should be taken with some care, as on s390 the operating system
always runs virtualized. In my test environment, I also only have access to a
machine with shared CPUs, so there might be some negative impact from other
workload.

The benchmark loops for 100 iterations now without any warm-up. In each
iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
value of these measurements is takes as the value for the bandwidth
calculations. Although it is not necessary in my environment, I'm doing all
iterations with preemption disabled. I think, that this might help on other
platforms to reduce the jitter of the measurement values.

The removal of the warm-up does not have any impact on the numbers.

Just for information: I also tried to measure the cycles with the same
results. The minimal measurement value of a few iterations is much more stable
that the average over a larger number of iterations.

I also did some tests with IRQs disabled (instead of only preemption), but the
numbers stay the same. So I think, it is save enough to stay with disables
preemption.

I also tried you idea, first to do a few measurements and if they are fast
enough, increase the number of iterations. But it turns out, that this it not
really necessary (at least in my env). But I can add this, it it makes sense
on other platforms.

-- 
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@linux.ibm.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 17:31       ` Holger Dengler
@ 2026-01-16 18:37         ` David Laight
  2026-01-16 19:20           ` Holger Dengler
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2026-01-16 18:37 UTC (permalink / raw)
  To: Holger Dengler
  Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Fri, 16 Jan 2026 18:31:58 +0100
Holger Dengler <dengler@linux.ibm.com> wrote:

> Hi David,
> 
> On 15/01/2026 23:05, David Laight wrote:
> > On Thu, 15 Jan 2026 12:43:32 -0800
> > Eric Biggers <ebiggers@kernel.org> wrote:  
> >>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
> >>> +{
> >>> +	const size_t num_iters = 10000000;    
> >>
> >> 10000000 iterations is too many.  That's 160 MB of data in each
> >> direction per AES key length.  Some CPUs without AES instructions can do
> >> only ~20 MB AES per second.  In that case, this benchmark would take 16
> >> seconds to run per AES key length, for 48 seconds total.  
> > 
> > Probably best to first do a test that would take a 'reasonable' time
> > on a cpu without AES. If that is 'very fast' then do a longer test
> > to get more accuracy on a faster implementation.
> >   
> >>
> >> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
> >> iterations.  That would be 69444 in this case (considering len=16),
> >> which is less than 1% of the iterations you've used.  Choosing a number
> >> similar to that would seem more appropriate.
> >>
> >> Ultimately these are just made-up numbers.  But I think we should aim
> >> for the benchmark test in each KUnit test suite to take less than a
> >> second or so.  The existing tests roughly achieve that, whereas it seems
> >> this one can go over it by quite a bit due to the 10000000 iterations.  
> > 
> > Even 1 second is a long time, you end up getting multiple interrupts included.
> > I think a lot of these benchmarks are far too long.
> > Timing differences less that 1% can be created by scheduling noise.
> > Running a test that takes 200 'quanta' of the timer used has an
> > error margin of under 1% (100 quanta might be enough).
> > While the kernel timestamps have a resolution of 1ns the accuracy is worse.
> > If you run a test for even just 10us you ought to get reasonable accuracy
> > with a reasonable hope of not getting an interrupt.
> > Run the test 10 times and report the fastest value.
> > 
> > You'll then find the results are entirely unstable because the cpu clock
> > frequency keeps changing.
> > And long enough buffers can get limited by the d-cache loads.
> > 
> > For something as slow as AES you can count the number of cpu cycles for
> > a single call and get a reasonably consistent figure.
> > That will tell you whether the loop is running at the speed you might
> > expect it to run at.
> > (You need to use data dependencies between the start/end 'times' and
> > start/end of the code being timed, x86 lfence/mfence are too slow and
> > can hide the 'setup' cost of some instructions.)  
> 
> Thanks a lot for your feedback. I tried a few of your ideas and it turns out,
> that they work quite well. First of all, with a single-block aes
> encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of
> our CPU clock.
> 
> Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s).
> These numbers should be taken with some care, as on s390 the operating system
> always runs virtualized. In my test environment, I also only have access to a
> machine with shared CPUs, so there might be some negative impact from other
> workload.

The impact of other workloads is much less likely for a short test,
and if it does happen you are likely to see a value that is abnormally large.

> The benchmark loops for 100 iterations now without any warm-up. In each
> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
> value of these measurements is takes as the value for the bandwidth
> calculations. Although it is not necessary in my environment, I'm doing all
> iterations with preemption disabled. I think, that this might help on other
> platforms to reduce the jitter of the measurement values.
> 
> The removal of the warm-up does not have any impact on the numbers.

I'm not sure what the 'warm-up' was for.
The first test will be slow(er) due to I-cache misses.
(That will be more noticeable for big software loops - like blake2.)
Change to test parameters can affect branch prediction but that also only
usually affects the first test with each set of parameters.
(Unlikely to affect AES, but I could see that effect when testing
mul_u64_u64_div_u64().)
The only other reason for a 'warm-up' is to get the cpu frequency fast
and fixed - and there ought to be a better way of doing that.

> 
> Just for information: I also tried to measure the cycles with the same
> results. The minimal measurement value of a few iterations is much more stable
> that the average over a larger number of iterations.

My userspace test code runs each test 10 times and prints all 10 values.
I then look at them to see how consistent they are.

> I also did some tests with IRQs disabled (instead of only preemption), but the
> numbers stay the same. So I think, it is save enough to stay with disables
> preemption.

I'd actually go for disabling interrupts.
What you are seeing is the effect of interrupts not happening
(which is likely for a short test, but not for a long one).

> 
> I also tried you idea, first to do a few measurements and if they are fast
> enough, increase the number of iterations. But it turns out, that this it not
> really necessary (at least in my env). But I can add this, it it makes sense
> on other platforms.

The main reason for doing that is reducing the time the tests take on a
system that is massively slower (and doing software AES).
Maybe someone want to run the test cases on an m68k :-)

	David



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 18:37         ` David Laight
@ 2026-01-16 19:20           ` Holger Dengler
  2026-01-16 19:44             ` Eric Biggers
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Dengler @ 2026-01-16 19:20 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On 16/01/2026 19:37, David Laight wrote:
> On Fri, 16 Jan 2026 18:31:58 +0100
> Holger Dengler <dengler@linux.ibm.com> wrote:
> 
>> Hi David,
>>
>> On 15/01/2026 23:05, David Laight wrote:
>>> On Thu, 15 Jan 2026 12:43:32 -0800
>>> Eric Biggers <ebiggers@kernel.org> wrote:  
>>>>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
>>>>> +{
>>>>> +	const size_t num_iters = 10000000;    
>>>>
>>>> 10000000 iterations is too many.  That's 160 MB of data in each
>>>> direction per AES key length.  Some CPUs without AES instructions can do
>>>> only ~20 MB AES per second.  In that case, this benchmark would take 16
>>>> seconds to run per AES key length, for 48 seconds total.  
>>>
>>> Probably best to first do a test that would take a 'reasonable' time
>>> on a cpu without AES. If that is 'very fast' then do a longer test
>>> to get more accuracy on a faster implementation.
>>>   
>>>>
>>>> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
>>>> iterations.  That would be 69444 in this case (considering len=16),
>>>> which is less than 1% of the iterations you've used.  Choosing a number
>>>> similar to that would seem more appropriate.
>>>>
>>>> Ultimately these are just made-up numbers.  But I think we should aim
>>>> for the benchmark test in each KUnit test suite to take less than a
>>>> second or so.  The existing tests roughly achieve that, whereas it seems
>>>> this one can go over it by quite a bit due to the 10000000 iterations.  
>>>
>>> Even 1 second is a long time, you end up getting multiple interrupts included.
>>> I think a lot of these benchmarks are far too long.
>>> Timing differences less that 1% can be created by scheduling noise.
>>> Running a test that takes 200 'quanta' of the timer used has an
>>> error margin of under 1% (100 quanta might be enough).
>>> While the kernel timestamps have a resolution of 1ns the accuracy is worse.
>>> If you run a test for even just 10us you ought to get reasonable accuracy
>>> with a reasonable hope of not getting an interrupt.
>>> Run the test 10 times and report the fastest value.
>>>
>>> You'll then find the results are entirely unstable because the cpu clock
>>> frequency keeps changing.
>>> And long enough buffers can get limited by the d-cache loads.
>>>
>>> For something as slow as AES you can count the number of cpu cycles for
>>> a single call and get a reasonably consistent figure.
>>> That will tell you whether the loop is running at the speed you might
>>> expect it to run at.
>>> (You need to use data dependencies between the start/end 'times' and
>>> start/end of the code being timed, x86 lfence/mfence are too slow and
>>> can hide the 'setup' cost of some instructions.)  
>>
>> Thanks a lot for your feedback. I tried a few of your ideas and it turns out,
>> that they work quite well. First of all, with a single-block aes
>> encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of
>> our CPU clock.
>>
>> Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s).
>> These numbers should be taken with some care, as on s390 the operating system
>> always runs virtualized. In my test environment, I also only have access to a
>> machine with shared CPUs, so there might be some negative impact from other
>> workload.
> 
> The impact of other workloads is much less likely for a short test,
> and if it does happen you are likely to see a value that is abnormally large.
> 
>> The benchmark loops for 100 iterations now without any warm-up. In each
>> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
>> value of these measurements is takes as the value for the bandwidth
>> calculations. Although it is not necessary in my environment, I'm doing all
>> iterations with preemption disabled. I think, that this might help on other
>> platforms to reduce the jitter of the measurement values.
>>
>> The removal of the warm-up does not have any impact on the numbers.
> 
> I'm not sure what the 'warm-up' was for.
> The first test will be slow(er) due to I-cache misses.
> (That will be more noticeable for big software loops - like blake2.)
> Change to test parameters can affect branch prediction but that also only
> usually affects the first test with each set of parameters.
> (Unlikely to affect AES, but I could see that effect when testing
> mul_u64_u64_div_u64().)
> The only other reason for a 'warm-up' is to get the cpu frequency fast
> and fixed - and there ought to be a better way of doing that.
> 
>>
>> Just for information: I also tried to measure the cycles with the same
>> results. The minimal measurement value of a few iterations is much more stable
>> that the average over a larger number of iterations.
> 
> My userspace test code runs each test 10 times and prints all 10 values.
> I then look at them to see how consistent they are.
> 
>> I also did some tests with IRQs disabled (instead of only preemption), but the
>> numbers stay the same. So I think, it is save enough to stay with disables
>> preemption.
> 
> I'd actually go for disabling interrupts.
> What you are seeing is the effect of interrupts not happening
> (which is likely for a short test, but not for a long one).

Ok, I'll send the next series with IRQ disabled. I don't see any difference on
my systems.

>> I also tried you idea, first to do a few measurements and if they are fast
>> enough, increase the number of iterations. But it turns out, that this it not
>> really necessary (at least in my env). But I can add this, it it makes sense
>> on other platforms.
> 
> The main reason for doing that is reducing the time the tests take on a
> system that is massively slower (and doing software AES).
> Maybe someone want to run the test cases on an m68k :-)

So I've currently 100 iterations. The first one or two iterations will be for
the warm-up (cache misses, branch prediction, etc). But with the interrupts
disabled, the rest of the iterations should give us enough stable measurements
for the benchmark. Maybe it would be worth to test the next version of teh
test on other platforms as well.

-- 
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@linux.ibm.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 19:20           ` Holger Dengler
@ 2026-01-16 19:44             ` Eric Biggers
  2026-01-16 20:55               ` Holger Dengler
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Biggers @ 2026-01-16 19:44 UTC (permalink / raw)
  To: Holger Dengler
  Cc: David Laight, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Fri, Jan 16, 2026 at 08:20:51PM +0100, Holger Dengler wrote:
> >> The benchmark loops for 100 iterations now without any warm-up. In each
> >> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
> >> value of these measurements is takes as the value for the bandwidth
> >> calculations. Although it is not necessary in my environment, I'm doing all
> >> iterations with preemption disabled. I think, that this might help on other
> >> platforms to reduce the jitter of the measurement values.
> >>
> >> The removal of the warm-up does not have any impact on the numbers.
> > 
> > I'm not sure what the 'warm-up' was for.
> > The first test will be slow(er) due to I-cache misses.
> > (That will be more noticeable for big software loops - like blake2.)
> > Change to test parameters can affect branch prediction but that also only
> > usually affects the first test with each set of parameters.
> > (Unlikely to affect AES, but I could see that effect when testing
> > mul_u64_u64_div_u64().)
> > The only other reason for a 'warm-up' is to get the cpu frequency fast
> > and fixed - and there ought to be a better way of doing that.

The warm-up loops in the existing benchmarks are both for cache warming
and to get the CPU frequency fast and fixed.  It's not anything
sophisticated, but rather just something that's simple and seems to
works well enough across CPUs without depending on any special APIs.  If
your CPU doesn't do much frequency scaling, you may not notice a
difference, but other CPUs may need it.

> >> I also did some tests with IRQs disabled (instead of only preemption), but the
> >> numbers stay the same. So I think, it is save enough to stay with disables
> >> preemption.
> > 
> > I'd actually go for disabling interrupts.
> > What you are seeing is the effect of interrupts not happening
> > (which is likely for a short test, but not for a long one).
> 
> Ok, I'll send the next series with IRQ disabled. I don't see any difference on
> my systems.

Some architectures don't allow vector registers to be used when IRQs are
disabled.  On those architectures, disabling IRQs would always trigger
the fallback to the generic code, which would make the benchmark not
very useful.  That's why I've only been disabling preemption, not IRQs.

- Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 19:44             ` Eric Biggers
@ 2026-01-16 20:55               ` Holger Dengler
  2026-01-16 22:30                 ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Dengler @ 2026-01-16 20:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: David Laight, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On 16/01/2026 20:44, Eric Biggers wrote:
> On Fri, Jan 16, 2026 at 08:20:51PM +0100, Holger Dengler wrote:
>>>> The benchmark loops for 100 iterations now without any warm-up. In each
>>>> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
>>>> value of these measurements is takes as the value for the bandwidth
>>>> calculations. Although it is not necessary in my environment, I'm doing all
>>>> iterations with preemption disabled. I think, that this might help on other
>>>> platforms to reduce the jitter of the measurement values.
>>>>
>>>> The removal of the warm-up does not have any impact on the numbers.
>>>
>>> I'm not sure what the 'warm-up' was for.
>>> The first test will be slow(er) due to I-cache misses.
>>> (That will be more noticeable for big software loops - like blake2.)
>>> Change to test parameters can affect branch prediction but that also only
>>> usually affects the first test with each set of parameters.
>>> (Unlikely to affect AES, but I could see that effect when testing
>>> mul_u64_u64_div_u64().)
>>> The only other reason for a 'warm-up' is to get the cpu frequency fast
>>> and fixed - and there ought to be a better way of doing that.
> 
> The warm-up loops in the existing benchmarks are both for cache warming
> and to get the CPU frequency fast and fixed.  It's not anything
> sophisticated, but rather just something that's simple and seems to
> works well enough across CPUs without depending on any special APIs.  If
> your CPU doesn't do much frequency scaling, you may not notice a
> difference, but other CPUs may need it.

Do you have a gut feeling how many iterations it takes to get the CPU speed
up? If it takes less than 50 iterations, it would be sufficient with the new
method.

>>>> I also did some tests with IRQs disabled (instead of only preemption), but the
>>>> numbers stay the same. So I think, it is save enough to stay with disables
>>>> preemption.
>>>
>>> I'd actually go for disabling interrupts.
>>> What you are seeing is the effect of interrupts not happening
>>> (which is likely for a short test, but not for a long one).
>>
>> Ok, I'll send the next series with IRQ disabled. I don't see any difference on
>> my systems.
> 
> Some architectures don't allow vector registers to be used when IRQs are
> disabled.  On those architectures, disabling IRQs would always trigger
> the fallback to the generic code, which would make the benchmark not
> very useful.  That's why I've only been disabling preemption, not IRQs.

Ok, this is a very strong argument against disabling IRQs.

-- 
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@linux.ibm.com


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 20:55               ` Holger Dengler
@ 2026-01-16 22:30                 ` David Laight
  2026-01-17 23:59                   ` Eric Biggers
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2026-01-16 22:30 UTC (permalink / raw)
  To: Holger Dengler
  Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Fri, 16 Jan 2026 21:55:04 +0100
Holger Dengler <dengler@linux.ibm.com> wrote:

> On 16/01/2026 20:44, Eric Biggers wrote:
...
> > The warm-up loops in the existing benchmarks are both for cache warming
> > and to get the CPU frequency fast and fixed.  It's not anything
> > sophisticated, but rather just something that's simple and seems to
> > works well enough across CPUs without depending on any special APIs.  If
> > your CPU doesn't do much frequency scaling, you may not notice a
> > difference, but other CPUs may need it.  
> 
> Do you have a gut feeling how many iterations it takes to get the CPU speed
> up? If it takes less than 50 iterations, it would be sufficient with the new
> method.

It may not matter what you do to get the cpu speed fixed.
Looping calling ktime_get_ns() for 'long enough' should do it.
That would be test independent but the 'long enough' very
cpu dependent.
The benchmarks probably ought to have some common API - even if it
just in the kunit code.

The advantage of counting cpu clocks is the frequency then doesn't
matter as much - L1 cache miss timings might change.

The difficulty is finding a cpu clock counter. Architecture dependent
and may not exist (you don't want the fixed frequency 'sanitised' TSC).

	David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
  2026-01-16 22:30                 ` David Laight
@ 2026-01-17 23:59                   ` Eric Biggers
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Biggers @ 2026-01-17 23:59 UTC (permalink / raw)
  To: David Laight
  Cc: Holger Dengler, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu,
	Harald Freudenberger, linux-kernel, linux-crypto

On Fri, Jan 16, 2026 at 10:30:15PM +0000, David Laight wrote:
> It may not matter what you do to get the cpu speed fixed.
> Looping calling ktime_get_ns() for 'long enough' should do it.

For the CPU frequency, sure.  But as I mentioned, the warm-up loop is
also intended to load the target code into cache, as the benchmarks are
intended to measure the warm cache case.  Yes, that part only needs one
call, but the loop accomplishes both.

> That would be test independent but the 'long enough' very
> cpu dependent.
> The benchmarks probably ought to have some common API - even if it
> just in the kunit code.
> 
> The advantage of counting cpu clocks is the frequency then doesn't
> matter as much - L1 cache miss timings might change.
> 
> The difficulty is finding a cpu clock counter. Architecture dependent
> and may not exist (you don't want the fixed frequency 'sanitised' TSC).

Yes, not all architectures supported by Linux have a high-resolution
timer or cycle counter.  IIUC, for some the best resolution available is
that of "jiffies", which can increment as infrequently as once per 10
ms.  On such kernels, the benchmark naturally needs to run for
significantly longer than that to get a reasonably accurate time.

I certainly agree that the benchmarking code I've written is ad-hoc.
But at the same time, there's a bit more reasoning behind it than you
might think.  The "obvious" improvements suggested in this thread
(disabling IRQs, doing only 1 warm-up iteration, doing only 100
iterations) make assumptions that are not true on many systems.

- Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-01-17 23:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler
2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
2026-01-15 20:43   ` Eric Biggers
2026-01-15 21:51     ` Holger Dengler
2026-01-15 21:58       ` Eric Biggers
2026-01-15 22:05     ` David Laight
2026-01-16 17:31       ` Holger Dengler
2026-01-16 18:37         ` David Laight
2026-01-16 19:20           ` Holger Dengler
2026-01-16 19:44             ` Eric Biggers
2026-01-16 20:55               ` Holger Dengler
2026-01-16 22:30                 ` David Laight
2026-01-17 23:59                   ` Eric Biggers
2026-01-16  0:25   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox