* [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES @ 2026-01-15 18:38 Holger Dengler 2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler 0 siblings, 1 reply; 14+ messages in thread From: Holger Dengler @ 2026-01-15 18:38 UTC (permalink / raw) To: Eric Biggers Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, Holger Dengler, linux-kernel, linux-crypto The following patch adds a kunit tests for the aes library. It does a very minimal verification of the aes operation for all key-sizes. The benchmarks, which are also part of the test-suite, can be used to get some rough performance measurements of the aes encrypt and decrypt functions. The aes_prepare*key() APIs are not covered by the benchmarks. Changes since RFC [1]: - reorder entries in Kconfig/Makefile alphabetically - fail (instead of skip) in case of failures in key preparation - replace local constant definitions - replace macros with helper functions [1] https://lore.kernel.org/linux-crypto/20260114153138.4896-1-dengler@linux.ibm.com Example output of the aes_kunit test-suite: [ 178.640161] KTAP version 1 [ 178.640167] 1..1 [ 178.642268] KTAP version 1 [ 178.642269] # Subtest: aes [ 178.642270] # module: aes_kunit [ 178.642272] 1..9 [ 178.642330] ok 1 test_aes128_encrypt [ 178.642378] ok 2 test_aes128_decrypt [ 178.642427] ok 3 test_aes192_encrypt [ 178.642477] ok 4 test_aes192_decrypt [ 178.642531] ok 5 test_aes256_encrypt [ 178.642584] ok 6 test_aes256_decrypt [ 179.345355] # benchmark_aes128: enc (iter. 10000000, duration 351128093ns) [ 179.345359] # benchmark_aes128: enc (len=16): 434 MB/s [ 179.345361] # benchmark_aes128: dec (iter. 10000000, duration 351541596ns) [ 179.345363] # benchmark_aes128: dec (len=16): 434 MB/s [ 179.345398] ok 7 benchmark_aes128 [ 180.082939] # benchmark_aes192: enc (iter. 10000000, duration 370559483ns) [ 180.082942] # benchmark_aes192: enc (len=16): 411 MB/s [ 180.082944] # benchmark_aes192: dec (iter. 10000000, duration 366888529ns) [ 180.082946] # benchmark_aes192: dec (len=16): 415 MB/s [ 180.082982] ok 8 benchmark_aes192 [ 180.810447] # benchmark_aes256: enc (iter. 10000000, duration 363703684ns) [ 180.810450] # benchmark_aes256: enc (len=16): 419 MB/s [ 180.810452] # benchmark_aes256: dec (iter. 10000000, duration 363671689ns) [ 180.810454] # benchmark_aes256: dec (len=16): 419 MB/s [ 180.810490] ok 9 benchmark_aes256 [ 180.810495] # aes: pass:9 fail:0 skip:0 total:9 [ 180.810498] # Totals: pass:9 fail:0 skip:0 total:9 [ 180.810500] ok 1 aes Holger Dengler (1): lib/crypto: tests: Add KUnit tests for AES lib/crypto/tests/Kconfig | 12 +++ lib/crypto/tests/Makefile | 1 + lib/crypto/tests/aes-testvecs.h | 78 +++++++++++++++++ lib/crypto/tests/aes_kunit.c | 149 ++++++++++++++++++++++++++++++++ 4 files changed, 240 insertions(+) create mode 100644 lib/crypto/tests/aes-testvecs.h create mode 100644 lib/crypto/tests/aes_kunit.c base-commit: 47753e09a15d9fd7cdf114550510f4f2af9333ec -- 2.51.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler @ 2026-01-15 18:38 ` Holger Dengler 2026-01-15 20:43 ` Eric Biggers 2026-01-16 0:25 ` kernel test robot 0 siblings, 2 replies; 14+ messages in thread From: Holger Dengler @ 2026-01-15 18:38 UTC (permalink / raw) To: Eric Biggers Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, Holger Dengler, linux-kernel, linux-crypto Add a KUnit test suite for AES library functions, including KAT and benchmarks. Signed-off-by: Holger Dengler <dengler@linux.ibm.com> --- lib/crypto/tests/Kconfig | 12 +++ lib/crypto/tests/Makefile | 1 + lib/crypto/tests/aes-testvecs.h | 78 +++++++++++++++++ lib/crypto/tests/aes_kunit.c | 149 ++++++++++++++++++++++++++++++++ 4 files changed, 240 insertions(+) create mode 100644 lib/crypto/tests/aes-testvecs.h create mode 100644 lib/crypto/tests/aes_kunit.c diff --git a/lib/crypto/tests/Kconfig b/lib/crypto/tests/Kconfig index 4970463ea0aa..8ac06b6e2f12 100644 --- a/lib/crypto/tests/Kconfig +++ b/lib/crypto/tests/Kconfig @@ -1,5 +1,17 @@ # SPDX-License-Identifier: GPL-2.0-or-later +config CRYPTO_LIB_AES_KUNIT_TEST + tristate "KUnit tests for AES" if !KUNIT_ALL_TESTS + depends on KUNIT + default KUNIT_ALL_TESTS || CRYPTO_SELFTESTS + select CRYPTO_LIB_BENCHMARK_VISIBLE + select CRYPTO_LIB_AES + help + KUnit tests for the AES library functions, including known answer + tests and benchmarks for encrypt/decrypt with all key sizes. The + test suite does not contain any key generation test, nor any error + cases. + config CRYPTO_LIB_BLAKE2B_KUNIT_TEST tristate "KUnit tests for BLAKE2b" if !KUNIT_ALL_TESTS depends on KUNIT diff --git a/lib/crypto/tests/Makefile b/lib/crypto/tests/Makefile index f4262379f56c..e0a30bdc02ac 100644 --- a/lib/crypto/tests/Makefile +++ b/lib/crypto/tests/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-or-later +obj-$(CONFIG_CRYPTO_LIB_AES_KUNIT_TEST) += aes_kunit.o obj-$(CONFIG_CRYPTO_LIB_BLAKE2B_KUNIT_TEST) += blake2b_kunit.o obj-$(CONFIG_CRYPTO_LIB_BLAKE2S_KUNIT_TEST) += blake2s_kunit.o obj-$(CONFIG_CRYPTO_LIB_CURVE25519_KUNIT_TEST) += curve25519_kunit.o diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h new file mode 100644 index 000000000000..dfa528db7f02 --- /dev/null +++ b/lib/crypto/tests/aes-testvecs.h @@ -0,0 +1,78 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _AES_TESTVECS_H +#define _AES_TESTVECS_H + +#include <crypto/aes.h> + +struct buf { + size_t blen; + u8 b[]; +}; + +struct aes_testvector { + u8 plain[AES_BLOCK_SIZE]; + u8 cipher[AES_BLOCK_SIZE]; + struct { + size_t len; + u8 b[32]; + } key; +}; + +static const struct aes_testvector aes128_kat = { + .plain = { + 0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96, + 0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a, + }, + .cipher = { + 0x3a, 0xd7, 0x7b, 0xb4, 0x0d, 0x7a, 0x36, 0x60, + 0xa8, 0x9e, 0xca, 0xf3, 0x24, 0x66, 0xef, 0x97, + }, + .key = { + .len = 16, + .b = { + 0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6, + 0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c, + }, + }, +}; + +static const struct aes_testvector aes192_kat = { + .plain = { + 0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96, + 0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a, + }, + .cipher = { + 0xbd, 0x33, 0x4f, 0x1d, 0x6e, 0x45, 0xf2, 0x5f, + 0xf7, 0x12, 0xa2, 0x14, 0x57, 0x1f, 0xa5, 0xcc, + }, + .key = { + .len = 24, + .b = { + 0x8e, 0x73, 0xb0, 0xf7, 0xda, 0x0e, 0x64, 0x52, + 0xc8, 0x10, 0xf3, 0x2b, 0x80, 0x90, 0x79, 0xe5, + 0x62, 0xf8, 0xea, 0xd2, 0x52, 0x2c, 0x6b, 0x7b, + }, + }, +}; + +static const struct aes_testvector aes256_kat = { + .plain = { + 0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96, + 0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a, + }, + .cipher = { + 0xf3, 0xee, 0xd1, 0xbd, 0xb5, 0xd2, 0xa0, 0x3c, + 0x06, 0x4b, 0x5a, 0x7e, 0x3d, 0xb1, 0x81, 0xf8, + }, + .key = { + .len = 32, + .b = { + 0x60, 0x3d, 0xeb, 0x10, 0x15, 0xca, 0x71, 0xbe, + 0x2b, 0x73, 0xae, 0xf0, 0x85, 0x7d, 0x77, 0x81, + 0x1f, 0x35, 0x2c, 0x07, 0x3b, 0x61, 0x08, 0xd7, + 0x2d, 0x98, 0x10, 0xa3, 0x09, 0x14, 0xdf, 0xf4, + }, + }, +}; + +#endif /* _AES_TESTVECS_H */ diff --git a/lib/crypto/tests/aes_kunit.c b/lib/crypto/tests/aes_kunit.c new file mode 100644 index 000000000000..1f4a4f53f553 --- /dev/null +++ b/lib/crypto/tests/aes_kunit.c @@ -0,0 +1,149 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <kunit/test.h> +#include "aes-testvecs.h" + +static void test_aes(struct kunit *test, const struct aes_testvector *tv, + bool enc) +{ + struct aes_key aes_key; + u8 out[AES_BLOCK_SIZE]; + const u8 *input, *expect; + int rc; + + rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len); + KUNIT_ASSERT_EQ(test, 0, rc); + + if (enc) { + input = tv->plain; + expect = tv->cipher; + aes_encrypt(&aes_key, out, input); + } else { + input = tv->cipher; + expect = tv->plain; + aes_decrypt(&aes_key, out, input); + } + KUNIT_ASSERT_MEMEQ(test, out, expect, sizeof(out)); +} + +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) +{ + const size_t num_iters = 10000000; + u8 out[AES_BLOCK_SIZE]; + struct aes_key aes_key; + u64 t_enc, t_dec; + int rc; + + if (!IS_ENABLED(CONFIG_CRYPTO_LIB_BENCHMARK)) + kunit_skip(test, "not enabled"); + + rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len); + KUNIT_ASSERT_EQ(test, 0, rc); + + /* warm-up enc */ + for (size_t i = 0; i < 1000; i++) + aes_encrypt(&aes_key, out, tv->plain); + + preempt_disable(); + t_enc = ktime_get_ns(); + + for (size_t i = 0; i < num_iters; i++) + aes_encrypt(&aes_key, out, tv->plain); + + t_enc = ktime_get_ns() - t_enc; + preempt_enable(); + + /* warm-up dec */ + for (size_t i = 0; i < 1000; i++) + aes_decrypt(&aes_key, out, tv->cipher); + + preempt_disable(); + t_dec = ktime_get_ns(); + + for (size_t i = 0; i < num_iters; i++) + aes_decrypt(&aes_key, out, tv->cipher); + + t_dec = ktime_get_ns() - t_dec; + preempt_enable(); + + kunit_info(test, "enc (iter. %zu, duration %lluns)", + num_iters, t_enc); + kunit_info(test, "enc (len=%zu): %llu MB/s", + (size_t)AES_BLOCK_SIZE, + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, + (t_enc ?: 1) * SZ_1M)); + + kunit_info(test, "dec (iter. %zu, duration %lluns)", + num_iters, t_dec); + kunit_info(test, "dec (len=%zu): %llu MB/s", + (size_t)AES_BLOCK_SIZE, + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, + (t_dec ?: 1) * SZ_1M)); +} + +static void test_aes128_encrypt(struct kunit *test) +{ + test_aes(test, &aes128_kat, true); +} + +static void test_aes128_decrypt(struct kunit *test) +{ + test_aes(test, &aes128_kat, false); +} + +static void test_aes192_encrypt(struct kunit *test) +{ + test_aes(test, &aes192_kat, true); +} + +static void test_aes192_decrypt(struct kunit *test) +{ + test_aes(test, &aes192_kat, false); +} + +static void test_aes256_encrypt(struct kunit *test) +{ + test_aes(test, &aes256_kat, true); +} + +static void test_aes256_decrypt(struct kunit *test) +{ + test_aes(test, &aes256_kat, false); +} + +static void benchmark_aes128(struct kunit *test) +{ + benchmark_aes(test, &aes128_kat); +} + +static void benchmark_aes192(struct kunit *test) +{ + benchmark_aes(test, &aes192_kat); +} + +static void benchmark_aes256(struct kunit *test) +{ + benchmark_aes(test, &aes256_kat); +} + +static struct kunit_case aes_test_cases[] = { + KUNIT_CASE(test_aes128_encrypt), + KUNIT_CASE(test_aes128_decrypt), + KUNIT_CASE(test_aes192_encrypt), + KUNIT_CASE(test_aes192_decrypt), + KUNIT_CASE(test_aes256_encrypt), + KUNIT_CASE(test_aes256_decrypt), + KUNIT_CASE(benchmark_aes128), + KUNIT_CASE(benchmark_aes192), + KUNIT_CASE(benchmark_aes256), + {}, +}; + +static struct kunit_suite aes_test_suite = { + .name = "aes", + .test_cases = aes_test_cases, +}; + +kunit_test_suite(aes_test_suite); + +MODULE_DESCRIPTION("KUnit tests and benchmark aes library"); +MODULE_LICENSE("GPL"); -- 2.51.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler @ 2026-01-15 20:43 ` Eric Biggers 2026-01-15 21:51 ` Holger Dengler 2026-01-15 22:05 ` David Laight 2026-01-16 0:25 ` kernel test robot 1 sibling, 2 replies; 14+ messages in thread From: Eric Biggers @ 2026-01-15 20:43 UTC (permalink / raw) To: Holger Dengler Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote: > Add a KUnit test suite for AES library functions, including KAT and > benchmarks. > > Signed-off-by: Holger Dengler <dengler@linux.ibm.com> The cover letter had some more information. Could you put it in the commit message directly? Normally cover letters aren't used for a single patch: the explanation should just be in the patch itself. > diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h > new file mode 100644 > index 000000000000..dfa528db7f02 > --- /dev/null > +++ b/lib/crypto/tests/aes-testvecs.h > @@ -0,0 +1,78 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef _AES_TESTVECS_H > +#define _AES_TESTVECS_H > + > +#include <crypto/aes.h> > + > +struct buf { > + size_t blen; > + u8 b[]; > +}; 'struct buf' is never used. > +static const struct aes_testvector aes128_kat = { Where do these test vectors come from? All test vectors should have a documented source. > +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) > +{ > + const size_t num_iters = 10000000; 10000000 iterations is too many. That's 160 MB of data in each direction per AES key length. Some CPUs without AES instructions can do only ~20 MB AES per second. In that case, this benchmark would take 16 seconds to run per AES key length, for 48 seconds total. hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) iterations. That would be 69444 in this case (considering len=16), which is less than 1% of the iterations you've used. Choosing a number similar to that would seem more appropriate. Ultimately these are just made-up numbers. But I think we should aim for the benchmark test in each KUnit test suite to take less than a second or so. The existing tests roughly achieve that, whereas it seems this one can go over it by quite a bit due to the 10000000 iterations. > + kunit_info(test, "enc (iter. %zu, duration %lluns)", > + num_iters, t_enc); > + kunit_info(test, "enc (len=%zu): %llu MB/s", > + (size_t)AES_BLOCK_SIZE, > + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, > + (t_enc ?: 1) * SZ_1M)); > + > + kunit_info(test, "dec (iter. %zu, duration %lluns)", > + num_iters, t_dec); > + kunit_info(test, "dec (len=%zu): %llu MB/s", > + (size_t)AES_BLOCK_SIZE, > + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, > + (t_dec ?: 1) * SZ_1M)); Maybe delete the first line of each pair, and switch from power-of-2 megabytes to power-of-10? That would be consistent with how the other crypto and CRC benchmarks print their output. > +MODULE_DESCRIPTION("KUnit tests and benchmark aes library"); "aes library" => "for the AES library" - Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 20:43 ` Eric Biggers @ 2026-01-15 21:51 ` Holger Dengler 2026-01-15 21:58 ` Eric Biggers 2026-01-15 22:05 ` David Laight 1 sibling, 1 reply; 14+ messages in thread From: Holger Dengler @ 2026-01-15 21:51 UTC (permalink / raw) To: Eric Biggers Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On 15/01/2026 21:43, Eric Biggers wrote: > On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote: >> Add a KUnit test suite for AES library functions, including KAT and >> benchmarks. >> >> Signed-off-by: Holger Dengler <dengler@linux.ibm.com> > > The cover letter had some more information. Could you put it in the > commit message directly? Normally cover letters aren't used for a > single patch: the explanation should just be in the patch itself. Ok, I'll move the explanation to the commit message. I assume that the example output of the test can be dropped? >> diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h >> new file mode 100644 >> index 000000000000..dfa528db7f02 >> --- /dev/null >> +++ b/lib/crypto/tests/aes-testvecs.h >> @@ -0,0 +1,78 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +#ifndef _AES_TESTVECS_H >> +#define _AES_TESTVECS_H >> + >> +#include <crypto/aes.h> >> + >> +struct buf { >> + size_t blen; >> + u8 b[]; >> +}; > > 'struct buf' is never used. This is a left-over, will be removed in the next series. > >> +static const struct aes_testvector aes128_kat = { > > Where do these test vectors come from? All test vectors should have a > documented source. ok, I will add this information as well. >> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) >> +{ >> + const size_t num_iters = 10000000; > > 10000000 iterations is too many. That's 160 MB of data in each > direction per AES key length. Some CPUs without AES instructions can do > only ~20 MB AES per second. In that case, this benchmark would take 16 > seconds to run per AES key length, for 48 seconds total. > > hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) > iterations. That would be 69444 in this case (considering len=16), > which is less than 1% of the iterations you've used. Choosing a number > similar to that would seem more appropriate. > > Ultimately these are just made-up numbers. But I think we should aim > for the benchmark test in each KUnit test suite to take less than a > second or so. The existing tests roughly achieve that, whereas it seems > this one can go over it by quite a bit due to the 10000000 iterations. As we have a fixed length, I would go stay with a fix value for the iterations (instead of calculating it based on len). The benchmark has a separate loop for encrypt and decrypt, so I will do the half iterations on encrypt and the other half on decrypt. I will also reduce the iterations for the warm-ups. What about 100 iterations for each warm-up and 500.000 iterations for each real measurement? Means processing 2x 8MiB with preemption disabled. >> + kunit_info(test, "enc (iter. %zu, duration %lluns)", >> + num_iters, t_enc); >> + kunit_info(test, "enc (len=%zu): %llu MB/s", >> + (size_t)AES_BLOCK_SIZE, >> + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, >> + (t_enc ?: 1) * SZ_1M)); >> + >> + kunit_info(test, "dec (iter. %zu, duration %lluns)", >> + num_iters, t_dec); >> + kunit_info(test, "dec (len=%zu): %llu MB/s", >> + (size_t)AES_BLOCK_SIZE, >> + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, >> + (t_dec ?: 1) * SZ_1M)); > > Maybe delete the first line of each pair, and switch from power-of-2 > megabytes to power-of-10? That would be consistent with how the other > crypto and CRC benchmarks print their output. > >> +MODULE_DESCRIPTION("KUnit tests and benchmark aes library"); > > "aes library" => "for the AES library" ok -- Mit freundlichen Grüßen / Kind regards Holger Dengler -- IBM Systems, Linux on IBM Z Development dengler@linux.ibm.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 21:51 ` Holger Dengler @ 2026-01-15 21:58 ` Eric Biggers 0 siblings, 0 replies; 14+ messages in thread From: Eric Biggers @ 2026-01-15 21:58 UTC (permalink / raw) To: Holger Dengler Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Thu, Jan 15, 2026 at 10:51:25PM +0100, Holger Dengler wrote: > On 15/01/2026 21:43, Eric Biggers wrote: > > On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote: > >> Add a KUnit test suite for AES library functions, including KAT and > >> benchmarks. > >> > >> Signed-off-by: Holger Dengler <dengler@linux.ibm.com> > > > > The cover letter had some more information. Could you put it in the > > commit message directly? Normally cover letters aren't used for a > > single patch: the explanation should just be in the patch itself. > > Ok, I'll move the explanation to the commit message. I assume that the example > output of the test can be dropped? Yes, that's fine. > > 10000000 iterations is too many. That's 160 MB of data in each > > direction per AES key length. Some CPUs without AES instructions can do > > only ~20 MB AES per second. In that case, this benchmark would take 16 > > seconds to run per AES key length, for 48 seconds total. > > > > hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) > > iterations. That would be 69444 in this case (considering len=16), > > which is less than 1% of the iterations you've used. Choosing a number > > similar to that would seem more appropriate. > > > > Ultimately these are just made-up numbers. But I think we should aim > > for the benchmark test in each KUnit test suite to take less than a > > second or so. The existing tests roughly achieve that, whereas it seems > > this one can go over it by quite a bit due to the 10000000 iterations. > > As we have a fixed length, I would go stay with a fix value for the iterations > (instead of calculating it based on len). > > The benchmark has a separate loop for encrypt and decrypt, so I will do the > half iterations on encrypt and the other half on decrypt. I will also reduce > the iterations for the warm-ups. > > What about 100 iterations for each warm-up and 500.000 iterations for each > real measurement? Means processing 2x 8MiB with preemption disabled. I'd suggest 50000 for each direction as well as the warm-up loop. - Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 20:43 ` Eric Biggers 2026-01-15 21:51 ` Holger Dengler @ 2026-01-15 22:05 ` David Laight 2026-01-16 17:31 ` Holger Dengler 1 sibling, 1 reply; 14+ messages in thread From: David Laight @ 2026-01-15 22:05 UTC (permalink / raw) To: Eric Biggers Cc: Holger Dengler, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Thu, 15 Jan 2026 12:43:32 -0800 Eric Biggers <ebiggers@kernel.org> wrote: > On Thu, Jan 15, 2026 at 07:38:31PM +0100, Holger Dengler wrote: > > Add a KUnit test suite for AES library functions, including KAT and > > benchmarks. > > > > Signed-off-by: Holger Dengler <dengler@linux.ibm.com> > > The cover letter had some more information. Could you put it in the > commit message directly? Normally cover letters aren't used for a > single patch: the explanation should just be in the patch itself. > > > diff --git a/lib/crypto/tests/aes-testvecs.h b/lib/crypto/tests/aes-testvecs.h > > new file mode 100644 > > index 000000000000..dfa528db7f02 > > --- /dev/null > > +++ b/lib/crypto/tests/aes-testvecs.h > > @@ -0,0 +1,78 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > +#ifndef _AES_TESTVECS_H > > +#define _AES_TESTVECS_H > > + > > +#include <crypto/aes.h> > > + > > +struct buf { > > + size_t blen; > > + u8 b[]; > > +}; > > 'struct buf' is never used. > > > +static const struct aes_testvector aes128_kat = { > > Where do these test vectors come from? All test vectors should have a > documented source. > > > +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) > > +{ > > + const size_t num_iters = 10000000; > > 10000000 iterations is too many. That's 160 MB of data in each > direction per AES key length. Some CPUs without AES instructions can do > only ~20 MB AES per second. In that case, this benchmark would take 16 > seconds to run per AES key length, for 48 seconds total. Probably best to first do a test that would take a 'reasonable' time on a cpu without AES. If that is 'very fast' then do a longer test to get more accuracy on a faster implementation. > > hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) > iterations. That would be 69444 in this case (considering len=16), > which is less than 1% of the iterations you've used. Choosing a number > similar to that would seem more appropriate. > > Ultimately these are just made-up numbers. But I think we should aim > for the benchmark test in each KUnit test suite to take less than a > second or so. The existing tests roughly achieve that, whereas it seems > this one can go over it by quite a bit due to the 10000000 iterations. Even 1 second is a long time, you end up getting multiple interrupts included. I think a lot of these benchmarks are far too long. Timing differences less that 1% can be created by scheduling noise. Running a test that takes 200 'quanta' of the timer used has an error margin of under 1% (100 quanta might be enough). While the kernel timestamps have a resolution of 1ns the accuracy is worse. If you run a test for even just 10us you ought to get reasonable accuracy with a reasonable hope of not getting an interrupt. Run the test 10 times and report the fastest value. You'll then find the results are entirely unstable because the cpu clock frequency keeps changing. And long enough buffers can get limited by the d-cache loads. For something as slow as AES you can count the number of cpu cycles for a single call and get a reasonably consistent figure. That will tell you whether the loop is running at the speed you might expect it to run at. (You need to use data dependencies between the start/end 'times' and start/end of the code being timed, x86 lfence/mfence are too slow and can hide the 'setup' cost of some instructions.) David > > > + kunit_info(test, "enc (iter. %zu, duration %lluns)", > > + num_iters, t_enc); > > + kunit_info(test, "enc (len=%zu): %llu MB/s", > > + (size_t)AES_BLOCK_SIZE, > > + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, > > + (t_enc ?: 1) * SZ_1M)); > > + > > + kunit_info(test, "dec (iter. %zu, duration %lluns)", > > + num_iters, t_dec); > > + kunit_info(test, "dec (len=%zu): %llu MB/s", > > + (size_t)AES_BLOCK_SIZE, > > + div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, > > + (t_dec ?: 1) * SZ_1M)); > > Maybe delete the first line of each pair, and switch from power-of-2 > megabytes to power-of-10? That would be consistent with how the other > crypto and CRC benchmarks print their output. > > > +MODULE_DESCRIPTION("KUnit tests and benchmark aes library"); > > "aes library" => "for the AES library" > > - Eric > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 22:05 ` David Laight @ 2026-01-16 17:31 ` Holger Dengler 2026-01-16 18:37 ` David Laight 0 siblings, 1 reply; 14+ messages in thread From: Holger Dengler @ 2026-01-16 17:31 UTC (permalink / raw) To: David Laight, Eric Biggers Cc: Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto Hi David, On 15/01/2026 23:05, David Laight wrote: > On Thu, 15 Jan 2026 12:43:32 -0800 > Eric Biggers <ebiggers@kernel.org> wrote: >>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) >>> +{ >>> + const size_t num_iters = 10000000; >> >> 10000000 iterations is too many. That's 160 MB of data in each >> direction per AES key length. Some CPUs without AES instructions can do >> only ~20 MB AES per second. In that case, this benchmark would take 16 >> seconds to run per AES key length, for 48 seconds total. > > Probably best to first do a test that would take a 'reasonable' time > on a cpu without AES. If that is 'very fast' then do a longer test > to get more accuracy on a faster implementation. > >> >> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) >> iterations. That would be 69444 in this case (considering len=16), >> which is less than 1% of the iterations you've used. Choosing a number >> similar to that would seem more appropriate. >> >> Ultimately these are just made-up numbers. But I think we should aim >> for the benchmark test in each KUnit test suite to take less than a >> second or so. The existing tests roughly achieve that, whereas it seems >> this one can go over it by quite a bit due to the 10000000 iterations. > > Even 1 second is a long time, you end up getting multiple interrupts included. > I think a lot of these benchmarks are far too long. > Timing differences less that 1% can be created by scheduling noise. > Running a test that takes 200 'quanta' of the timer used has an > error margin of under 1% (100 quanta might be enough). > While the kernel timestamps have a resolution of 1ns the accuracy is worse. > If you run a test for even just 10us you ought to get reasonable accuracy > with a reasonable hope of not getting an interrupt. > Run the test 10 times and report the fastest value. > > You'll then find the results are entirely unstable because the cpu clock > frequency keeps changing. > And long enough buffers can get limited by the d-cache loads. > > For something as slow as AES you can count the number of cpu cycles for > a single call and get a reasonably consistent figure. > That will tell you whether the loop is running at the speed you might > expect it to run at. > (You need to use data dependencies between the start/end 'times' and > start/end of the code being timed, x86 lfence/mfence are too slow and > can hide the 'setup' cost of some instructions.) Thanks a lot for your feedback. I tried a few of your ideas and it turns out, that they work quite well. First of all, with a single-block aes encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of our CPU clock. Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s). These numbers should be taken with some care, as on s390 the operating system always runs virtualized. In my test environment, I also only have access to a machine with shared CPUs, so there might be some negative impact from other workload. The benchmark loops for 100 iterations now without any warm-up. In each iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest value of these measurements is takes as the value for the bandwidth calculations. Although it is not necessary in my environment, I'm doing all iterations with preemption disabled. I think, that this might help on other platforms to reduce the jitter of the measurement values. The removal of the warm-up does not have any impact on the numbers. Just for information: I also tried to measure the cycles with the same results. The minimal measurement value of a few iterations is much more stable that the average over a larger number of iterations. I also did some tests with IRQs disabled (instead of only preemption), but the numbers stay the same. So I think, it is save enough to stay with disables preemption. I also tried you idea, first to do a few measurements and if they are fast enough, increase the number of iterations. But it turns out, that this it not really necessary (at least in my env). But I can add this, it it makes sense on other platforms. -- Mit freundlichen Grüßen / Kind regards Holger Dengler -- IBM Systems, Linux on IBM Z Development dengler@linux.ibm.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 17:31 ` Holger Dengler @ 2026-01-16 18:37 ` David Laight 2026-01-16 19:20 ` Holger Dengler 0 siblings, 1 reply; 14+ messages in thread From: David Laight @ 2026-01-16 18:37 UTC (permalink / raw) To: Holger Dengler Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Fri, 16 Jan 2026 18:31:58 +0100 Holger Dengler <dengler@linux.ibm.com> wrote: > Hi David, > > On 15/01/2026 23:05, David Laight wrote: > > On Thu, 15 Jan 2026 12:43:32 -0800 > > Eric Biggers <ebiggers@kernel.org> wrote: > >>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) > >>> +{ > >>> + const size_t num_iters = 10000000; > >> > >> 10000000 iterations is too many. That's 160 MB of data in each > >> direction per AES key length. Some CPUs without AES instructions can do > >> only ~20 MB AES per second. In that case, this benchmark would take 16 > >> seconds to run per AES key length, for 48 seconds total. > > > > Probably best to first do a test that would take a 'reasonable' time > > on a cpu without AES. If that is 'very fast' then do a longer test > > to get more accuracy on a faster implementation. > > > >> > >> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) > >> iterations. That would be 69444 in this case (considering len=16), > >> which is less than 1% of the iterations you've used. Choosing a number > >> similar to that would seem more appropriate. > >> > >> Ultimately these are just made-up numbers. But I think we should aim > >> for the benchmark test in each KUnit test suite to take less than a > >> second or so. The existing tests roughly achieve that, whereas it seems > >> this one can go over it by quite a bit due to the 10000000 iterations. > > > > Even 1 second is a long time, you end up getting multiple interrupts included. > > I think a lot of these benchmarks are far too long. > > Timing differences less that 1% can be created by scheduling noise. > > Running a test that takes 200 'quanta' of the timer used has an > > error margin of under 1% (100 quanta might be enough). > > While the kernel timestamps have a resolution of 1ns the accuracy is worse. > > If you run a test for even just 10us you ought to get reasonable accuracy > > with a reasonable hope of not getting an interrupt. > > Run the test 10 times and report the fastest value. > > > > You'll then find the results are entirely unstable because the cpu clock > > frequency keeps changing. > > And long enough buffers can get limited by the d-cache loads. > > > > For something as slow as AES you can count the number of cpu cycles for > > a single call and get a reasonably consistent figure. > > That will tell you whether the loop is running at the speed you might > > expect it to run at. > > (You need to use data dependencies between the start/end 'times' and > > start/end of the code being timed, x86 lfence/mfence are too slow and > > can hide the 'setup' cost of some instructions.) > > Thanks a lot for your feedback. I tried a few of your ideas and it turns out, > that they work quite well. First of all, with a single-block aes > encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of > our CPU clock. > > Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s). > These numbers should be taken with some care, as on s390 the operating system > always runs virtualized. In my test environment, I also only have access to a > machine with shared CPUs, so there might be some negative impact from other > workload. The impact of other workloads is much less likely for a short test, and if it does happen you are likely to see a value that is abnormally large. > The benchmark loops for 100 iterations now without any warm-up. In each > iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest > value of these measurements is takes as the value for the bandwidth > calculations. Although it is not necessary in my environment, I'm doing all > iterations with preemption disabled. I think, that this might help on other > platforms to reduce the jitter of the measurement values. > > The removal of the warm-up does not have any impact on the numbers. I'm not sure what the 'warm-up' was for. The first test will be slow(er) due to I-cache misses. (That will be more noticeable for big software loops - like blake2.) Change to test parameters can affect branch prediction but that also only usually affects the first test with each set of parameters. (Unlikely to affect AES, but I could see that effect when testing mul_u64_u64_div_u64().) The only other reason for a 'warm-up' is to get the cpu frequency fast and fixed - and there ought to be a better way of doing that. > > Just for information: I also tried to measure the cycles with the same > results. The minimal measurement value of a few iterations is much more stable > that the average over a larger number of iterations. My userspace test code runs each test 10 times and prints all 10 values. I then look at them to see how consistent they are. > I also did some tests with IRQs disabled (instead of only preemption), but the > numbers stay the same. So I think, it is save enough to stay with disables > preemption. I'd actually go for disabling interrupts. What you are seeing is the effect of interrupts not happening (which is likely for a short test, but not for a long one). > > I also tried you idea, first to do a few measurements and if they are fast > enough, increase the number of iterations. But it turns out, that this it not > really necessary (at least in my env). But I can add this, it it makes sense > on other platforms. The main reason for doing that is reducing the time the tests take on a system that is massively slower (and doing software AES). Maybe someone want to run the test cases on an m68k :-) David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 18:37 ` David Laight @ 2026-01-16 19:20 ` Holger Dengler 2026-01-16 19:44 ` Eric Biggers 0 siblings, 1 reply; 14+ messages in thread From: Holger Dengler @ 2026-01-16 19:20 UTC (permalink / raw) To: David Laight Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On 16/01/2026 19:37, David Laight wrote: > On Fri, 16 Jan 2026 18:31:58 +0100 > Holger Dengler <dengler@linux.ibm.com> wrote: > >> Hi David, >> >> On 15/01/2026 23:05, David Laight wrote: >>> On Thu, 15 Jan 2026 12:43:32 -0800 >>> Eric Biggers <ebiggers@kernel.org> wrote: >>>>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) >>>>> +{ >>>>> + const size_t num_iters = 10000000; >>>> >>>> 10000000 iterations is too many. That's 160 MB of data in each >>>> direction per AES key length. Some CPUs without AES instructions can do >>>> only ~20 MB AES per second. In that case, this benchmark would take 16 >>>> seconds to run per AES key length, for 48 seconds total. >>> >>> Probably best to first do a test that would take a 'reasonable' time >>> on a cpu without AES. If that is 'very fast' then do a longer test >>> to get more accuracy on a faster implementation. >>> >>>> >>>> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128) >>>> iterations. That would be 69444 in this case (considering len=16), >>>> which is less than 1% of the iterations you've used. Choosing a number >>>> similar to that would seem more appropriate. >>>> >>>> Ultimately these are just made-up numbers. But I think we should aim >>>> for the benchmark test in each KUnit test suite to take less than a >>>> second or so. The existing tests roughly achieve that, whereas it seems >>>> this one can go over it by quite a bit due to the 10000000 iterations. >>> >>> Even 1 second is a long time, you end up getting multiple interrupts included. >>> I think a lot of these benchmarks are far too long. >>> Timing differences less that 1% can be created by scheduling noise. >>> Running a test that takes 200 'quanta' of the timer used has an >>> error margin of under 1% (100 quanta might be enough). >>> While the kernel timestamps have a resolution of 1ns the accuracy is worse. >>> If you run a test for even just 10us you ought to get reasonable accuracy >>> with a reasonable hope of not getting an interrupt. >>> Run the test 10 times and report the fastest value. >>> >>> You'll then find the results are entirely unstable because the cpu clock >>> frequency keeps changing. >>> And long enough buffers can get limited by the d-cache loads. >>> >>> For something as slow as AES you can count the number of cpu cycles for >>> a single call and get a reasonably consistent figure. >>> That will tell you whether the loop is running at the speed you might >>> expect it to run at. >>> (You need to use data dependencies between the start/end 'times' and >>> start/end of the code being timed, x86 lfence/mfence are too slow and >>> can hide the 'setup' cost of some instructions.) >> >> Thanks a lot for your feedback. I tried a few of your ideas and it turns out, >> that they work quite well. First of all, with a single-block aes >> encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of >> our CPU clock. >> >> Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s). >> These numbers should be taken with some care, as on s390 the operating system >> always runs virtualized. In my test environment, I also only have access to a >> machine with shared CPUs, so there might be some negative impact from other >> workload. > > The impact of other workloads is much less likely for a short test, > and if it does happen you are likely to see a value that is abnormally large. > >> The benchmark loops for 100 iterations now without any warm-up. In each >> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest >> value of these measurements is takes as the value for the bandwidth >> calculations. Although it is not necessary in my environment, I'm doing all >> iterations with preemption disabled. I think, that this might help on other >> platforms to reduce the jitter of the measurement values. >> >> The removal of the warm-up does not have any impact on the numbers. > > I'm not sure what the 'warm-up' was for. > The first test will be slow(er) due to I-cache misses. > (That will be more noticeable for big software loops - like blake2.) > Change to test parameters can affect branch prediction but that also only > usually affects the first test with each set of parameters. > (Unlikely to affect AES, but I could see that effect when testing > mul_u64_u64_div_u64().) > The only other reason for a 'warm-up' is to get the cpu frequency fast > and fixed - and there ought to be a better way of doing that. > >> >> Just for information: I also tried to measure the cycles with the same >> results. The minimal measurement value of a few iterations is much more stable >> that the average over a larger number of iterations. > > My userspace test code runs each test 10 times and prints all 10 values. > I then look at them to see how consistent they are. > >> I also did some tests with IRQs disabled (instead of only preemption), but the >> numbers stay the same. So I think, it is save enough to stay with disables >> preemption. > > I'd actually go for disabling interrupts. > What you are seeing is the effect of interrupts not happening > (which is likely for a short test, but not for a long one). Ok, I'll send the next series with IRQ disabled. I don't see any difference on my systems. >> I also tried you idea, first to do a few measurements and if they are fast >> enough, increase the number of iterations. But it turns out, that this it not >> really necessary (at least in my env). But I can add this, it it makes sense >> on other platforms. > > The main reason for doing that is reducing the time the tests take on a > system that is massively slower (and doing software AES). > Maybe someone want to run the test cases on an m68k :-) So I've currently 100 iterations. The first one or two iterations will be for the warm-up (cache misses, branch prediction, etc). But with the interrupts disabled, the rest of the iterations should give us enough stable measurements for the benchmark. Maybe it would be worth to test the next version of teh test on other platforms as well. -- Mit freundlichen Grüßen / Kind regards Holger Dengler -- IBM Systems, Linux on IBM Z Development dengler@linux.ibm.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 19:20 ` Holger Dengler @ 2026-01-16 19:44 ` Eric Biggers 2026-01-16 20:55 ` Holger Dengler 0 siblings, 1 reply; 14+ messages in thread From: Eric Biggers @ 2026-01-16 19:44 UTC (permalink / raw) To: Holger Dengler Cc: David Laight, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Fri, Jan 16, 2026 at 08:20:51PM +0100, Holger Dengler wrote: > >> The benchmark loops for 100 iterations now without any warm-up. In each > >> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest > >> value of these measurements is takes as the value for the bandwidth > >> calculations. Although it is not necessary in my environment, I'm doing all > >> iterations with preemption disabled. I think, that this might help on other > >> platforms to reduce the jitter of the measurement values. > >> > >> The removal of the warm-up does not have any impact on the numbers. > > > > I'm not sure what the 'warm-up' was for. > > The first test will be slow(er) due to I-cache misses. > > (That will be more noticeable for big software loops - like blake2.) > > Change to test parameters can affect branch prediction but that also only > > usually affects the first test with each set of parameters. > > (Unlikely to affect AES, but I could see that effect when testing > > mul_u64_u64_div_u64().) > > The only other reason for a 'warm-up' is to get the cpu frequency fast > > and fixed - and there ought to be a better way of doing that. The warm-up loops in the existing benchmarks are both for cache warming and to get the CPU frequency fast and fixed. It's not anything sophisticated, but rather just something that's simple and seems to works well enough across CPUs without depending on any special APIs. If your CPU doesn't do much frequency scaling, you may not notice a difference, but other CPUs may need it. > >> I also did some tests with IRQs disabled (instead of only preemption), but the > >> numbers stay the same. So I think, it is save enough to stay with disables > >> preemption. > > > > I'd actually go for disabling interrupts. > > What you are seeing is the effect of interrupts not happening > > (which is likely for a short test, but not for a long one). > > Ok, I'll send the next series with IRQ disabled. I don't see any difference on > my systems. Some architectures don't allow vector registers to be used when IRQs are disabled. On those architectures, disabling IRQs would always trigger the fallback to the generic code, which would make the benchmark not very useful. That's why I've only been disabling preemption, not IRQs. - Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 19:44 ` Eric Biggers @ 2026-01-16 20:55 ` Holger Dengler 2026-01-16 22:30 ` David Laight 0 siblings, 1 reply; 14+ messages in thread From: Holger Dengler @ 2026-01-16 20:55 UTC (permalink / raw) To: Eric Biggers Cc: David Laight, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On 16/01/2026 20:44, Eric Biggers wrote: > On Fri, Jan 16, 2026 at 08:20:51PM +0100, Holger Dengler wrote: >>>> The benchmark loops for 100 iterations now without any warm-up. In each >>>> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest >>>> value of these measurements is takes as the value for the bandwidth >>>> calculations. Although it is not necessary in my environment, I'm doing all >>>> iterations with preemption disabled. I think, that this might help on other >>>> platforms to reduce the jitter of the measurement values. >>>> >>>> The removal of the warm-up does not have any impact on the numbers. >>> >>> I'm not sure what the 'warm-up' was for. >>> The first test will be slow(er) due to I-cache misses. >>> (That will be more noticeable for big software loops - like blake2.) >>> Change to test parameters can affect branch prediction but that also only >>> usually affects the first test with each set of parameters. >>> (Unlikely to affect AES, but I could see that effect when testing >>> mul_u64_u64_div_u64().) >>> The only other reason for a 'warm-up' is to get the cpu frequency fast >>> and fixed - and there ought to be a better way of doing that. > > The warm-up loops in the existing benchmarks are both for cache warming > and to get the CPU frequency fast and fixed. It's not anything > sophisticated, but rather just something that's simple and seems to > works well enough across CPUs without depending on any special APIs. If > your CPU doesn't do much frequency scaling, you may not notice a > difference, but other CPUs may need it. Do you have a gut feeling how many iterations it takes to get the CPU speed up? If it takes less than 50 iterations, it would be sufficient with the new method. >>>> I also did some tests with IRQs disabled (instead of only preemption), but the >>>> numbers stay the same. So I think, it is save enough to stay with disables >>>> preemption. >>> >>> I'd actually go for disabling interrupts. >>> What you are seeing is the effect of interrupts not happening >>> (which is likely for a short test, but not for a long one). >> >> Ok, I'll send the next series with IRQ disabled. I don't see any difference on >> my systems. > > Some architectures don't allow vector registers to be used when IRQs are > disabled. On those architectures, disabling IRQs would always trigger > the fallback to the generic code, which would make the benchmark not > very useful. That's why I've only been disabling preemption, not IRQs. Ok, this is a very strong argument against disabling IRQs. -- Mit freundlichen Grüßen / Kind regards Holger Dengler -- IBM Systems, Linux on IBM Z Development dengler@linux.ibm.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 20:55 ` Holger Dengler @ 2026-01-16 22:30 ` David Laight 2026-01-17 23:59 ` Eric Biggers 0 siblings, 1 reply; 14+ messages in thread From: David Laight @ 2026-01-16 22:30 UTC (permalink / raw) To: Holger Dengler Cc: Eric Biggers, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Fri, 16 Jan 2026 21:55:04 +0100 Holger Dengler <dengler@linux.ibm.com> wrote: > On 16/01/2026 20:44, Eric Biggers wrote: ... > > The warm-up loops in the existing benchmarks are both for cache warming > > and to get the CPU frequency fast and fixed. It's not anything > > sophisticated, but rather just something that's simple and seems to > > works well enough across CPUs without depending on any special APIs. If > > your CPU doesn't do much frequency scaling, you may not notice a > > difference, but other CPUs may need it. > > Do you have a gut feeling how many iterations it takes to get the CPU speed > up? If it takes less than 50 iterations, it would be sufficient with the new > method. It may not matter what you do to get the cpu speed fixed. Looping calling ktime_get_ns() for 'long enough' should do it. That would be test independent but the 'long enough' very cpu dependent. The benchmarks probably ought to have some common API - even if it just in the kunit code. The advantage of counting cpu clocks is the frequency then doesn't matter as much - L1 cache miss timings might change. The difficulty is finding a cpu clock counter. Architecture dependent and may not exist (you don't want the fixed frequency 'sanitised' TSC). David ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-16 22:30 ` David Laight @ 2026-01-17 23:59 ` Eric Biggers 0 siblings, 0 replies; 14+ messages in thread From: Eric Biggers @ 2026-01-17 23:59 UTC (permalink / raw) To: David Laight Cc: Holger Dengler, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, linux-kernel, linux-crypto On Fri, Jan 16, 2026 at 10:30:15PM +0000, David Laight wrote: > It may not matter what you do to get the cpu speed fixed. > Looping calling ktime_get_ns() for 'long enough' should do it. For the CPU frequency, sure. But as I mentioned, the warm-up loop is also intended to load the target code into cache, as the benchmarks are intended to measure the warm cache case. Yes, that part only needs one call, but the loop accomplishes both. > That would be test independent but the 'long enough' very > cpu dependent. > The benchmarks probably ought to have some common API - even if it > just in the kunit code. > > The advantage of counting cpu clocks is the frequency then doesn't > matter as much - L1 cache miss timings might change. > > The difficulty is finding a cpu clock counter. Architecture dependent > and may not exist (you don't want the fixed frequency 'sanitised' TSC). Yes, not all architectures supported by Linux have a high-resolution timer or cycle counter. IIUC, for some the best resolution available is that of "jiffies", which can increment as infrequently as once per 10 ms. On such kernels, the benchmark naturally needs to run for significantly longer than that to get a reasonably accurate time. I certainly agree that the benchmarking code I've written is ad-hoc. But at the same time, there's a bit more reasoning behind it than you might think. The "obvious" improvements suggested in this thread (disabling IRQs, doing only 1 warm-up iteration, doing only 100 iterations) make assumptions that are not true on many systems. - Eric ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES 2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler 2026-01-15 20:43 ` Eric Biggers @ 2026-01-16 0:25 ` kernel test robot 1 sibling, 0 replies; 14+ messages in thread From: kernel test robot @ 2026-01-16 0:25 UTC (permalink / raw) To: Holger Dengler, Eric Biggers Cc: llvm, oe-kbuild-all, Ard Biesheuvel, Jason A . Donenfeld, Herbert Xu, Harald Freudenberger, Holger Dengler, linux-kernel, linux-crypto Hi Holger, kernel test robot noticed the following build errors: [auto build test ERROR on 47753e09a15d9fd7cdf114550510f4f2af9333ec] url: https://github.com/intel-lab-lkp/linux/commits/Holger-Dengler/lib-crypto-tests-Add-KUnit-tests-for-AES/20260116-030041 base: 47753e09a15d9fd7cdf114550510f4f2af9333ec patch link: https://lore.kernel.org/r/20260115183831.72010-2-dengler%40linux.ibm.com patch subject: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES config: hexagon-randconfig-002-20260116 (https://download.01.org/0day-ci/archive/20260116/202601160822.yLaBf86R-lkp@intel.com/config) compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260116/202601160822.yLaBf86R-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202601160822.yLaBf86R-lkp@intel.com/ All errors (new ones prefixed by >>): >> lib/crypto/tests/aes_kunit.c:73:24: error: use of undeclared identifier 'SZ_1M' 73 | (t_enc ?: 1) * SZ_1M)); | ^ >> lib/crypto/tests/aes_kunit.c:73:24: error: use of undeclared identifier 'SZ_1M' lib/crypto/tests/aes_kunit.c:80:24: error: use of undeclared identifier 'SZ_1M' 80 | (t_dec ?: 1) * SZ_1M)); | ^ lib/crypto/tests/aes_kunit.c:80:24: error: use of undeclared identifier 'SZ_1M' 4 errors generated. vim +/SZ_1M +73 lib/crypto/tests/aes_kunit.c 27 28 static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv) 29 { 30 const size_t num_iters = 10000000; 31 u8 out[AES_BLOCK_SIZE]; 32 struct aes_key aes_key; 33 u64 t_enc, t_dec; 34 int rc; 35 36 if (!IS_ENABLED(CONFIG_CRYPTO_LIB_BENCHMARK)) 37 kunit_skip(test, "not enabled"); 38 39 rc = aes_preparekey(&aes_key, tv->key.b, tv->key.len); 40 KUNIT_ASSERT_EQ(test, 0, rc); 41 42 /* warm-up enc */ 43 for (size_t i = 0; i < 1000; i++) 44 aes_encrypt(&aes_key, out, tv->plain); 45 46 preempt_disable(); 47 t_enc = ktime_get_ns(); 48 49 for (size_t i = 0; i < num_iters; i++) 50 aes_encrypt(&aes_key, out, tv->plain); 51 52 t_enc = ktime_get_ns() - t_enc; 53 preempt_enable(); 54 55 /* warm-up dec */ 56 for (size_t i = 0; i < 1000; i++) 57 aes_decrypt(&aes_key, out, tv->cipher); 58 59 preempt_disable(); 60 t_dec = ktime_get_ns(); 61 62 for (size_t i = 0; i < num_iters; i++) 63 aes_decrypt(&aes_key, out, tv->cipher); 64 65 t_dec = ktime_get_ns() - t_dec; 66 preempt_enable(); 67 68 kunit_info(test, "enc (iter. %zu, duration %lluns)", 69 num_iters, t_enc); 70 kunit_info(test, "enc (len=%zu): %llu MB/s", 71 (size_t)AES_BLOCK_SIZE, 72 div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, > 73 (t_enc ?: 1) * SZ_1M)); 74 75 kunit_info(test, "dec (iter. %zu, duration %lluns)", 76 num_iters, t_dec); 77 kunit_info(test, "dec (len=%zu): %llu MB/s", 78 (size_t)AES_BLOCK_SIZE, 79 div64_u64((u64)AES_BLOCK_SIZE * num_iters * NSEC_PER_SEC, 80 (t_dec ?: 1) * SZ_1M)); 81 } 82 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-01-17 23:59 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler 2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler 2026-01-15 20:43 ` Eric Biggers 2026-01-15 21:51 ` Holger Dengler 2026-01-15 21:58 ` Eric Biggers 2026-01-15 22:05 ` David Laight 2026-01-16 17:31 ` Holger Dengler 2026-01-16 18:37 ` David Laight 2026-01-16 19:20 ` Holger Dengler 2026-01-16 19:44 ` Eric Biggers 2026-01-16 20:55 ` Holger Dengler 2026-01-16 22:30 ` David Laight 2026-01-17 23:59 ` Eric Biggers 2026-01-16 0:25 ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox