All of lore.kernel.org
 help / color / mirror / Atom feed
From: Holger Dengler <dengler@linux.ibm.com>
To: David Laight <david.laight.linux@gmail.com>,
	Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Harald Freudenberger <freude@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
Date: Fri, 16 Jan 2026 18:31:58 +0100	[thread overview]
Message-ID: <389595e9-e13a-42e3-b0ff-9ca0dd3effe3@linux.ibm.com> (raw)
In-Reply-To: <20260115220558.25390c0e@pumpkin>

Hi David,

On 15/01/2026 23:05, David Laight wrote:
> On Thu, 15 Jan 2026 12:43:32 -0800
> Eric Biggers <ebiggers@kernel.org> wrote:
>>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
>>> +{
>>> +	const size_t num_iters = 10000000;  
>>
>> 10000000 iterations is too many.  That's 160 MB of data in each
>> direction per AES key length.  Some CPUs without AES instructions can do
>> only ~20 MB AES per second.  In that case, this benchmark would take 16
>> seconds to run per AES key length, for 48 seconds total.
> 
> Probably best to first do a test that would take a 'reasonable' time
> on a cpu without AES. If that is 'very fast' then do a longer test
> to get more accuracy on a faster implementation.
> 
>>
>> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
>> iterations.  That would be 69444 in this case (considering len=16),
>> which is less than 1% of the iterations you've used.  Choosing a number
>> similar to that would seem more appropriate.
>>
>> Ultimately these are just made-up numbers.  But I think we should aim
>> for the benchmark test in each KUnit test suite to take less than a
>> second or so.  The existing tests roughly achieve that, whereas it seems
>> this one can go over it by quite a bit due to the 10000000 iterations.
> 
> Even 1 second is a long time, you end up getting multiple interrupts included.
> I think a lot of these benchmarks are far too long.
> Timing differences less that 1% can be created by scheduling noise.
> Running a test that takes 200 'quanta' of the timer used has an
> error margin of under 1% (100 quanta might be enough).
> While the kernel timestamps have a resolution of 1ns the accuracy is worse.
> If you run a test for even just 10us you ought to get reasonable accuracy
> with a reasonable hope of not getting an interrupt.
> Run the test 10 times and report the fastest value.
> 
> You'll then find the results are entirely unstable because the cpu clock
> frequency keeps changing.
> And long enough buffers can get limited by the d-cache loads.
> 
> For something as slow as AES you can count the number of cpu cycles for
> a single call and get a reasonably consistent figure.
> That will tell you whether the loop is running at the speed you might
> expect it to run at.
> (You need to use data dependencies between the start/end 'times' and
> start/end of the code being timed, x86 lfence/mfence are too slow and
> can hide the 'setup' cost of some instructions.)

Thanks a lot for your feedback. I tried a few of your ideas and it turns out,
that they work quite well. First of all, with a single-block aes
encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of
our CPU clock.

Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s).
These numbers should be taken with some care, as on s390 the operating system
always runs virtualized. In my test environment, I also only have access to a
machine with shared CPUs, so there might be some negative impact from other
workload.

The benchmark loops for 100 iterations now without any warm-up. In each
iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
value of these measurements is takes as the value for the bandwidth
calculations. Although it is not necessary in my environment, I'm doing all
iterations with preemption disabled. I think, that this might help on other
platforms to reduce the jitter of the measurement values.

The removal of the warm-up does not have any impact on the numbers.

Just for information: I also tried to measure the cycles with the same
results. The minimal measurement value of a few iterations is much more stable
that the average over a larger number of iterations.

I also did some tests with IRQs disabled (instead of only preemption), but the
numbers stay the same. So I think, it is save enough to stay with disables
preemption.

I also tried you idea, first to do a few measurements and if they are fast
enough, increase the number of iterations. But it turns out, that this it not
really necessary (at least in my env). But I can add this, it it makes sense
on other platforms.

-- 
Mit freundlichen Grüßen / Kind regards
Holger Dengler
--
IBM Systems, Linux on IBM Z Development
dengler@linux.ibm.com


  reply	other threads:[~2026-01-16 17:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler
2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
2026-01-15 20:43   ` Eric Biggers
2026-01-15 21:51     ` Holger Dengler
2026-01-15 21:58       ` Eric Biggers
2026-01-15 22:05     ` David Laight
2026-01-16 17:31       ` Holger Dengler [this message]
2026-01-16 18:37         ` David Laight
2026-01-16 19:20           ` Holger Dengler
2026-01-16 19:44             ` Eric Biggers
2026-01-16 20:55               ` Holger Dengler
2026-01-16 22:30                 ` David Laight
2026-01-17 23:59                   ` Eric Biggers
2026-01-16  0:25   ` kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2026-01-16 13:20 kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=389595e9-e13a-42e3-b0ff-9ca0dd3effe3@linux.ibm.com \
    --to=dengler@linux.ibm.com \
    --cc=Jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=david.laight.linux@gmail.com \
    --cc=ebiggers@kernel.org \
    --cc=freude@linux.ibm.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.