public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: Holger Dengler <dengler@linux.ibm.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Harald Freudenberger <freude@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests for AES
Date: Fri, 16 Jan 2026 18:37:44 +0000	[thread overview]
Message-ID: <20260116183744.04781509@pumpkin> (raw)
In-Reply-To: <389595e9-e13a-42e3-b0ff-9ca0dd3effe3@linux.ibm.com>

On Fri, 16 Jan 2026 18:31:58 +0100
Holger Dengler <dengler@linux.ibm.com> wrote:

> Hi David,
> 
> On 15/01/2026 23:05, David Laight wrote:
> > On Thu, 15 Jan 2026 12:43:32 -0800
> > Eric Biggers <ebiggers@kernel.org> wrote:  
> >>> +static void benchmark_aes(struct kunit *test, const struct aes_testvector *tv)
> >>> +{
> >>> +	const size_t num_iters = 10000000;    
> >>
> >> 10000000 iterations is too many.  That's 160 MB of data in each
> >> direction per AES key length.  Some CPUs without AES instructions can do
> >> only ~20 MB AES per second.  In that case, this benchmark would take 16
> >> seconds to run per AES key length, for 48 seconds total.  
> > 
> > Probably best to first do a test that would take a 'reasonable' time
> > on a cpu without AES. If that is 'very fast' then do a longer test
> > to get more accuracy on a faster implementation.
> >   
> >>
> >> hash-test-template.h and crc_kunit.c use 10000000 / (len + 128)
> >> iterations.  That would be 69444 in this case (considering len=16),
> >> which is less than 1% of the iterations you've used.  Choosing a number
> >> similar to that would seem more appropriate.
> >>
> >> Ultimately these are just made-up numbers.  But I think we should aim
> >> for the benchmark test in each KUnit test suite to take less than a
> >> second or so.  The existing tests roughly achieve that, whereas it seems
> >> this one can go over it by quite a bit due to the 10000000 iterations.  
> > 
> > Even 1 second is a long time, you end up getting multiple interrupts included.
> > I think a lot of these benchmarks are far too long.
> > Timing differences less that 1% can be created by scheduling noise.
> > Running a test that takes 200 'quanta' of the timer used has an
> > error margin of under 1% (100 quanta might be enough).
> > While the kernel timestamps have a resolution of 1ns the accuracy is worse.
> > If you run a test for even just 10us you ought to get reasonable accuracy
> > with a reasonable hope of not getting an interrupt.
> > Run the test 10 times and report the fastest value.
> > 
> > You'll then find the results are entirely unstable because the cpu clock
> > frequency keeps changing.
> > And long enough buffers can get limited by the d-cache loads.
> > 
> > For something as slow as AES you can count the number of cpu cycles for
> > a single call and get a reasonably consistent figure.
> > That will tell you whether the loop is running at the speed you might
> > expect it to run at.
> > (You need to use data dependencies between the start/end 'times' and
> > start/end of the code being timed, x86 lfence/mfence are too slow and
> > can hide the 'setup' cost of some instructions.)  
> 
> Thanks a lot for your feedback. I tried a few of your ideas and it turns out,
> that they work quite well. First of all, with a single-block aes
> encrypt/decrypt in our hardware (CPACF), we're very close to the resolution of
> our CPU clock.
> 
> Disclaimer: The encryption/decryption of one block takes ~32ns (~500MB/s).
> These numbers should be taken with some care, as on s390 the operating system
> always runs virtualized. In my test environment, I also only have access to a
> machine with shared CPUs, so there might be some negative impact from other
> workload.

The impact of other workloads is much less likely for a short test,
and if it does happen you are likely to see a value that is abnormally large.

> The benchmark loops for 100 iterations now without any warm-up. In each
> iteration, I measure a single aes_encrypt()/aes_decrypt() call. The lowest
> value of these measurements is takes as the value for the bandwidth
> calculations. Although it is not necessary in my environment, I'm doing all
> iterations with preemption disabled. I think, that this might help on other
> platforms to reduce the jitter of the measurement values.
> 
> The removal of the warm-up does not have any impact on the numbers.

I'm not sure what the 'warm-up' was for.
The first test will be slow(er) due to I-cache misses.
(That will be more noticeable for big software loops - like blake2.)
Change to test parameters can affect branch prediction but that also only
usually affects the first test with each set of parameters.
(Unlikely to affect AES, but I could see that effect when testing
mul_u64_u64_div_u64().)
The only other reason for a 'warm-up' is to get the cpu frequency fast
and fixed - and there ought to be a better way of doing that.

> 
> Just for information: I also tried to measure the cycles with the same
> results. The minimal measurement value of a few iterations is much more stable
> that the average over a larger number of iterations.

My userspace test code runs each test 10 times and prints all 10 values.
I then look at them to see how consistent they are.

> I also did some tests with IRQs disabled (instead of only preemption), but the
> numbers stay the same. So I think, it is save enough to stay with disables
> preemption.

I'd actually go for disabling interrupts.
What you are seeing is the effect of interrupts not happening
(which is likely for a short test, but not for a long one).

> 
> I also tried you idea, first to do a few measurements and if they are fast
> enough, increase the number of iterations. But it turns out, that this it not
> really necessary (at least in my env). But I can add this, it it makes sense
> on other platforms.

The main reason for doing that is reducing the time the tests take on a
system that is massively slower (and doing software AES).
Maybe someone want to run the test cases on an m68k :-)

	David



  reply	other threads:[~2026-01-16 18:37 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 18:38 [PATCH v1 0/1] lib/crypto: tests: KUnit test-suite for AES Holger Dengler
2026-01-15 18:38 ` [PATCH v1 1/1] lib/crypto: tests: Add KUnit tests " Holger Dengler
2026-01-15 20:43   ` Eric Biggers
2026-01-15 21:51     ` Holger Dengler
2026-01-15 21:58       ` Eric Biggers
2026-01-15 22:05     ` David Laight
2026-01-16 17:31       ` Holger Dengler
2026-01-16 18:37         ` David Laight [this message]
2026-01-16 19:20           ` Holger Dengler
2026-01-16 19:44             ` Eric Biggers
2026-01-16 20:55               ` Holger Dengler
2026-01-16 22:30                 ` David Laight
2026-01-17 23:59                   ` Eric Biggers
2026-01-16  0:25   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260116183744.04781509@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=Jason@zx2c4.com \
    --cc=ardb@kernel.org \
    --cc=dengler@linux.ibm.com \
    --cc=ebiggers@kernel.org \
    --cc=freude@linux.ibm.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox