All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, Ard Biesheuvel <ardb@kernel.org>,
	Douglas Anderson <dianders@chromium.org>,
	David Laight <David.Laight@aculab.com>
Subject: [PATCH v2 2/2] crypto: xor - use ktime for template benchmarking
Date: Sat, 26 Sep 2020 12:26:51 +0200	[thread overview]
Message-ID: <20200926102651.31598-3-ardb@kernel.org> (raw)
In-Reply-To: <20200926102651.31598-1-ardb@kernel.org>

Currently, we use the jiffies counter as a time source, by staring at
it until a HZ period elapses, and then staring at it again and perform
as many XOR operations as we can at the same time until another HZ
period elapses, so that we can calculate the throughput. This takes
longer than necessary, and depends on HZ, which is undesirable, since
HZ is system dependent.

Let's use the ktime interface instead, and use it to time a fixed
number of XOR operations, which can be done much faster, and makes
the time spent depend on the performance level of the system itself,
which is much more reasonable. To ensure that we have the resolution
we need even on systems with 32 kHz time sources, while not spending too
much time in the benchmark on a slow CPU, let's switch to 3 attempts of
800 repetitions each: that way, we will only misidentify algorithms that
perform within 10% of each other as the fastest if they are faster than
10 GB/s to begin with, which is not expected to occur on systems with
such coarse clocks.

On ThunderX2, I get the following results:

Before:

  [72625.956765] xor: measuring software checksum speed
  [72625.993104]    8regs     : 10169.000 MB/sec
  [72626.033099]    32regs    : 12050.000 MB/sec
  [72626.073095]    arm64_neon: 11100.000 MB/sec
  [72626.073097] xor: using function: 32regs (12050.000 MB/sec)

After:

  [72599.650216] xor: measuring software checksum speed
  [72599.651188]    8regs           : 10491 MB/sec
  [72599.652006]    32regs          : 12345 MB/sec
  [72599.652871]    arm64_neon      : 11402 MB/sec
  [72599.652873] xor: using function: 32regs (12345 MB/sec)

Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 crypto/xor.c | 38 +++++++++-----------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/crypto/xor.c b/crypto/xor.c
index b42c38343733..a0badbc03577 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -76,49 +76,43 @@ static int __init register_xor_blocks(void)
 }
 #endif
 
-#define BENCH_SIZE (PAGE_SIZE)
+#define BENCH_SIZE	4096
+#define REPS		800U
 
 static void __init
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	unsigned long now, j;
-	int i, count, max;
+	int i, j, count;
+	ktime_t min, start, diff;
 
 	tmpl->next = template_list;
 	template_list = tmpl;
 
 	preempt_disable();
 
-	/*
-	 * Count the number of XORs done during a whole jiffy, and use
-	 * this to calculate the speed of checksumming.  We use a 2-page
-	 * allocation to have guaranteed color L1-cache layout.
-	 */
-	max = 0;
-	for (i = 0; i < 5; i++) {
-		j = jiffies;
-		count = 0;
-		while ((now = jiffies) == j)
-			cpu_relax();
-		while (time_before(jiffies, now + 1)) {
+	min = (ktime_t)S64_MAX;
+	for (i = 0; i < 3; i++) {
+		start = ktime_get();
+		for (j = 0; j < REPS; j++) {
 			mb(); /* prevent loop optimzation */
 			tmpl->do_2(BENCH_SIZE, b1, b2);
 			mb();
 			count++;
 			mb();
 		}
-		if (count > max)
-			max = count;
+		diff = ktime_sub(ktime_get(), start);
+		if (diff < min)
+			min = diff;
 	}
 
 	preempt_enable();
 
-	speed = max * (HZ * BENCH_SIZE / 1024);
+	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
+	speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
 	tmpl->speed = speed;
 
-	printk(KERN_INFO "   %-10s: %5d.%03d MB/sec\n", tmpl->name,
-	       speed / 1000, speed % 1000);
+	pr_info("   %-16s: %5d MB/sec\n", tmpl->name, speed);
 }
 
 static int __init
@@ -158,8 +152,8 @@ calibrate_xor_blocks(void)
 		if (f->speed > fastest->speed)
 			fastest = f;
 
-	printk(KERN_INFO "xor: using function: %s (%d.%03d MB/sec)\n",
-	       fastest->name, fastest->speed / 1000, fastest->speed % 1000);
+	pr_info("xor: using function: %s (%d MB/sec)\n",
+	       fastest->name, fastest->speed);
 
 #undef xor_speed
 
-- 
2.17.1


  parent reply	other threads:[~2020-09-26 10:27 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-26 10:26 [PATCH v2 0/2] crypto: xor - defer and optimize boot time benchmark Ard Biesheuvel
2020-09-26 10:26 ` [PATCH v2 1/2] crypto: xor - defer load time benchmark to a later time Ard Biesheuvel
2020-09-26 10:26 ` Ard Biesheuvel [this message]
2020-09-28 23:47   ` [PATCH v2 2/2] crypto: xor - use ktime for template benchmarking Doug Anderson
2020-10-02 11:55 ` [PATCH v2 0/2] crypto: xor - defer and optimize boot time benchmark Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200926102651.31598-3-ardb@kernel.org \
    --to=ardb@kernel.org \
    --cc=David.Laight@aculab.com \
    --cc=dianders@chromium.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-crypto@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.