From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: Alexander Monakov <amonakov@ispras.ru>,
	qemu-devel@nongnu.org, mmromanov@ispras.ru
Subject: Re: [PATCH v5 10/10] tests/bench: Add bufferiszero-bench
Date: Mon, 19 Feb 2024 10:02:32 +0000
Message-ID: <ZdMnONFiJDtP-X42@redhat.com>
In-Reply-To: <ba0548c4-d47c-4bf0-8f27-1f753b41b603@linaro.org>

On Sat, Feb 17, 2024 at 09:21:50AM -1000, Richard Henderson wrote:
> On 2/16/24 23:49, Alexander Monakov wrote:
> > 
> > On Fri, 16 Feb 2024, Richard Henderson wrote:
> > 
> > > Benchmark each acceleration function vs an aligned buffer of zeros.
> > > 
> > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > > ---
> > > +
> > > +static void test(const void *opaque)
> > > +{
> > > +    size_t len = 64 * KiB;
> > 
> > This exceeds L1 cache capacity, so the performance ceiling of L2 cache
> > throughput is easier to hit with a suboptimal implementation. It also
> > seems to vastly exceed typical buffer sizes in QEMU.
> > 
> > When preparing the patch we mostly tested at 8 KiB. The size decides
> > whether the branch exiting the loop becomes perfectly predictable in
> > the microbenchmark, e.g. at 128 bytes per iteration it exits on the
> > 63rd iteration, which Intel predictors cannot track, so we get
> > one mispredict per call.
> > 
> > (so perhaps smaller sizes like 2 or 4 KiB are better)
> 
> Fair.  I've adjusted to loop over 1, 4, 16, 64 KiB.
> 
> # Start of bufferiszero tests
> # buffer_is_zero #0: 1KB 49227.29 MB/sec
> # buffer_is_zero #0: 4KB 137461.28 MB/sec
> # buffer_is_zero #0: 16KB 224220.41 MB/sec
> # buffer_is_zero #0: 64KB 142461.00 MB/sec
> # buffer_is_zero #1: 1KB 45423.59 MB/sec
> # buffer_is_zero #1: 4KB 91409.69 MB/sec
> # buffer_is_zero #1: 16KB 123819.94 MB/sec
> # buffer_is_zero #1: 64KB 71173.75 MB/sec
> # buffer_is_zero #2: 1KB 35465.03 MB/sec
> # buffer_is_zero #2: 4KB 56110.46 MB/sec
> # buffer_is_zero #2: 16KB 68852.28 MB/sec
> # buffer_is_zero #2: 64KB 39043.80 MB/sec

Totally nit-picking, but it would be easier to read with a little
alignment and blank lines:

 # buffer_is_zero #0:  1KB  49227.29 MB/sec
 # buffer_is_zero #0:  4KB 137461.28 MB/sec
 # buffer_is_zero #0: 16KB 224220.41 MB/sec
 # buffer_is_zero #0: 64KB 142461.00 MB/sec
 
 # buffer_is_zero #1:  1KB  45423.59 MB/sec
 # buffer_is_zero #1:  4KB  91409.69 MB/sec
 # buffer_is_zero #1: 16KB 123819.94 MB/sec
 # buffer_is_zero #1: 64KB  71173.75 MB/sec
 
 # buffer_is_zero #2:  1KB  35465.03 MB/sec
 # buffer_is_zero #2:  4KB  56110.46 MB/sec
 # buffer_is_zero #2: 16KB  68852.28 MB/sec
 # buffer_is_zero #2: 64KB  39043.80 MB/sec
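
For illustration only, here is a minimal sketch of how fixed-width
printf fields could produce that layout. The helper names
format_bench_row and print_bench_group are hypothetical and not taken
from the patch, and the field widths are an assumption sized to the
numbers quoted above:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): format one result row with
 * fixed-width fields so that sizes and throughputs line up in columns.
 * Returns the number of characters snprintf() would have written.
 */
int format_bench_row(char *out, size_t out_len, unsigned accel,
                     size_t kib, double mb_per_sec)
{
    /* %2zu right-aligns sizes up to 99; %9.2f right-aligns the rate. */
    return snprintf(out, out_len, "buffer_is_zero #%u: %2zuKB %9.2f MB/sec",
                    accel, kib, mb_per_sec);
}

/*
 * Hypothetical helper: print every row for one acceleration function,
 * then a blank line to separate it from the next group.
 */
void print_bench_group(FILE *f, unsigned accel, const size_t *kib,
                       const double *mb_per_sec, size_t n)
{
    char row[64];

    for (size_t i = 0; i < n; i++) {
        format_bench_row(row, sizeof(row), accel, kib[i], mb_per_sec[i]);
        fprintf(f, "# %s\n", row);
    }
    fputc('\n', f);
}
```

Calling print_bench_group() once per acceleration function, with the
sizes {1, 4, 16, 64} and the measured rates, would give one aligned
block per function. The widths would need widening if a size reached
three digits or a rate reached seven integer digits.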

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



Thread overview: 18+ messages
2024-02-17  0:39 [PATCH v5 00/10] Optimize buffer_is_zero Richard Henderson
2024-02-17  0:39 ` [PATCH v5 01/10] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
2024-02-17  0:39 ` [PATCH v5 02/10] util/bufferiszero: Remove AVX512 variant Richard Henderson
2024-02-17  0:39 ` [PATCH v5 03/10] util/bufferiszero: Reorganize for early test for acceleration Richard Henderson
2024-02-17  0:39 ` [PATCH v5 04/10] util/bufferiszero: Remove useless prefetches Richard Henderson
2024-02-17  0:39 ` [PATCH v5 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Richard Henderson
2024-02-17  0:39 ` [PATCH v5 06/10] util/bufferiszero: Improve scalar variant Richard Henderson
2024-02-17 12:13   ` Alexander Monakov
2024-02-17 19:18     ` Richard Henderson
2024-02-17  0:39 ` [PATCH v5 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Richard Henderson
2024-02-17  0:39 ` [PATCH v5 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Richard Henderson
2024-02-17  0:39 ` [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64 Richard Henderson
2024-02-17 11:33   ` Alexander Monakov
2024-02-17 19:19     ` Richard Henderson
2024-02-17  0:39 ` [PATCH v5 10/10] tests/bench: Add bufferiszero-bench Richard Henderson
2024-02-17  9:49   ` Alexander Monakov
2024-02-17 19:21     ` Richard Henderson
2024-02-19 10:02       ` Daniel P. Berrangé [this message]
