Re: [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, "Emilio G. Cota" <cota@braap.org>
Subject: Re: [Qemu-devel] [PATCH v3 25/34] tests: add atomic_add-bench
Date: Wed, 14 Sep 2016 14:53:14 +0100
Message-ID: <87vaxyio51.fsf@linaro.org>
In-Reply-To: <1472935202-3342-26-git-send-email-rth@twiddle.net>


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> With this microbenchmark we can measure the overhead of emulating atomic
> instructions with a configurable degree of contention.
>
> The benchmark spawns $n threads, each performing $o atomic ops (additions)
> in a loop. Each atomic operation is performed on a different cache line
> (assuming lines are 64b long) that is randomly selected from a range [0, $r).
>
> [ Note: each $foo corresponds to a -foo flag ]
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Message-Id: <1467054136-10430-20-git-send-email-cota@braap.org>
> ---
>  tests/.gitignore         |   1 +
>  tests/Makefile.include   |   4 +-
>  tests/atomic_add-bench.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 184 insertions(+), 1 deletion(-)
>  create mode 100644 tests/atomic_add-bench.c
>
> diff --git a/tests/.gitignore b/tests/.gitignore
> index dbb5263..ec3137a 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -1,3 +1,4 @@
> +atomic_add-bench
>  check-qdict
>  check-qfloat
>  check-qint
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 14be491..e1957ed 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -421,7 +421,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o tests/check-qdict.o \
>  	tests/test-opts-visitor.o tests/test-qmp-event.o \
>  	tests/rcutorture.o tests/test-rcu-list.o \
>  	tests/test-qdist.o \
> -	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o
> +	tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
> +	tests/atomic_add-bench.o
>
>  $(test-obj-y): QEMU_INCLUDES += -Itests
>  QEMU_CFLAGS += -I$(SRC_PATH)/tests
> @@ -465,6 +466,7 @@ tests/test-qdist$(EXESUF): tests/test-qdist.o $(test-util-obj-y)
>  tests/test-qht$(EXESUF): tests/test-qht.o $(test-util-obj-y)
>  tests/test-qht-par$(EXESUF): tests/test-qht-par.o tests/qht-bench$(EXESUF) $(test-util-obj-y)
>  tests/qht-bench$(EXESUF): tests/qht-bench.o $(test-util-obj-y)
> +tests/atomic_add-bench$(EXESUF): tests/atomic_add-bench.o $(test-util-obj-y)

This probably belongs in tests/tcg/generic or some such, but that needs
the tcg tests to be rehabilitated into the build system first, so at
least here it gets built.

>
>  tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
>  	hw/core/qdev.o hw/core/qdev-properties.o hw/core/hotplug.o\
> diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
> new file mode 100644
> index 0000000..5bbecf6
> --- /dev/null
> +++ b/tests/atomic_add-bench.c

I wonder if it would be worth renaming this to atomic-bench and adding
the other atomic operations to the benchmark? Given the current helper
overhead it's unlikely to show much difference between the ops, but if
we move to backend support for the tcg atomics it would be a useful tool
to have.
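
To make the suggestion concrete, here is a rough sketch of how a generic
atomic-bench might select the operation at run time. It uses C11
<stdatomic.h> as a stand-in for QEMU's own atomic helpers, and the enum
and bench_apply() names are hypothetical, not anything in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical op selector; these names are illustrative only. */
enum bench_op { OP_ADD, OP_AND, OP_OR, OP_XOR, OP_XCHG };

/*
 * Apply one atomic read-modify-write to *p and return the value that
 * was stored there before the operation.
 */
static unsigned long bench_apply(enum bench_op op, atomic_ulong *p,
                                 unsigned long operand)
{
    switch (op) {
    case OP_ADD:
        return atomic_fetch_add(p, operand);
    case OP_AND:
        return atomic_fetch_and(p, operand);
    case OP_OR:
        return atomic_fetch_or(p, operand);
    case OP_XOR:
        return atomic_fetch_xor(p, operand);
    case OP_XCHG:
        return atomic_exchange(p, operand);
    }
    assert(!"unknown op"); /* unreachable for valid enum values */
    return 0;
}
```

The per-thread loop would then call bench_apply() with an op chosen by a
new command line flag instead of the hard-coded atomic_inc().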

> @@ -0,0 +1,180 @@
> +#include "qemu/osdep.h"
> +#include "qemu/thread.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/processor.h"
> +
> +struct thread_info {
> +    uint64_t r;
> +} QEMU_ALIGNED(64);
> +
> +struct count {
> +    unsigned long val;
> +} QEMU_ALIGNED(64);
> +
> +static QemuThread *threads;
> +static struct thread_info *th_info;
> +static unsigned int n_threads = 1;
> +static unsigned int n_ready_threads;
> +static struct count *counts;
> +static unsigned long n_ops = 10000;
> +static double duration;
> +static unsigned int range = 1;
> +static bool test_start;
> +
> +static const char commands_string[] =
> +    " -n = number of threads\n"
> +    " -o = number of ops per thread\n"
> +    " -r = range (will be rounded up to pow2)";
> +
> +static void usage_complete(char *argv[])
> +{
> +    fprintf(stderr, "Usage: %s [options]\n", argv[0]);
> +    fprintf(stderr, "options:\n%s\n", commands_string);
> +}
> +
> +/*
> + * From: https://en.wikipedia.org/wiki/Xorshift
> + * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
> + * guaranteed to be >= INT_MAX).
> + */
> +static uint64_t xorshift64star(uint64_t x)
> +{
> +    x ^= x >> 12; /* a */
> +    x ^= x << 25; /* b */
> +    x ^= x >> 27; /* c */
> +    return x * UINT64_C(2685821657736338717);
> +}
> +
> +static void *thread_func(void *arg)
> +{
> +    struct thread_info *info = arg;
> +    unsigned long i;
> +
> +    atomic_inc(&n_ready_threads);
> +    while (!atomic_mb_read(&test_start)) {
> +        cpu_relax();
> +    }
> +
> +    for (i = 0; i < n_ops; i++) {
> +        unsigned int index;
> +
> +        info->r = xorshift64star(info->r);
> +        index = info->r & (range - 1);
> +        atomic_inc(&counts[index].val);
> +    }
> +    return NULL;
> +}
> +
> +static inline
> +uint64_t ts_subtract(const struct timespec *a, const struct timespec *b)
> +{
> +    uint64_t ns;
> +
> +    ns = (b->tv_sec - a->tv_sec) * 1000000000ULL;
> +    ns += (b->tv_nsec - a->tv_nsec);
> +    return ns;
> +}
> +
> +static void run_test(void)
> +{
> +    unsigned int i;
> +    struct timespec ts_start, ts_end;
> +
> +    while (atomic_read(&n_ready_threads) != n_threads) {
> +        cpu_relax();
> +    }
> +    atomic_mb_set(&test_start, true);
> +
> +    clock_gettime(CLOCK_MONOTONIC, &ts_start);
> +    for (i = 0; i < n_threads; i++) {
> +        qemu_thread_join(&threads[i]);
> +    }
> +    clock_gettime(CLOCK_MONOTONIC, &ts_end);
> +    duration = ts_subtract(&ts_start, &ts_end) / 1e9;
> +}
> +
> +static void create_threads(void)
> +{
> +    unsigned int i;
> +
> +    threads = g_new(QemuThread, n_threads);
> +    th_info = g_new(struct thread_info, n_threads);
> +    counts = qemu_memalign(64, sizeof(*counts) * range);

This fails on my setup as AFAICT qemu_memalign doesn't give you zeroed
memory. I added a memset after to zero it out.
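
For reference, the same allocate-then-zero pattern outside of QEMU looks
something like the sketch below. It assumes C11 aligned_alloc() in place
of qemu_memalign(), and alloc_counts() and CACHE_LINE are made-up names
for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64 /* assumed line size, as in the patch */

/* One counter per cache line; the alignment pads the struct to 64 bytes. */
struct count {
    _Alignas(CACHE_LINE) unsigned long val;
};

static_assert(sizeof(struct count) == CACHE_LINE, "one counter per line");

/*
 * Allocate `range` cache-line-aligned counters, all starting at zero.
 * aligned_alloc() does not initialise the block, so the memset() is
 * what guarantees the zeroes. Returns NULL on failure.
 */
static struct count *alloc_counts(size_t range)
{
    struct count *c;

    if (range == 0 || range > SIZE_MAX / sizeof(*c)) {
        return NULL; /* nothing to allocate, or the size would overflow */
    }
    c = aligned_alloc(CACHE_LINE, sizeof(*c) * range);
    if (c != NULL) {
        memset(c, 0, sizeof(*c) * range);
    }
    return c;
}
```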

> +
> +    for (i = 0; i < n_threads; i++) {
> +        struct thread_info *info = &th_info[i];
> +
> +        info->r = (i + 1) ^ time(NULL);
> +        qemu_thread_create(&threads[i], NULL, thread_func, info,
> +                           QEMU_THREAD_JOINABLE);
> +    }
> +}
> +
> +static void pr_params(void)
> +{
> +    printf("Parameters:\n");
> +    printf(" # of threads:      %u\n", n_threads);
> +    printf(" n_ops:             %lu\n", n_ops);
> +    printf(" ops' range:        %u\n", range);
> +}
> +
> +static void pr_stats(void)
> +{
> +    unsigned long long val = 0;
> +    unsigned int i;
> +    double tx;
> +
> +    for (i = 0; i < range; i++) {
> +        val += counts[i].val;
> +    }
> +    assert(val == n_threads * n_ops);

Again, this failed while I was testing because of the above. It would
probably also be worth reporting the failure condition for the test, so
my current hacky patch looks like:

modified   tests/atomic_add-bench.c
@@ -100,6 +100,7 @@ static void create_threads(void)
     threads = g_new(QemuThread, n_threads);
     th_info = g_new(struct thread_info, n_threads);
     counts = qemu_memalign(64, sizeof(*counts) * range);
+    memset(counts, 0, sizeof(*counts) * range);

     for (i = 0; i < n_threads; i++) {
         struct thread_info *info = &th_info[i];
@@ -118,22 +119,29 @@ static void pr_params(void)
     printf(" ops' range:        %u\n", range);
 }

-static void pr_stats(void)
+static int pr_stats(void)
 {
-    unsigned long long val = 0;
+    unsigned long long target_val, val = 0;
     unsigned int i;
     double tx;

     for (i = 0; i < range; i++) {
         val += counts[i].val;
     }
-    assert(val == n_threads * n_ops);
+
+    target_val = (n_threads * n_ops);
+    if (val != target_val) {
+        printf("Bad total: %llu vs %llu\n", val, target_val);
+        return -1;
+    }
     tx = val / duration / 1e6;

     printf("Results:\n");
     printf("Duration:            %.2f s\n", duration);
     printf(" Throughput:         %.2f Mops/s\n", tx);
     printf(" Throughput/thread:  %.2f Mops/s/thread\n", tx / n_threads);
+
+    return 0;
 }

 static void parse_args(int argc, char *argv[])
@@ -175,6 +183,5 @@ int main(int argc, char *argv[])
     pr_params();
     create_threads();
     run_test();
-    pr_stats();
-    return 0;
+    return pr_stats();
 }
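
As a standalone illustration of the check above, here is a minimal
sketch that takes a plain array of counters rather than the benchmark's
struct count; check_total() is a hypothetical name. It also widens the
multiplication before doing it, since n_threads * n_ops is otherwise
computed in unsigned long and could wrap on hosts where that type is 32
bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Sum `range` counters and compare the total with n_threads * n_ops.
 * Returns 0 on a match and -1 on a mismatch so that main() can pass
 * the result on as the exit status.
 */
static int check_total(const unsigned long *vals, unsigned int range,
                       unsigned int n_threads, unsigned long n_ops)
{
    /* Widen first so the product is computed in unsigned long long. */
    unsigned long long target = (unsigned long long)n_threads * n_ops;
    unsigned long long val = 0;
    unsigned int i;

    assert(vals != NULL || range == 0);

    for (i = 0; i < range; i++) {
        val += vals[i];
    }
    if (val != target) {
        fprintf(stderr, "Bad total: %llu vs %llu\n", val, target);
        return -1;
    }
    return 0;
}
```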

--
Alex Bennée
