From: "Alex Bennée" <alex.bennee@linaro.org>
To: alvise rigo <a.rigo@virtualopensystems.com>
Cc: mttcg@listserver.greensocs.com,
"Peter Maydell" <peter.maydell@linaro.org>,
"Andrew Jones" <drjones@redhat.com>,
"Claudio Fontana" <claudio.fontana@huawei.com>,
kvm@vger.kernel.org,
"Alexander Spyridakis" <a.spyridakis@virtualopensystems.com>,
"Mark Burton" <mark.burton@greensocs.com>,
"QEMU Developers" <qemu-devel@nongnu.org>,
"KONRAD Frédéric" <fred.konrad@greensocs.com>
Subject: Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
Date: Mon, 03 Aug 2015 17:06:26 +0100 [thread overview]
Message-ID: <87mvy8l5kt.fsf@linaro.org> (raw)
In-Reply-To: <CAH47eN3jhWtaeYTMJOJVSgBwzPNpFfU+vgHXCCOsaroDjdkKog@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]
alvise rigo <a.rigo@virtualopensystems.com> writes:
> On Mon, Aug 3, 2015 at 12:30 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> alvise rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> Hi Alex,
>>>
>>> Nice set of tests, they are proving to be helpful.
>>> One question below.
>>>
<snip>
>>>
>>> Why are we calling these last two instructions with the 'eq' suffix?
>>> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?
>>
>> Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
>> test now so I'll try and clean that up.
Please find the updated test attached. I've also included some new test
modes. In theory the barrier test by itself should still fail but it
passes on real ARMv7 as well as TCG. I'm trying to run the test on a
heavier core-ed ARMv7 to check. I suspect we get away with it on
ARMv7-on-x86_64 due to the strong ordering of the x86.
The "excl" and "acqrel" tests now run without issue (although again
plain acqrel semantics shouldn't stop a race corrupting shared_value).
I'll tweak the v8 versions of the test tomorrow.
--
Alex Bennée
[-- Attachment #2: updated version of the test --]
[-- Type: text/x-diff, Size: 8958 bytes --]
>From 0953549985134268bf9079a7a01b2631d8a4fdee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex@bennee.com>
Date: Thu, 30 Jul 2015 15:13:33 +0000
Subject: [kvm-unit-tests PATCH] arm/barrier-test: add memory barrier tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This test has been written mainly to stress multi-threaded TCG behaviour
but will demonstrate failure by default on real hardware. The test takes
the following parameters:
- "lock" use GCC's locking semantics
- "atomic" use GCC's __atomic primitives
- "barrier" use plain dmb() barriers
- "wfelock" use WaitForEvent sleep
- "excl" use load/store exclusive semantics
- "acqrel" use acquire/release semantics
Also two more options allow the test to be tweaked
- "noshuffle" disables the memory shuffling
- "count=%ld" set your own per-CPU increment count
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
v2
- Don't use thumb style strexeq stuff
- Add atomic, barrier and wfelock tests
- Add count/noshuffle test controls
---
arm/barrier-test.c | 284 +++++++++++++++++++++++++++++++++++++++++++
config/config-arm-common.mak | 2 +
2 files changed, 286 insertions(+)
create mode 100644 arm/barrier-test.c
diff --git a/arm/barrier-test.c b/arm/barrier-test.c
new file mode 100644
index 0000000..765f8f6
--- /dev/null
+++ b/arm/barrier-test.c
@@ -0,0 +1,284 @@
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#include <prng.h>
+
+#define MAX_CPUS 8
+
+/* How many increments to do */
+static int increment_count = 10000000;
+static int do_shuffle = 1;
+
+
+/* shared value all the tests attempt to safely increment */
+static unsigned int shared_value;
+
+/* PAGE_SIZE * uint32_t means we span several pages */
+static uint32_t memory_array[PAGE_SIZE];
+
+/* We use the alignment of the following to ensure accesses to locking
+ * and synchronisation primatives don't interfere with the page of the
+ * shared value
+ */
+__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
+__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
+__attribute__((aligned(PAGE_SIZE))) static int global_lock;
+
+struct isaac_ctx prng_context[MAX_CPUS];
+
+void (*inc_fn)(void);
+
+
+/* In any SMP setting this *should* fail due to cores stepping on
+ * each other updating the shared variable
+ */
+static void increment_shared(void)
+{
+ shared_value++;
+}
+
+/* GCC __sync primitives are deprecated in favour of __atomic */
+static void increment_shared_with_lock(void)
+{
+ while (__sync_lock_test_and_set(&global_lock, 1));
+ shared_value++;
+ __sync_lock_release(&global_lock);
+}
+
+/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stex exclusive
+ * semantics */
+static void increment_shared_with_atomic(void)
+{
+ __atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
+}
+
+
+/* By themselves barriers do not gaurentee atomicity */
+static void increment_shared_with_barrier(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+#else
+ asm volatile(
+ " ldr r0, [%[sptr]]\n"
+ " dmb\n"
+ " add r0, r0, #0x1\n"
+ " str r1, r0, [%[sptr]]\n"
+ " dmb\n"
+ : /* out */
+ : [sptr] "r" (&shared_value) /* in */
+ : "r0", "r1", "cc");
+#endif
+}
+
+/*
+ * Load/store exclusive with WFE (wait-for-event)
+ */
+
+static void increment_shared_with_wfelock(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+#else
+ asm volatile(
+ " mov r1, #1\n"
+ "1: ldrex r0, [%[lock]]\n"
+ " cmp r0, #0\n"
+ " wfene\n"
+ " strexeq r0, r1, [%[lock]]\n"
+ " cmpeq r0, #0\n"
+ " bne 1b\n"
+ " dmb\n"
+ /* lock held */
+ " ldr r0, [%[sptr]]\n"
+ " add r0, r0, #0x1\n"
+ " str r0, [%[sptr]]\n"
+ /* now release */
+ " mov r0, #0\n"
+ " dmb\n"
+ " str r0, [%[lock]]\n"
+ " dsb\n"
+ " sev\n"
+ : /* out */
+ : [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
+ : "r0", "r1", "cc");
+#endif
+}
+
+
+/*
+ * Hand-written version of the load/store exclusive
+ */
+static void increment_shared_with_excl(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+ asm volatile(
+ "1: ldxr w0, [%[sptr]]\n"
+ " add w0, w0, #0x1\n"
+ " stxr w1, w0, [%[sptr]]\n"
+ " cbnz w1, 1b\n"
+ : /* out */
+ : [sptr] "r" (&shared_value) /* in */
+ : "w0", "w1", "cc");
+#else
+ asm volatile(
+ "1: ldrex r0, [%[sptr]]\n"
+ " add r0, r0, #0x1\n"
+ " strex r1, r0, [%[sptr]]\n"
+ " cmp r1, #0\n"
+ " bne 1b\n"
+ : /* out */
+ : [sptr] "r" (&shared_value) /* in */
+ : "r0", "r1", "cc");
+#endif
+}
+
+static void increment_shared_with_acqrel(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+ asm volatile(
+ " ldar w0, [%[sptr]]\n"
+ " add w0, w0, #0x1\n"
+ " str w0, [%[sptr]]\n"
+ : /* out */
+ : [sptr] "r" (&shared_value) /* in */
+ : "w0");
+#else
+ /* ARMv7 has no acquire/release semantics but we
+ * can ensure the results of the write are propagated
+ * with the use of barriers.
+ */
+ asm volatile(
+ "1: ldrex r0, [%[sptr]]\n"
+ " add r0, r0, #0x1\n"
+ " strex r1, r0, [%[sptr]]\n"
+ " cmp r1, #0\n"
+ " bne 1b\n"
+ " dmb\n"
+ : /* out */
+ : [sptr] "r" (&shared_value) /* in */
+ : "r0", "r1", "cc");
+#endif
+
+}
+
+/* The idea of this is just to generate some random load/store
+ * activity which may or may not race with an un-barried incremented
+ * of the shared counter
+ */
+static void shuffle_memory(int cpu)
+{
+ int i;
+ uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
+ uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
+ int count = seq & 0x1f;
+ uint32_t val=0;
+
+ seq >>= 5;
+
+ for (i=0; i<count; i++) {
+ int index = seq & ~PAGE_MASK;
+ if (lspat & 1) {
+ val ^= memory_array[index];
+ } else {
+ memory_array[index] = val;
+ }
+ seq >>= PAGE_SHIFT;
+ seq ^= lspat;
+ lspat >>= 1;
+ }
+
+}
+
+static void do_increment(void)
+{
+ int i;
+ int cpu = smp_processor_id();
+
+ printf("CPU%d online\n", cpu);
+
+ for (i=0; i < increment_count; i++) {
+ per_cpu_value[cpu]++;
+ inc_fn();
+
+ if (do_shuffle)
+ shuffle_memory(cpu);
+ }
+
+ printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
+
+ cpumask_set_cpu(cpu, &smp_test_complete);
+ if (cpu != 0)
+ halt();
+}
+
+int main(int argc, char **argv)
+{
+ int cpu;
+ unsigned int i, sum = 0;
+ static const unsigned char seed[] = "myseed";
+
+ inc_fn = &increment_shared;
+
+ isaac_init(&prng_context[0], &seed[0], sizeof(seed));
+
+ for (i=0; i<argc; i++) {
+ char *arg = argv[i];
+
+ if (strcmp(arg, "lock") == 0) {
+ inc_fn = &increment_shared_with_lock;
+ report_prefix_push("lock");
+ } else if (strcmp(arg, "atomic") == 0) {
+ inc_fn = &increment_shared_with_atomic;
+ report_prefix_push("atomic");
+ } else if (strcmp(arg, "barrier") == 0) {
+ inc_fn = &increment_shared_with_atomic;
+ report_prefix_push("barrier");
+ } else if (strcmp(arg, "wfelock") == 0) {
+ inc_fn = &increment_shared_with_wfelock;
+ report_prefix_push("wfelock");
+ } else if (strcmp(arg, "excl") == 0) {
+ inc_fn = &increment_shared_with_excl;
+ report_prefix_push("excl");
+ } else if (strcmp(arg, "acqrel") == 0) {
+ inc_fn = &increment_shared_with_acqrel;
+ report_prefix_push("acqrel");
+ } else if (strcmp(arg, "noshuffle") == 0) {
+ do_shuffle = 0;
+ report_prefix_push("noshuffle");
+ } else if (strstr(arg, "count=") != NULL) {
+ char *p = strstr(arg, "=");
+ increment_count = atol(p+1);
+ } else {
+ isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
+ }
+ }
+
+ /* fill our random page */
+ for (i=0; i<PAGE_SIZE; i++) {
+ memory_array[i] = isaac_next_uint32(&prng_context[0]);
+ }
+
+ for_each_present_cpu(cpu) {
+ uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
+ if (cpu == 0)
+ continue;
+
+ isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
+ smp_boot_secondary(cpu, do_increment);
+ }
+
+ do_increment();
+
+ while (!cpumask_full(&smp_test_complete))
+ cpu_relax();
+
+ /* All CPUs done, do we add up */
+ for_each_present_cpu(cpu) {
+ sum += per_cpu_value[cpu];
+ }
+ report("total incs %d", sum == shared_value, shared_value);
+
+ return report_summary();
+}
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index de2c4a3..4c5abb6 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -12,6 +12,7 @@ endif
tests-common = $(TEST_DIR)/selftest.flat
tests-common += $(TEST_DIR)/spinlock-test.flat
tests-common += $(TEST_DIR)/tlbflush-test.flat
+tests-common += $(TEST_DIR)/barrier-test.flat
ifneq ($(TEST),)
tests = $(TEST_DIR)/$(TEST).flat
@@ -78,3 +79,4 @@ $(TEST_DIR)/$(TEST).elf: $(cstart.o) $(TEST_DIR)/$(TEST).o
$(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
$(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
$(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
+$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
--
2.5.0
next prev parent reply other threads:[~2015-08-03 16:06 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 02/11] README: add some CONTRIBUTING notes Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 03/11] configure: emit HOST=$host to config.mak Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it Alex Bennée
2015-08-02 16:36 ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field Alex Bennée
2015-07-31 18:25 ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function Alex Bennée
2015-07-31 18:35 ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test Alex Bennée
2015-07-31 18:51 ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests Alex Bennée
2015-07-31 18:53 ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus Alex Bennée
2015-07-31 19:17 ` Andrew Jones
2015-08-02 16:40 ` Andrew Jones
2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN Alex Bennée
2015-07-31 19:22 ` Andrew Jones
2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
2015-07-31 19:30 ` Andrew Jones
2015-08-03 10:02 ` alvise rigo
2015-08-03 10:30 ` Alex Bennée
2015-08-03 10:34 ` alvise rigo
2015-08-03 16:06 ` Alex Bennée [this message]
2015-08-03 16:46 ` alvise rigo
2015-08-04 7:30 ` Alex Bennée
2015-08-02 16:44 ` [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Andrew Jones
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mvy8l5kt.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=a.rigo@virtualopensystems.com \
--cc=a.spyridakis@virtualopensystems.com \
--cc=claudio.fontana@huawei.com \
--cc=drjones@redhat.com \
--cc=fred.konrad@greensocs.com \
--cc=kvm@vger.kernel.org \
--cc=mark.burton@greensocs.com \
--cc=mttcg@listserver.greensocs.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).