[Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests
@ 2015-07-31 15:53 Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs Alex Bennée
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

Hi,

This is the current state of my MTTCG tests based on the KVM's unit
testing framework. The earlier patches in the series have already been
reviewed and will (with the exception of the emacs patch) be making
their way upstream.

There are a couple of addition to library functions:
  - printf %u suppport
  - flush_tlb_page for arm and arm64
  - a generic prng from CCAN

The two actual tests are:
  - tlbflush-test
  - barrier-test

The latter barrier test hangs the current -v6 MTTCG patch set in both
"excl" and "acqrel" modes and will make a good torture test for
Alvise's atomic patch set. I suspect the load/store ordering issues
will show up better once tested on a weak-ordered backend. I'm open to
suggestions for other tests worth adding to show up the issues.

The github tree can be found at:

https://github.com/stsquad/kvm-unit-tests/tree/current-mttcg-tests


Alex Bennée (11):
  arm/run: set indentation defaults for emacs
  README: add some CONTRIBUTING notes
  configure: emit HOST=$host to config.mak
  arm/run: introduce usingkvm var and use it
  lib/printf: support the %u unsigned fmt field
  lib/arm: add flush_tlb_page mmu function
  new arm/tlbflush-test: TLB torture test
  arm/unittests.cfg: add the tlbflush tests
  arm: query /dev/kvm for maximum vcpus
  new: add isaac prng library from CCAN
  new: arm/barrier-test for memory barriers

 README                       |  26 ++++++
 arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
 arm/run                      |  19 +++-
 arm/tlbflush-test.c          | 194 ++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg            |  26 +++++-
 arm/utils/kvm-query.c        |  41 +++++++++
 config/config-arm-common.mak |  18 +++-
 configure                    |   2 +
 lib/arm/asm/mmu.h            |  11 +++
 lib/arm64/asm/mmu.h          |   8 ++
 lib/printf.c                 |  13 +++
 lib/prng.c                   | 162 ++++++++++++++++++++++++++++++++++
 lib/prng.h                   |  82 +++++++++++++++++
 13 files changed, 801 insertions(+), 7 deletions(-)
 create mode 100644 arm/barrier-test.c
 create mode 100644 arm/tlbflush-test.c
 create mode 100644 arm/utils/kvm-query.c
 create mode 100644 lib/prng.c
 create mode 100644 lib/prng.h

-- 
2.5.0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 02/11] README: add some CONTRIBUTING notes Alex Bennée
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 arm/run | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arm/run b/arm/run
index 662a856..6b42a2e 100755
--- a/arm/run
+++ b/arm/run
@@ -1,4 +1,5 @@
 #!/bin/bash
+# -*- sh-basic-offset:8 indent-tabs-mode: t -*-
 
 if [ ! -f config.mak ]; then
 	echo run ./configure first. See ./configure -h
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 02/11] README: add some CONTRIBUTING notes
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 03/11] configure: emit HOST=$host to config.mak Alex Bennée
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>

---
v2
  - mention consistency
v3
  - add r-b tag
---
 README | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README b/README
index e9869d1..9389a26 100644
--- a/README
+++ b/README
@@ -25,3 +25,29 @@ Directory structure:
 ./<ARCH>:	the sources of the tests and the created objects/images
 
 See <ARCH>/README for architecture specific documentation.
+
+CONTRIBUTING:
+=============
+
+Style
+-----
+
+Currently there is a mix of indentation styles so any changes to
+existing files should be consistent with the existing style. For new
+files:
+
+  - C: please use standard linux-with-tabs
+  - Shell: use TABs for indentation
+
+Patches
+-------
+
+Patches are welcome at the KVM mailing list <kvm@vger.kernel.org>.
+
+Please prefix messages with: [kvm-unit-tests PATCH]
+
+You can add the following to .git/config to do this automatically for you:
+
+[format]
+	subjectprefix = kvm-unit-tests PATCH
+
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 03/11] configure: emit HOST=$host to config.mak
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 02/11] README: add some CONTRIBUTING notes Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it Alex Bennée
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

This is useful information for the run scripts to know, especially if
they want to drop to using TCG.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>

---
v3
  - add r-b tag
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index b2ad32a..078b70c 100755
--- a/configure
+++ b/configure
@@ -7,6 +7,7 @@ ld=ld
 objcopy=objcopy
 ar=ar
 arch=`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'`
+host=$arch
 cross_prefix=
 
 usage() {
@@ -122,6 +123,7 @@ ln -s $asm lib/asm
 cat <<EOF > config.mak
 PREFIX=$prefix
 KERNELDIR=$(readlink -f $kerneldir)
+HOST=$host
 ARCH=$arch
 ARCH_NAME=$arch_name
 PROCESSOR=$processor
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (2 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 03/11] configure: emit HOST=$host to config.mak Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-08-02 16:36   ` Andrew Jones
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field Alex Bennée
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

This makes the script a little cleaner by only checking for KVM support
in one place. If KVM isn't available we can fall back to TCG emulation
and echo the fact to the screen rather than let QEMU complain.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>

---
v2
  - rm redundant M= statement
v3
  - make usingkvm use "yes"
  - merge patches 3/4 into one
v4
  - use single quotes consistently
  - add r-b tag
---
 arm/run | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arm/run b/arm/run
index 6b42a2e..6b3d558 100755
--- a/arm/run
+++ b/arm/run
@@ -8,6 +8,15 @@ fi
 source config.mak
 processor="$PROCESSOR"
 
+# Default to using KVM if available and on the right ARM host
+if [ -c /dev/kvm ]; then
+	if [ "$HOST" = "arm" ] && [ "$ARCH" = "arm" ]; then
+		usingkvm=yes
+	elif [ "$HOST" = "aarch64" ]; then
+		usingkvm=yes
+	fi
+fi
+
 qemu="${QEMU:-qemu-system-$ARCH_NAME}"
 qpath=$(which $qemu 2>/dev/null)
 
@@ -22,6 +31,12 @@ if ! $qemu -machine '?' 2>&1 | grep 'ARM Virtual Machine' > /dev/null; then
 fi
 
 M='-machine virt'
+if [ "$usingkvm" = "yes" ]; then
+	M+=',accel=kvm'
+else
+	echo "Running with TCG"
+	M+=',accel=tcg'
+fi
 
 if ! $qemu $M -device '?' 2>&1 | grep virtconsole > /dev/null; then
 	echo "$qpath doesn't support virtio-console for chr-testdev. Exiting."
@@ -34,12 +49,11 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
 	exit 2
 fi
 
-M='-machine virt,accel=kvm:tcg'
 chr_testdev='-device virtio-serial-device'
 chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
 
 # arm64 must use '-cpu host' with kvm
-if [ "$(arch)" = "aarch64" ] && [ "$ARCH" = "arm64" ] && [ -c /dev/kvm ]; then
+if [ "$usingkvm" = "yes" ] && [ "$ARCH" = "arm64" ]; then
 	processor="host"
 fi
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it Alex Bennée
@ 2015-08-02 16:36   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-08-02 16:36 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:54PM +0100, Alex Bennée wrote:
> This makes the script a little cleaner by only checking for KVM support
> in one place. If KVM isn't available we can fall back to TCG emulation
> and echo the fact to the screen rather than let QEMU complain.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Andrew Jones <drjones@redhat.com>

To work with mkstandalone (which is in upstream's next branch), we need
some additions to this patch. See

https://github.com/rhdrjones/kvm-unit-tests/commit/be290c1d49c72dd100ab066a11c4ef6fa9017a1c

Thanks,
drew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (3 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 18:25   ` Andrew Jones
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function Alex Bennée
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

---
 lib/printf.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/lib/printf.c b/lib/printf.c
index 89308fb..5d83605 100644
--- a/lib/printf.c
+++ b/lib/printf.c
@@ -180,6 +180,19 @@ int vsnprintf(char *buf, int size, const char *fmt, va_list va)
 		break;
 	    }
 	    break;
+	case 'u':
+	    switch (nlong) {
+	    case 0:
+		print_unsigned(&s, va_arg(va, unsigned), 10, props);
+		break;
+	    case 1:
+		print_unsigned(&s, va_arg(va, unsigned long), 10, props);
+		break;
+	    default:
+		print_unsigned(&s, va_arg(va, unsigned long long), 10, props);
+		break;
+	    }
+	    break;
 	case 'x':
 	    switch (nlong) {
 	    case 0:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field Alex Bennée
@ 2015-07-31 18:25   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 18:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:55PM +0100, Alex Bennée wrote:
> ---
>  lib/printf.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)

missing sign-off

> 
> diff --git a/lib/printf.c b/lib/printf.c
> index 89308fb..5d83605 100644
> --- a/lib/printf.c
> +++ b/lib/printf.c
> @@ -180,6 +180,19 @@ int vsnprintf(char *buf, int size, const char *fmt, va_list va)
>  		break;
>  	    }
>  	    break;
> +	case 'u':
> +	    switch (nlong) {
> +	    case 0:
> +		print_unsigned(&s, va_arg(va, unsigned), 10, props);
> +		break;
> +	    case 1:
> +		print_unsigned(&s, va_arg(va, unsigned long), 10, props);
> +		break;
> +	    default:
> +		print_unsigned(&s, va_arg(va, unsigned long long), 10, props);
> +		break;
> +	    }
> +	    break;
>  	case 'x':
>  	    switch (nlong) {
>  	    case 0:
> -- 
> 2.5.0

Otherwise
Reviewed-by: Andrew Jones <drjones@redhat.com>
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (4 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 18:35   ` Andrew Jones
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test Alex Bennée
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

This introduces a new flush_tlb_page function which does exactly what
you expect. It's going to be useful for the future TLB torture test.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 lib/arm/asm/mmu.h   | 11 +++++++++++
 lib/arm64/asm/mmu.h |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index c1bd01c..2bb0cde 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -14,8 +14,11 @@
 #define PTE_AF			PTE_EXT_AF
 #define PTE_WBWA		L_PTE_MT_WRITEALLOC
 
+/* See B3.18.7 TLB maintenance operations */
+
 static inline void local_flush_tlb_all(void)
 {
+	/* TLBIALL */
 	asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0));
 	dsb();
 	isb();
@@ -27,6 +30,14 @@ static inline void flush_tlb_all(void)
 	local_flush_tlb_all();
 }
 
+static inline void flush_tlb_page(unsigned long vaddr)
+{
+	/* TLBIMVAA */
+	asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (vaddr));
+	dsb();
+	isb();
+}
+
 #include <asm/mmu-api.h>
 
 #endif /* __ASMARM_MMU_H_ */
diff --git a/lib/arm64/asm/mmu.h b/lib/arm64/asm/mmu.h
index 18b4d6b..3bc31c9 100644
--- a/lib/arm64/asm/mmu.h
+++ b/lib/arm64/asm/mmu.h
@@ -19,6 +19,14 @@ static inline void flush_tlb_all(void)
 	isb();
 }
 
+static inline void flush_tlb_page(unsigned long vaddr)
+{
+	unsigned long page = vaddr >> 12;
+	dsb(ishst);
+	asm("tlbi	vaae1is, %0" :: "r" (page));
+	dsb(ish);
+}
+
 #include <asm/mmu-api.h>
 
 #endif /* __ASMARM64_MMU_H_ */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function Alex Bennée
@ 2015-07-31 18:35   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 18:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:56PM +0100, Alex Bennée wrote:
> This introduces a new flush_tlb_page function which does exactly what
> you expect. It's going to be useful for the future TLB torture test.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  lib/arm/asm/mmu.h   | 11 +++++++++++
>  lib/arm64/asm/mmu.h |  8 ++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
> index c1bd01c..2bb0cde 100644
> --- a/lib/arm/asm/mmu.h
> +++ b/lib/arm/asm/mmu.h
> @@ -14,8 +14,11 @@
>  #define PTE_AF			PTE_EXT_AF
>  #define PTE_WBWA		L_PTE_MT_WRITEALLOC
>  
> +/* See B3.18.7 TLB maintenance operations */
> +
>  static inline void local_flush_tlb_all(void)
>  {
> +	/* TLBIALL */
>  	asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0));
>  	dsb();
>  	isb();
> @@ -27,6 +30,14 @@ static inline void flush_tlb_all(void)
>  	local_flush_tlb_all();
>  }
>  
> +static inline void flush_tlb_page(unsigned long vaddr)
> +{
> +	/* TLBIMVAA */
> +	asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (vaddr));
> +	dsb();
> +	isb();
> +}
> +
>  #include <asm/mmu-api.h>
>  
>  #endif /* __ASMARM_MMU_H_ */
> diff --git a/lib/arm64/asm/mmu.h b/lib/arm64/asm/mmu.h
> index 18b4d6b..3bc31c9 100644
> --- a/lib/arm64/asm/mmu.h
> +++ b/lib/arm64/asm/mmu.h
> @@ -19,6 +19,14 @@ static inline void flush_tlb_all(void)
>  	isb();
>  }
>  
> +static inline void flush_tlb_page(unsigned long vaddr)
> +{
> +	unsigned long page = vaddr >> 12;
> +	dsb(ishst);
> +	asm("tlbi	vaae1is, %0" :: "r" (page));
> +	dsb(ish);
> +}
> +
>  #include <asm/mmu-api.h>
>  
>  #endif /* __ASMARM64_MMU_H_ */
> -- 
> 2.5.0
> 
>

Reviewed-by: Andrew Jones <drjones@redhat.com> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (5 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 18:51   ` Andrew Jones
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests Alex Bennée
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

This adds a fairly brain dead torture test for TLB flushes intended for
stressing the MTTCG QEMU build. It takes the usual -smp option for
multiple CPUs.

By default it CPU0 will do a TLBIALL flush after each cycle. You can
pass options via -append to control additional aspects of the test:

  - "page" flush each page in turn (one per function)
  - "self" do the flush after each computation cycle
  - "verbose" report progress on each computation cycle

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - rename to tlbflush-test
  - made makefile changes cleaner
  - added self/other flush mode
  - create specific prefix
  - whitespace fixes
---
 arm/tlbflush-test.c          | 194 +++++++++++++++++++++++++++++++++++++++++++
 config/config-arm-common.mak |   7 +-
 2 files changed, 198 insertions(+), 3 deletions(-)
 create mode 100644 arm/tlbflush-test.c

diff --git a/arm/tlbflush-test.c b/arm/tlbflush-test.c
new file mode 100644
index 0000000..0375ad9
--- /dev/null
+++ b/arm/tlbflush-test.c
@@ -0,0 +1,194 @@
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#define SEQ_LENGTH 10
+#define SEQ_HASH 0x7cd707fe
+
+static cpumask_t smp_test_complete;
+static int flush_count = 1000000;
+static int flush_self = 0;
+static int flush_page = 0;
+static int flush_verbose = 0;
+
+/* Work functions
+ *
+ * These work functions need to be:
+ *
+ *  - page aligned, so we can flush one function at a time
+ *  - have branches, so QEMU TCG generates multiple basic blocks
+ *  - call across pages, so we exercise the TCG basic block slow path
+ */
+
+/* Adler32 */
+__attribute__((aligned(PAGE_SIZE))) uint32_t hash_array(const void *buf,
+							size_t buflen)
+{
+	const uint8_t *data = (uint8_t *) buf;
+	uint32_t s1 = 1;
+	uint32_t s2 = 0;
+
+	for (size_t n = 0; n < buflen; n++) {
+		s1 = (s1 + data[n]) % 65521;
+		s2 = (s2 + s1) % 65521;
+	}
+	return (s2 << 16) | s1;
+}
+
+__attribute__((aligned(PAGE_SIZE))) void create_fib_sequence(int length,
+							unsigned int *array)
+{
+	int i;
+
+	/* first two values */
+	array[0] = 0;
+	array[1] = 1;
+	for (i=2; i<length; i++) {
+		array[i] = array[i-2] + array[i-1];
+	}
+}
+
+__attribute__((aligned(PAGE_SIZE))) unsigned long long factorial(unsigned int n)
+{
+	unsigned int i;
+	unsigned long long fac = 1;
+	for (i=1; i<=n; i++)
+	{
+		fac = fac * i;
+	}
+	return fac;
+}
+
+__attribute__((aligned(PAGE_SIZE))) void factorial_array
+(unsigned int n, unsigned int *input, unsigned long long *output)
+{
+	unsigned int i;
+	for (i=0; i<n; i++) {
+		output[i] = factorial(input[i]);
+	}
+}
+
+__attribute__((aligned(PAGE_SIZE))) unsigned int do_computation(void)
+{
+	unsigned int fib_array[SEQ_LENGTH];
+	unsigned long long facfib_array[SEQ_LENGTH];
+	uint32_t fib_hash, facfib_hash;
+
+	create_fib_sequence(SEQ_LENGTH, &fib_array[0]);
+	fib_hash = hash_array(&fib_array[0], sizeof(fib_array));
+	factorial_array(SEQ_LENGTH, &fib_array[0], &facfib_array[0]);
+	facfib_hash = hash_array(&facfib_array[0], sizeof(facfib_array));
+
+	return (fib_hash ^ facfib_hash);
+}
+
+/* This provides a table of the work functions so we can flush each
+ * page individually
+ */
+static void * pages[] = {&hash_array, &create_fib_sequence, &factorial,
+			 &factorial_array, &do_computation};
+
+static void do_flush(int i)
+{
+	if (flush_page) {
+		flush_tlb_page((unsigned long)pages[i % ARRAY_SIZE(pages)]);
+	} else {
+		flush_tlb_all();
+	}
+}
+
+
+static void just_compute(void)
+{
+	int i, errors = 0;
+	int cpu = smp_processor_id();
+
+	uint32_t result;
+
+	printf("CPU%d online\n", cpu);
+
+	for (i=0; i < flush_count; i++) {
+		result = do_computation();
+
+		if (result != SEQ_HASH) {
+			errors++;
+			printf("CPU%d: seq%d 0x%x!=0x%x\n",
+				cpu, i, result, SEQ_HASH);
+		}
+
+		if (flush_verbose && (i % 1000) == 0) {
+			printf("CPU%d: seq%d\n", cpu, i);
+		}
+
+		if (flush_self) {
+			do_flush(i);
+		}
+	}
+
+	report("CPU%d: Done - Errors: %d\n", errors == 0, cpu, errors);
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+static void just_flush(void)
+{
+	int cpu = smp_processor_id();
+	int i = 0;
+
+	/* set our CPU as done, keep flushing until everyone else
+	   finished */
+	cpumask_set_cpu(cpu, &smp_test_complete);
+
+	while (!cpumask_full(&smp_test_complete)) {
+		do_flush(i++);
+	}
+
+	report("CPU%d: Done - Triggered %d flushes\n", true, cpu, i);
+}
+
+int main(int argc, char **argv)
+{
+	int cpu, i;
+	char prefix[100];
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+
+		if (strcmp(arg, "page") == 0) {
+			flush_page = 1;
+                }
+
+                if (strcmp(arg, "self") == 0) {
+			flush_self = 1;
+                }
+
+		if (strcmp(arg, "verbose") == 0) {
+			flush_verbose = 1;
+                }
+	}
+
+	snprintf(prefix, sizeof(prefix), "tlbflush_%s_%s",
+		flush_page?"page":"all",
+		flush_self?"self":"other");
+	report_prefix_push(prefix);
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+		smp_boot_secondary(cpu, just_compute);
+	}
+
+	if (flush_self)
+		just_compute();
+	else
+		just_flush();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	return report_summary();
+}
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 0674daa..164199b 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -9,9 +9,9 @@ ifeq ($(LOADADDR),)
 	LOADADDR = 0x40000000
 endif
 
-tests-common = \
-	$(TEST_DIR)/selftest.flat \
-	$(TEST_DIR)/spinlock-test.flat
+tests-common = $(TEST_DIR)/selftest.flat
+tests-common += $(TEST_DIR)/spinlock-test.flat
+tests-common += $(TEST_DIR)/tlbflush-test.flat
 
 all: test_cases
 
@@ -72,3 +72,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
+$(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test Alex Bennée
@ 2015-07-31 18:51   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 18:51 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:57PM +0100, Alex Bennée wrote:
> This adds a fairly brain dead torture test for TLB flushes intended for
> stressing the MTTCG QEMU build. It takes the usual -smp option for
> multiple CPUs.
> 
> By default it CPU0 will do a TLBIALL flush after each cycle. You can
> pass options via -append to control additional aspects of the test:
> 
>   - "page" flush each page in turn (one per function)
>   - "self" do the flush after each computation cycle
>   - "verbose" report progress on each computation cycle
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - rename to tlbflush-test
>   - made makefile changes cleaner
>   - added self/other flush mode
>   - create specific prefix
>   - whitespace fixes
> ---
>  arm/tlbflush-test.c          | 194 +++++++++++++++++++++++++++++++++++++++++++
>  config/config-arm-common.mak |   7 +-
>  2 files changed, 198 insertions(+), 3 deletions(-)
>  create mode 100644 arm/tlbflush-test.c
> 
> diff --git a/arm/tlbflush-test.c b/arm/tlbflush-test.c
> new file mode 100644
> index 0000000..0375ad9
> --- /dev/null
> +++ b/arm/tlbflush-test.c

missing GPL header

> @@ -0,0 +1,194 @@
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#define SEQ_LENGTH 10
> +#define SEQ_HASH 0x7cd707fe
> +
> +static cpumask_t smp_test_complete;
> +static int flush_count = 1000000;
> +static int flush_self = 0;
> +static int flush_page = 0;
> +static int flush_verbose = 0;

nit: explicit = 0 not necessary and could use type bool

> +
> +/* Work functions
> + *
> + * These work functions need to be:
> + *
> + *  - page aligned, so we can flush one function at a time
> + *  - have branches, so QEMU TCG generates multiple basic blocks
> + *  - call across pages, so we exercise the TCG basic block slow path
> + */

nit: qemu comment style, please use kernel style (also other comments below)

> +
> +/* Adler32 */
> +__attribute__((aligned(PAGE_SIZE))) uint32_t hash_array(const void *buf,
> +							size_t buflen)
> +{
> +	const uint8_t *data = (uint8_t *) buf;
> +	uint32_t s1 = 1;
> +	uint32_t s2 = 0;
> +
> +	for (size_t n = 0; n < buflen; n++) {
> +		s1 = (s1 + data[n]) % 65521;
> +		s2 = (s2 + s1) % 65521;
> +	}
> +	return (s2 << 16) | s1;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void create_fib_sequence(int length,
> +							unsigned int *array)
> +{
> +	int i;
> +
> +	/* first two values */
> +	array[0] = 0;
> +	array[1] = 1;
> +	for (i=2; i<length; i++) {
> +		array[i] = array[i-2] + array[i-1];
> +	}
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned long long factorial(unsigned int n)
> +{
> +	unsigned int i;
> +	unsigned long long fac = 1;
> +	for (i=1; i<=n; i++)
> +	{
> +		fac = fac * i;
> +	}
nit: brace style
> +	return fac;
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) void factorial_array
> +(unsigned int n, unsigned int *input, unsigned long long *output)
> +{
> +	unsigned int i;
> +	for (i=0; i<n; i++) {
> +		output[i] = factorial(input[i]);
> +	}
> +}
> +
> +__attribute__((aligned(PAGE_SIZE))) unsigned int do_computation(void)
> +{
> +	unsigned int fib_array[SEQ_LENGTH];
> +	unsigned long long facfib_array[SEQ_LENGTH];
> +	uint32_t fib_hash, facfib_hash;
> +
> +	create_fib_sequence(SEQ_LENGTH, &fib_array[0]);
> +	fib_hash = hash_array(&fib_array[0], sizeof(fib_array));
> +	factorial_array(SEQ_LENGTH, &fib_array[0], &facfib_array[0]);
> +	facfib_hash = hash_array(&facfib_array[0], sizeof(facfib_array));
> +
> +	return (fib_hash ^ facfib_hash);
> +}

I still find the complex hash distracting. But if you believe it's
necessary for your mttcg test, then please explain why in a comment.

> +
> +/* This provides a table of the work functions so we can flush each
> + * page individually
> + */
> +static void * pages[] = {&hash_array, &create_fib_sequence, &factorial,
> +			 &factorial_array, &do_computation};
> +
> +static void do_flush(int i)
> +{
> +	if (flush_page) {
> +		flush_tlb_page((unsigned long)pages[i % ARRAY_SIZE(pages)]);
> +	} else {
> +		flush_tlb_all();
> +	}
nit: no need for braces
> +}
> +
> +
> +static void just_compute(void)
name it do_compute? (it's not just_compute when flush_self is true)
> +{
> +	int i, errors = 0;
> +	int cpu = smp_processor_id();
> +
> +	uint32_t result;
> +
> +	printf("CPU%d online\n", cpu);
> +
> +	for (i=0; i < flush_count; i++) {
> +		result = do_computation();
> +
> +		if (result != SEQ_HASH) {
> +			errors++;
> +			printf("CPU%d: seq%d 0x%x!=0x%x\n",
> +				cpu, i, result, SEQ_HASH);
> +		}
> +
> +		if (flush_verbose && (i % 1000) == 0) {
> +			printf("CPU%d: seq%d\n", cpu, i);
> +		}
> +
> +		if (flush_self) {
> +			do_flush(i);
> +		}
nit: braces again (I feel like I'm reading qemu code, but with tabs :-)
> +	}
> +
> +	report("CPU%d: Done - Errors: %d\n", errors == 0, cpu, errors);
> +
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +	if (cpu != 0)
> +		halt();
> +}
> +
> +static void just_flush(void)
> +{
> +	int cpu = smp_processor_id();
> +	int i = 0;
> +
> +	/* set our CPU as done, keep flushing until everyone else
> +	   finished */
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +
> +	while (!cpumask_full(&smp_test_complete)) {
> +		do_flush(i++);
> +	}
braces
> +
> +	report("CPU%d: Done - Triggered %d flushes\n", true, cpu, i);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int cpu, i;
> +	char prefix[100];
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +
> +		if (strcmp(arg, "page") == 0) {
> +			flush_page = 1;
> +                }
strange spaces after tab before brace here
> +
> +                if (strcmp(arg, "self") == 0) {
> +			flush_self = 1;
> +                }
what happened to tabs here?
> +
> +		if (strcmp(arg, "verbose") == 0) {
> +			flush_verbose = 1;
> +                }
> +	}
> +
> +	snprintf(prefix, sizeof(prefix), "tlbflush_%s_%s",
> +		flush_page?"page":"all",
> +		flush_self?"self":"other");
> +	report_prefix_push(prefix);
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == 0)
> +			continue;
> +		smp_boot_secondary(cpu, just_compute);
> +	}
> +
> +	if (flush_self)
> +		just_compute();
> +	else
> +		just_flush();
> +
> +	while (!cpumask_full(&smp_test_complete))
> +		cpu_relax();
> +
> +	return report_summary();
> +}
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 0674daa..164199b 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -9,9 +9,9 @@ ifeq ($(LOADADDR),)
>  	LOADADDR = 0x40000000
>  endif
>  
> -tests-common = \
> -	$(TEST_DIR)/selftest.flat \
> -	$(TEST_DIR)/spinlock-test.flat
> +tests-common = $(TEST_DIR)/selftest.flat
> +tests-common += $(TEST_DIR)/spinlock-test.flat
> +tests-common += $(TEST_DIR)/tlbflush-test.flat
>  
>  all: test_cases
>  
> @@ -72,3 +72,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
> +$(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
> -- 
> 2.5.0
> 
>

Please run $KERNEL/scripts/checkpatch.pl -f on this file like I suggested
before.

Thanks,
drew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (6 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 18:53   ` Andrew Jones
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus Alex Bennée
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée

---
 arm/unittests.cfg | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ee655b2..19d72ad 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,27 @@ file = selftest.flat
 smp = $(getconf _NPROCESSORS_CONF)
 extra_params = -append 'smp'
 groups = selftest
+
+# TLB Torture Tests
+[tlbflush::all_other]
+file = tlbflush-test.flat
+smp = $(getconf _NPROCESSORS_CONF)
+groups = tlbflush
+
+[tlbflush::page_other]
+file = tlbflush-test.flat
+smp = $(getconf _NPROCESSORS_CONF)
+extra_params = -append 'page'
+groups = tlbflush
+
+[tlbflush::all_self]
+file = tlbflush-test.flat
+smp = $(getconf _NPROCESSORS_CONF)
+extra_params = -append 'self'
+groups = tlbflush
+
+[tlbflush::page_self]
+file = tlbflush-test.flat
+smp = $(getconf _NPROCESSORS_CONF)
+extra_params = -append 'page self'
+groups = tlbflush
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests Alex Bennée
@ 2015-07-31 18:53   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 18:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:58PM +0100, Alex Bennée wrote:
> ---
>  arm/unittests.cfg | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index ee655b2..19d72ad 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -35,3 +35,27 @@ file = selftest.flat
>  smp = $(getconf _NPROCESSORS_CONF)
>  extra_params = -append 'smp'
>  groups = selftest
> +
> +# TLB Torture Tests
> +[tlbflush::all_other]
> +file = tlbflush-test.flat
> +smp = $(getconf _NPROCESSORS_CONF)
> +groups = tlbflush
> +
> +[tlbflush::page_other]
> +file = tlbflush-test.flat
> +smp = $(getconf _NPROCESSORS_CONF)
> +extra_params = -append 'page'
> +groups = tlbflush
> +
> +[tlbflush::all_self]
> +file = tlbflush-test.flat
> +smp = $(getconf _NPROCESSORS_CONF)
> +extra_params = -append 'self'
> +groups = tlbflush
> +
> +[tlbflush::page_self]
> +file = tlbflush-test.flat
> +smp = $(getconf _NPROCESSORS_CONF)
> +extra_params = -append 'page self'
> +groups = tlbflush

Please see f7eafd76270 (in kvm-unit-tests next branch) for how
I've redone some lines in this file to make them more standalone
friendly.

drew

> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (7 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests Alex Bennée
@ 2015-07-31 15:53 ` Alex Bennée
  2015-07-31 19:17   ` Andrew Jones
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN Alex Bennée
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:53 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, Alex Bennée, kvm, a.spyridakis,
	claudio.fontana, a.rigo, qemu-devel, Alex Bennée

From: Alex Bennée <alex@bennee.com>

The previous $(getconf _NPROCESSORS_CONF) isn't correct as the default
maximum VCPU configuration is 4 on arm64 machines which typically have
more actual cores.

This introduces a simple utility program to query the KVM capabilities
and use the correct maximum number of vcpus.

[FOR DISCUSSION: this fails on TCG which could use _NPROCESSORS_CONF]

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 arm/unittests.cfg            | 10 +++++-----
 arm/utils/kvm-query.c        | 41 +++++++++++++++++++++++++++++++++++++++++
 config/config-arm-common.mak |  8 +++++++-
 3 files changed, 53 insertions(+), 6 deletions(-)
 create mode 100644 arm/utils/kvm-query.c

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 19d72ad..7a3a32b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -32,30 +32,30 @@ groups = selftest
 # Test SMP support
 [selftest::smp]
 file = selftest.flat
-smp = $(getconf _NPROCESSORS_CONF)
+smp = $(./arm/utils/kvm-query max_vcpu)
 extra_params = -append 'smp'
 groups = selftest
 
 # TLB Torture Tests
 [tlbflush::all_other]
 file = tlbflush-test.flat
-smp = $(getconf _NPROCESSORS_CONF)
+smp = $(./arm/utils/kvm-query max_vcpu)
 groups = tlbflush
 
 [tlbflush::page_other]
 file = tlbflush-test.flat
-smp = $(getconf _NPROCESSORS_CONF)
+smp = $(./arm/utils/kvm-query max_vcpu)
 extra_params = -append 'page'
 groups = tlbflush
 
 [tlbflush::all_self]
 file = tlbflush-test.flat
-smp = $(getconf _NPROCESSORS_CONF)
+smp = $(./arm/utils/kvm-query max_vcpu)
 extra_params = -append 'self'
 groups = tlbflush
 
 [tlbflush::page_self]
 file = tlbflush-test.flat
-smp = $(getconf _NPROCESSORS_CONF)
+smp = $(./arm/utils/kvm-query max_vcpu)
 extra_params = -append 'page self'
 groups = tlbflush
diff --git a/arm/utils/kvm-query.c b/arm/utils/kvm-query.c
new file mode 100644
index 0000000..4f979b1
--- /dev/null
+++ b/arm/utils/kvm-query.c
@@ -0,0 +1,41 @@
+/*
+ * kvm-query.c
+ *
+ * A simple utility to query KVM capabilities.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <string.h>
+#include <stdio.h>
+
+#include <linux/kvm.h>
+
+int get_max_vcpu(void)
+{
+	int fd = open("/dev/kvm", O_RDWR);
+	if (fd>0) {
+		int ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
+		printf("%d\n", ret > 0 ? ret : 0);
+		close(fd);
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+int main(int argc, char **argv)
+{
+	for (int i=0; i<argc; i++) {
+		char *arg = argv[i];
+		if (strcmp(arg, "max_vcpu") == 0) {
+			return get_max_vcpu();
+		}
+	}
+
+	return  -1;
+}
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 164199b..9c7607b 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -13,7 +13,9 @@ tests-common = $(TEST_DIR)/selftest.flat
 tests-common += $(TEST_DIR)/spinlock-test.flat
 tests-common += $(TEST_DIR)/tlbflush-test.flat
 
-all: test_cases
+utils-common = $(TEST_DIR)/utils/kvm-query
+
+all: test_cases utils
 
 ##################################################################
 phys_base = $(LOADADDR)
@@ -58,6 +60,9 @@ FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libgcc) $(libeabi)
 $(libeabi): $(eabiobjs)
 	$(AR) rcs $@ $^
 
+$(TEST_DIR)/utils/%: $(TEST_DIR)/utils/%.c
+	$(CC) -std=gnu99 -o $@ $^
+
 arm_clean: libfdt_clean asm_offsets_clean
 	$(RM) $(TEST_DIR)/*.{o,flat,elf} $(libeabi) $(eabiobjs) \
 	      $(TEST_DIR)/.*.d lib/arm/.*.d
@@ -69,6 +74,7 @@ tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
 generated_files = $(asm-offsets)
 
 test_cases: $(generated_files) $(tests-common) $(tests)
+utils: $(utils-common)
 
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus Alex Bennée
@ 2015-07-31 19:17   ` Andrew Jones
  2015-08-02 16:40     ` Andrew Jones
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 19:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Alex Bennée, claudio.fontana, kvm,
	a.spyridakis, mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:59PM +0100, Alex Bennée wrote:
> From: Alex Bennée <alex@bennee.com>
> 
> The previous $(getconf _NPROCESSORS_CONF) isn't correct as the default
> maximum VCPU configuration is 4 on arm64 machines which typically have
> more actual cores.

The kernel should probably bump that up to 8, but that's an aside,
as it wouldn't resolve the general issue.

For this patch, I don't think we need a C utility as we can probe
with QEMU, and thus just need a script.

  $QEMU -machine virt,accel=kvm -smp 9
  Number of SMP cpus requested (9), exceeds max cpus supported by machine `virt' (8)

The script can live in $root/scripts/, which is a directory recently
created for standalone test support.

Thanks,
drew

> 
> This introduces a simple utility program to query the KVM capabilities
> and use the correct maximum number of vcpus.
> 
> [FOR DISCUSSION: this fails on TCG which could use _NPROCESSORS_CONF]
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  arm/unittests.cfg            | 10 +++++-----
>  arm/utils/kvm-query.c        | 41 +++++++++++++++++++++++++++++++++++++++++
>  config/config-arm-common.mak |  8 +++++++-
>  3 files changed, 53 insertions(+), 6 deletions(-)
>  create mode 100644 arm/utils/kvm-query.c
> 
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 19d72ad..7a3a32b 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -32,30 +32,30 @@ groups = selftest
>  # Test SMP support
>  [selftest::smp]
>  file = selftest.flat
> -smp = $(getconf _NPROCESSORS_CONF)
> +smp = $(./arm/utils/kvm-query max_vcpu)
>  extra_params = -append 'smp'
>  groups = selftest
>  
>  # TLB Torture Tests
>  [tlbflush::all_other]
>  file = tlbflush-test.flat
> -smp = $(getconf _NPROCESSORS_CONF)
> +smp = $(./arm/utils/kvm-query max_vcpu)
>  groups = tlbflush
>  
>  [tlbflush::page_other]
>  file = tlbflush-test.flat
> -smp = $(getconf _NPROCESSORS_CONF)
> +smp = $(./arm/utils/kvm-query max_vcpu)
>  extra_params = -append 'page'
>  groups = tlbflush
>  
>  [tlbflush::all_self]
>  file = tlbflush-test.flat
> -smp = $(getconf _NPROCESSORS_CONF)
> +smp = $(./arm/utils/kvm-query max_vcpu)
>  extra_params = -append 'self'
>  groups = tlbflush
>  
>  [tlbflush::page_self]
>  file = tlbflush-test.flat
> -smp = $(getconf _NPROCESSORS_CONF)
> +smp = $(./arm/utils/kvm-query max_vcpu)
>  extra_params = -append 'page self'
>  groups = tlbflush
> diff --git a/arm/utils/kvm-query.c b/arm/utils/kvm-query.c
> new file mode 100644
> index 0000000..4f979b1
> --- /dev/null
> +++ b/arm/utils/kvm-query.c
> @@ -0,0 +1,41 @@
> +/*
> + * kvm-query.c
> + *
> + * A simple utility to query KVM capabilities.
> + */
> +
> +#include <sys/ioctl.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +
> +#include <string.h>
> +#include <stdio.h>
> +
> +#include <linux/kvm.h>
> +
> +int get_max_vcpu(void)
> +{
> +	int fd = open("/dev/kvm", O_RDWR);
> +	if (fd>0) {
> +		int ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);
> +		printf("%d\n", ret > 0 ? ret : 0);
> +		close(fd);
> +		return 0;
> +	} else {
> +		return -1;
> +	}
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	for (int i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +		if (strcmp(arg, "max_vcpu") == 0) {
> +			return get_max_vcpu();
> +		}
> +	}
> +
> +	return  -1;
> +}
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 164199b..9c7607b 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -13,7 +13,9 @@ tests-common = $(TEST_DIR)/selftest.flat
>  tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/tlbflush-test.flat
>  
> -all: test_cases
> +utils-common = $(TEST_DIR)/utils/kvm-query
> +
> +all: test_cases utils
>  
>  ##################################################################
>  phys_base = $(LOADADDR)
> @@ -58,6 +60,9 @@ FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libgcc) $(libeabi)
>  $(libeabi): $(eabiobjs)
>  	$(AR) rcs $@ $^
>  
> +$(TEST_DIR)/utils/%: $(TEST_DIR)/utils/%.c
> +	$(CC) -std=gnu99 -o $@ $^
> +
>  arm_clean: libfdt_clean asm_offsets_clean
>  	$(RM) $(TEST_DIR)/*.{o,flat,elf} $(libeabi) $(eabiobjs) \
>  	      $(TEST_DIR)/.*.d lib/arm/.*.d
> @@ -69,6 +74,7 @@ tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
>  generated_files = $(asm-offsets)
>  
>  test_cases: $(generated_files) $(tests-common) $(tests)
> +utils: $(utils-common)
>  
>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus
  2015-07-31 19:17   ` Andrew Jones
@ 2015-08-02 16:40     ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-08-02 16:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Alex Bennée, claudio.fontana, kvm,
	a.spyridakis, mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 09:17:12PM +0200, Andrew Jones wrote:
> On Fri, Jul 31, 2015 at 04:53:59PM +0100, Alex Bennée wrote:
> > From: Alex Bennée <alex@bennee.com>
> > 
> > The previous $(getconf _NPROCESSORS_CONF) isn't correct as the default
> > maximum VCPU configuration is 4 on arm64 machines which typically have
> > more actual cores.
> 
> The kernel should probably bump that up to 8, but that's an aside,
> as it wouldn't resolve the general issue.
> 
> For this patch, I don't think we need a C utility as we can probe
> with QEMU, and thus just need a script.
> 
>   $QEMU -machine virt,accel=kvm -smp 9
>   Number of SMP cpus requested (9), exceeds max cpus supported by machine `virt' (8)
> 
> The script can live in $root/scripts/, which is a directory recently
> created for standalone test support.

I decided to take a stab at this and came up with

https://github.com/rhdrjones/kvm-unit-tests/commit/336d87338da86a0e7f2018a1e509b606b9a2baa2

Thanks,
drew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (8 preceding siblings ...)
  2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus Alex Bennée
@ 2015-07-31 15:54 ` Alex Bennée
  2015-07-31 19:22   ` Andrew Jones
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
  2015-08-02 16:44 ` [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Andrew Jones
  11 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:54 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, kvm, a.spyridakis, claudio.fontana,
	a.rigo, qemu-devel, Alex Bennée, Timothy B. Terriberry

It's often useful to introduce some sort of random variation when
testing several racing CPU conditions. Instead of each test implementing
some half-arsed PRNG bring in a a decent one which has good statistical
randomness. Obviously it is deterministic for a given seed value which
is likely the behaviour you want.

I've pulled in the ISAAC library from CCAN:

    http://ccodearchive.net/info/isaac.html

I shaved off the float related stuff which is less useful for unit
testing and re-indented to fit the style. The original license was
CC0 (Public Domain) which is compatible with the LGPL v2 of
kvm-unit-tests.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Timothy B. Terriberry <tterribe@xiph.org>
---
 config/config-arm-common.mak |   1 +
 lib/prng.c                   | 162 +++++++++++++++++++++++++++++++++++++++++++
 lib/prng.h                   |  82 ++++++++++++++++++++++
 3 files changed, 245 insertions(+)
 create mode 100644 lib/prng.c
 create mode 100644 lib/prng.h

diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 9c7607b..67a9dda 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -34,6 +34,7 @@ cflatobjs += lib/devicetree.o
 cflatobjs += lib/virtio.o
 cflatobjs += lib/virtio-mmio.o
 cflatobjs += lib/chr-testdev.o
+cflatobjs += lib/prng.o
 cflatobjs += lib/arm/io.o
 cflatobjs += lib/arm/setup.o
 cflatobjs += lib/arm/mmu.o
diff --git a/lib/prng.c b/lib/prng.c
new file mode 100644
index 0000000..ebd6df7
--- /dev/null
+++ b/lib/prng.c
@@ -0,0 +1,162 @@
+/*
+ * Pseudo Random Number Generator
+ *
+ * Lifted from ccan modules ilog/isaac under CC0
+ *   - http://ccodearchive.net/info/isaac.html
+ *   - http://ccodearchive.net/info/ilog.html
+ *
+ * And lightly hacked to compile under the KVM unit test environment.
+ * This provides a handy RNG for torture tests that want to vary
+ * delays and the like.
+ *
+ */
+
+/*Written by Timothy B. Terriberry (tterribe@xiph.org) 1999-2009.
+  CC0 (Public domain) - see LICENSE file for details
+  Based on the public domain implementation by Robert J. Jenkins Jr.*/
+
+#include "libcflat.h"
+
+#include <string.h>
+#include "prng.h"
+
+#define ISAAC_MASK        (0xFFFFFFFFU)
+
+/* Extract ISAAC_SZ_LOG bits (starting at bit 2). */
+static inline uint32_t lower_bits(uint32_t x)
+{
+	return (x & ((ISAAC_SZ-1) << 2)) >> 2;
+}
+
+/* Extract next ISAAC_SZ_LOG bits (starting at bit ISAAC_SZ_LOG+2). */
+static inline uint32_t upper_bits(uint32_t y)
+{
+	return (y >> (ISAAC_SZ_LOG+2)) & (ISAAC_SZ-1);
+}
+
+static void isaac_update(isaac_ctx *_ctx){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  a;
+	uint32_t  b;
+	uint32_t  x;
+	uint32_t  y;
+	int       i;
+	m=_ctx->m;
+	r=_ctx->r;
+	a=_ctx->a;
+	b=_ctx->b+(++_ctx->c);
+	for(i=0;i<ISAAC_SZ/2;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i+ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	for(i=ISAAC_SZ/2;i<ISAAC_SZ;i++){
+		x=m[i];
+		a=(a^a<<13)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>6)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a<<2)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+		x=m[++i];
+		a=(a^a>>16)+m[i-ISAAC_SZ/2];
+		m[i]=y=m[lower_bits(x)]+a+b;
+		r[i]=b=m[upper_bits(y)]+x;
+	}
+	_ctx->b=b;
+	_ctx->a=a;
+	_ctx->n=ISAAC_SZ;
+}
+
+static void isaac_mix(uint32_t _x[8]){
+	static const unsigned char SHIFT[8]={11,2,8,16,10,4,8,9};
+	int i;
+	for(i=0;i<8;i++){
+		_x[i]^=_x[(i+1)&7]<<SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+		i++;
+		_x[i]^=_x[(i+1)&7]>>SHIFT[i];
+		_x[(i+3)&7]+=_x[i];
+		_x[(i+1)&7]+=_x[(i+2)&7];
+	}
+}
+
+
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	_ctx->a=_ctx->b=_ctx->c=0;
+	memset(_ctx->r,0,sizeof(_ctx->r));
+	isaac_reseed(_ctx,_seed,_nseed);
+}
+
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
+	uint32_t *m;
+	uint32_t *r;
+	uint32_t  x[8];
+	int       i;
+	int       j;
+	m=_ctx->m;
+	r=_ctx->r;
+	if(_nseed>ISAAC_SEED_SZ_MAX)_nseed=ISAAC_SEED_SZ_MAX;
+	for(i=0;i<_nseed>>2;i++){
+		r[i]^=(uint32_t)_seed[i<<2|3]<<24|(uint32_t)_seed[i<<2|2]<<16|
+			(uint32_t)_seed[i<<2|1]<<8|_seed[i<<2];
+	}
+	_nseed-=i<<2;
+	if(_nseed>0){
+		uint32_t ri;
+		ri=_seed[i<<2];
+		for(j=1;j<_nseed;j++)ri|=(uint32_t)_seed[i<<2|j]<<(j<<3);
+		r[i++]^=ri;
+	}
+	x[0]=x[1]=x[2]=x[3]=x[4]=x[5]=x[6]=x[7]=0x9E3779B9U;
+	for(i=0;i<4;i++)isaac_mix(x);
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=r[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	for(i=0;i<ISAAC_SZ;i+=8){
+		for(j=0;j<8;j++)x[j]+=m[i+j];
+		isaac_mix(x);
+		memcpy(m+i,x,sizeof(x));
+	}
+	isaac_update(_ctx);
+}
+
+uint32_t isaac_next_uint32(isaac_ctx *_ctx){
+	if(!_ctx->n)isaac_update(_ctx);
+	return _ctx->r[--_ctx->n];
+}
+
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n){
+	uint32_t r;
+	uint32_t v;
+	uint32_t d;
+	do{
+		r=isaac_next_uint32(_ctx);
+		v=r%_n;
+		d=r-v;
+	}
+	while(((d+_n-1)&ISAAC_MASK)<d);
+	return v;
+}
diff --git a/lib/prng.h b/lib/prng.h
new file mode 100644
index 0000000..bf5776d
--- /dev/null
+++ b/lib/prng.h
@@ -0,0 +1,82 @@
+/*
+ * PRNG Header
+ */
+#ifndef __PRNG_H__
+#define __PRNG_H__
+
+# include <stdint.h>
+
+
+
+typedef struct isaac_ctx isaac_ctx;
+
+
+
+/*This value may be lowered to reduce memory usage on embedded platforms, at
+  the cost of reducing security and increasing bias.
+  Quoting Bob Jenkins: "The current best guess is that bias is detectable after
+  2**37 values for [ISAAC_SZ_LOG]=3, 2**45 for 4, 2**53 for 5, 2**61 for 6,
+  2**69 for 7, and 2**77 values for [ISAAC_SZ_LOG]=8."*/
+#define ISAAC_SZ_LOG      (8)
+#define ISAAC_SZ          (1<<ISAAC_SZ_LOG)
+#define ISAAC_SEED_SZ_MAX (ISAAC_SZ<<2)
+
+
+
+/*ISAAC is the most advanced of a series of pseudo-random number generators
+  designed by Robert J. Jenkins Jr. in 1996.
+  http://www.burtleburtle.net/bob/rand/isaac.html
+  To quote:
+  No efficient method is known for deducing their internal states.
+  ISAAC requires an amortized 18.75 instructions to produce a 32-bit value.
+  There are no cycles in ISAAC shorter than 2**40 values.
+  The expected cycle length is 2**8295 values.*/
+struct isaac_ctx{
+	unsigned n;
+	uint32_t r[ISAAC_SZ];
+	uint32_t m[ISAAC_SZ];
+	uint32_t a;
+	uint32_t b;
+	uint32_t c;
+};
+
+
+/**
+ * isaac_init - Initialize an instance of the ISAAC random number generator.
+ * @_ctx:   The instance to initialize.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is less than or equal to zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+
+/**
+ * isaac_reseed - Mix a new batch of entropy into the current state.
+ * To reset ISAAC to a known state, call isaac_init() again instead.
+ * @_ctx:   The instance to reseed.
+ * @_seed:  The specified seed bytes.
+ *          This may be NULL if _nseed is zero.
+ * @_nseed: The number of bytes to use for the seed.
+ *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
+ *           ignored.
+ */
+void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
+/**
+ * isaac_next_uint32 - Return the next random 32-bit value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ */
+uint32_t isaac_next_uint32(isaac_ctx *_ctx);
+/**
+ * isaac_next_uint - Uniform random integer less than the given value.
+ * @_ctx: The ISAAC instance to generate the value with.
+ * @_n:   The upper bound on the range of numbers returned (not inclusive).
+ *        This must be greater than zero and less than 2**32.
+ *        To return integers in the full range 0...2**32-1, use
+ *         isaac_next_uint32() instead.
+ * Return: An integer uniformly distributed between 0 and _n-1 (inclusive).
+ */
+uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n);
+
+#endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN Alex Bennée
@ 2015-07-31 19:22   ` Andrew Jones
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 19:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, Timothy B. Terriberry,
	fred.konrad

On Fri, Jul 31, 2015 at 04:54:00PM +0100, Alex Bennée wrote:
> It's often useful to introduce some sort of random variation when
> testing several racing CPU conditions. Instead of each test implementing
> some half-arsed PRNG bring in a a decent one which has good statistical
> randomness. Obviously it is deterministic for a given seed value which
> is likely the behaviour you want.
> 
> I've pulled in the ISAAC library from CCAN:
> 
>     http://ccodearchive.net/info/isaac.html
> 
> I shaved off the float related stuff which is less useful for unit
> testing and re-indented to fit the style. The original license was
> CC0 (Public Domain) which is compatible with the LGPL v2 of
> kvm-unit-tests.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> CC: Timothy B. Terriberry <tterribe@xiph.org>
> ---
>  config/config-arm-common.mak |   1 +
>  lib/prng.c                   | 162 +++++++++++++++++++++++++++++++++++++++++++
>  lib/prng.h                   |  82 ++++++++++++++++++++++
>  3 files changed, 245 insertions(+)
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 9c7607b..67a9dda 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -34,6 +34,7 @@ cflatobjs += lib/devicetree.o
>  cflatobjs += lib/virtio.o
>  cflatobjs += lib/virtio-mmio.o
>  cflatobjs += lib/chr-testdev.o
> +cflatobjs += lib/prng.o
>  cflatobjs += lib/arm/io.o
>  cflatobjs += lib/arm/setup.o
>  cflatobjs += lib/arm/mmu.o
> diff --git a/lib/prng.c b/lib/prng.c
> new file mode 100644
> index 0000000..ebd6df7
> --- /dev/null
> +++ b/lib/prng.c
> @@ -0,0 +1,162 @@
> +/*
> + * Pseudo Random Number Generator
> + *
> + * Lifted from ccan modules ilog/isaac under CC0
> + *   - http://ccodearchive.net/info/isaac.html
> + *   - http://ccodearchive.net/info/ilog.html
> + *
> + * And lightly hacked to compile under the KVM unit test environment.
> + * This provides a handy RNG for torture tests that want to vary
> + * delays and the like.
> + *
> + */
> +
> +/*Written by Timothy B. Terriberry (tterribe@xiph.org) 1999-2009.
> +  CC0 (Public domain) - see LICENSE file for details
> +  Based on the public domain implementation by Robert J. Jenkins Jr.*/
> +
> +#include "libcflat.h"
> +
> +#include <string.h>
> +#include "prng.h"
> +
> +#define ISAAC_MASK        (0xFFFFFFFFU)
> +
> +/* Extract ISAAC_SZ_LOG bits (starting at bit 2). */
> +static inline uint32_t lower_bits(uint32_t x)
> +{
> +	return (x & ((ISAAC_SZ-1) << 2)) >> 2;
> +}
> +
> +/* Extract next ISAAC_SZ_LOG bits (starting at bit ISAAC_SZ_LOG+2). */
> +static inline uint32_t upper_bits(uint32_t y)
> +{
> +	return (y >> (ISAAC_SZ_LOG+2)) & (ISAAC_SZ-1);
> +}
> +
> +static void isaac_update(isaac_ctx *_ctx){
> +	uint32_t *m;
> +	uint32_t *r;
> +	uint32_t  a;
> +	uint32_t  b;
> +	uint32_t  x;
> +	uint32_t  y;
> +	int       i;
> +	m=_ctx->m;
> +	r=_ctx->r;
> +	a=_ctx->a;
> +	b=_ctx->b+(++_ctx->c);
> +	for(i=0;i<ISAAC_SZ/2;i++){
> +		x=m[i];
> +		a=(a^a<<13)+m[i+ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a>>6)+m[i+ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a<<2)+m[i+ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a>>16)+m[i+ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +	}
> +	for(i=ISAAC_SZ/2;i<ISAAC_SZ;i++){
> +		x=m[i];
> +		a=(a^a<<13)+m[i-ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a>>6)+m[i-ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a<<2)+m[i-ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +		x=m[++i];
> +		a=(a^a>>16)+m[i-ISAAC_SZ/2];
> +		m[i]=y=m[lower_bits(x)]+a+b;
> +		r[i]=b=m[upper_bits(y)]+x;
> +	}
> +	_ctx->b=b;
> +	_ctx->a=a;
> +	_ctx->n=ISAAC_SZ;
> +}
> +
> +static void isaac_mix(uint32_t _x[8]){
> +	static const unsigned char SHIFT[8]={11,2,8,16,10,4,8,9};
> +	int i;
> +	for(i=0;i<8;i++){
> +		_x[i]^=_x[(i+1)&7]<<SHIFT[i];
> +		_x[(i+3)&7]+=_x[i];
> +		_x[(i+1)&7]+=_x[(i+2)&7];
> +		i++;
> +		_x[i]^=_x[(i+1)&7]>>SHIFT[i];
> +		_x[(i+3)&7]+=_x[i];
> +		_x[(i+1)&7]+=_x[(i+2)&7];
> +	}
> +}
> +
> +
> +void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
> +	_ctx->a=_ctx->b=_ctx->c=0;
> +	memset(_ctx->r,0,sizeof(_ctx->r));
> +	isaac_reseed(_ctx,_seed,_nseed);
> +}
> +
> +void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed){
> +	uint32_t *m;
> +	uint32_t *r;
> +	uint32_t  x[8];
> +	int       i;
> +	int       j;
> +	m=_ctx->m;
> +	r=_ctx->r;
> +	if(_nseed>ISAAC_SEED_SZ_MAX)_nseed=ISAAC_SEED_SZ_MAX;
> +	for(i=0;i<_nseed>>2;i++){
> +		r[i]^=(uint32_t)_seed[i<<2|3]<<24|(uint32_t)_seed[i<<2|2]<<16|
> +			(uint32_t)_seed[i<<2|1]<<8|_seed[i<<2];
> +	}
> +	_nseed-=i<<2;
> +	if(_nseed>0){
> +		uint32_t ri;
> +		ri=_seed[i<<2];
> +		for(j=1;j<_nseed;j++)ri|=(uint32_t)_seed[i<<2|j]<<(j<<3);
> +		r[i++]^=ri;
> +	}
> +	x[0]=x[1]=x[2]=x[3]=x[4]=x[5]=x[6]=x[7]=0x9E3779B9U;
> +	for(i=0;i<4;i++)isaac_mix(x);
> +	for(i=0;i<ISAAC_SZ;i+=8){
> +		for(j=0;j<8;j++)x[j]+=r[i+j];
> +		isaac_mix(x);
> +		memcpy(m+i,x,sizeof(x));
> +	}
> +	for(i=0;i<ISAAC_SZ;i+=8){
> +		for(j=0;j<8;j++)x[j]+=m[i+j];
> +		isaac_mix(x);
> +		memcpy(m+i,x,sizeof(x));
> +	}
> +	isaac_update(_ctx);
> +}
> +
> +uint32_t isaac_next_uint32(isaac_ctx *_ctx){
> +	if(!_ctx->n)isaac_update(_ctx);
> +	return _ctx->r[--_ctx->n];
> +}
> +
> +uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n){
> +	uint32_t r;
> +	uint32_t v;
> +	uint32_t d;
> +	do{
> +		r=isaac_next_uint32(_ctx);
> +		v=r%_n;
> +		d=r-v;
> +	}
> +	while(((d+_n-1)&ISAAC_MASK)<d);
> +	return v;
> +}
> diff --git a/lib/prng.h b/lib/prng.h
> new file mode 100644
> index 0000000..bf5776d
> --- /dev/null
> +++ b/lib/prng.h
> @@ -0,0 +1,82 @@
> +/*
> + * PRNG Header
> + */
> +#ifndef __PRNG_H__
> +#define __PRNG_H__
> +
> +# include <stdint.h>
> +
> +
> +
> +typedef struct isaac_ctx isaac_ctx;
> +
> +
> +
> +/*This value may be lowered to reduce memory usage on embedded platforms, at
> +  the cost of reducing security and increasing bias.
> +  Quoting Bob Jenkins: "The current best guess is that bias is detectable after
> +  2**37 values for [ISAAC_SZ_LOG]=3, 2**45 for 4, 2**53 for 5, 2**61 for 6,
> +  2**69 for 7, and 2**77 values for [ISAAC_SZ_LOG]=8."*/
> +#define ISAAC_SZ_LOG      (8)
> +#define ISAAC_SZ          (1<<ISAAC_SZ_LOG)
> +#define ISAAC_SEED_SZ_MAX (ISAAC_SZ<<2)
> +
> +
> +
> +/*ISAAC is the most advanced of a series of pseudo-random number generators
> +  designed by Robert J. Jenkins Jr. in 1996.
> +  http://www.burtleburtle.net/bob/rand/isaac.html
> +  To quote:
> +  No efficient method is known for deducing their internal states.
> +  ISAAC requires an amortized 18.75 instructions to produce a 32-bit value.
> +  There are no cycles in ISAAC shorter than 2**40 values.
> +  The expected cycle length is 2**8295 values.*/
> +struct isaac_ctx{
> +	unsigned n;
> +	uint32_t r[ISAAC_SZ];
> +	uint32_t m[ISAAC_SZ];
> +	uint32_t a;
> +	uint32_t b;
> +	uint32_t c;
> +};
> +
> +
> +/**
> + * isaac_init - Initialize an instance of the ISAAC random number generator.
> + * @_ctx:   The instance to initialize.
> + * @_seed:  The specified seed bytes.
> + *          This may be NULL if _nseed is less than or equal to zero.
> + * @_nseed: The number of bytes to use for the seed.
> + *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
> + *           ignored.
> + */
> +void isaac_init(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
> +
> +/**
> + * isaac_reseed - Mix a new batch of entropy into the current state.
> + * To reset ISAAC to a known state, call isaac_init() again instead.
> + * @_ctx:   The instance to reseed.
> + * @_seed:  The specified seed bytes.
> + *          This may be NULL if _nseed is zero.
> + * @_nseed: The number of bytes to use for the seed.
> + *          If this is greater than ISAAC_SEED_SZ_MAX, the extra bytes are
> + *           ignored.
> + */
> +void isaac_reseed(isaac_ctx *_ctx,const unsigned char *_seed,int _nseed);
> +/**
> + * isaac_next_uint32 - Return the next random 32-bit value.
> + * @_ctx: The ISAAC instance to generate the value with.
> + */
> +uint32_t isaac_next_uint32(isaac_ctx *_ctx);
> +/**
> + * isaac_next_uint - Uniform random integer less than the given value.
> + * @_ctx: The ISAAC instance to generate the value with.
> + * @_n:   The upper bound on the range of numbers returned (not inclusive).
> + *        This must be greater than zero and less than 2**32.
> + *        To return integers in the full range 0...2**32-1, use
> + *         isaac_next_uint32() instead.
> + * Return: An integer uniformly distributed between 0 and _n-1 (inclusive).
> + */
> +uint32_t isaac_next_uint(isaac_ctx *_ctx,uint32_t _n);
> +
> +#endif
> -- 
> 2.5.0
> 
>

Agreed that (pseudo) random numbers could be useful in unit tests.

Acked-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (9 preceding siblings ...)
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN Alex Bennée
@ 2015-07-31 15:54 ` Alex Bennée
  2015-07-31 19:30   ` Andrew Jones
  2015-08-03 10:02   ` alvise rigo
  2015-08-02 16:44 ` [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Andrew Jones
  11 siblings, 2 replies; 28+ messages in thread
From: Alex Bennée @ 2015-07-31 15:54 UTC (permalink / raw)
  To: mttcg, mark.burton, fred.konrad
  Cc: peter.maydell, drjones, Alex Bennée, kvm, a.spyridakis,
	claudio.fontana, a.rigo, qemu-devel, Alex Bennée

From: Alex Bennée <alex@bennee.com>

This test has been written mainly to stress multi-threaded TCG behaviour
but will demonstrate failure by default on real hardware. The test takes
the following parameters:

  - "lock" use GCC's locking semantics
  - "excl" use load/store exclusive semantics
  - "acqrel" use acquire/release semantics

Currently excl/acqrel lock up on MTTCG

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
 config/config-arm-common.mak |   2 +
 2 files changed, 208 insertions(+)
 create mode 100644 arm/barrier-test.c

diff --git a/arm/barrier-test.c b/arm/barrier-test.c
new file mode 100644
index 0000000..53d690b
--- /dev/null
+++ b/arm/barrier-test.c
@@ -0,0 +1,206 @@
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#include <prng.h>
+
+#define MAX_CPUS 4
+
+/* How many increments to do */
+static int increment_count = 10000000;
+
+
+/* shared value, we use the alignment to ensure the global_lock value
+ * doesn't share a page */
+static unsigned int shared_value;
+
+/* PAGE_SIZE * uint32_t means we span several pages */
+static uint32_t memory_array[PAGE_SIZE];
+
+__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
+__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
+__attribute__((aligned(PAGE_SIZE))) static int global_lock;
+
+struct isaac_ctx prng_context[MAX_CPUS];
+
+void (*inc_fn)(void);
+
+static void lock(int *lock_var)
+{
+	while (__sync_lock_test_and_set(lock_var, 1));
+}
+static void unlock(int *lock_var)
+{
+	__sync_lock_release(lock_var);
+}
+
+static void increment_shared(void)
+{
+	shared_value++;
+}
+
+static void increment_shared_with_lock(void)
+{
+	lock(&global_lock);
+	shared_value++;
+	unlock(&global_lock);
+}
+
+static void increment_shared_with_excl(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+	asm volatile(
+	"1:	ldxr	w0, [%[sptr]]\n"
+	"	add     w0, w0, #0x1\n"
+	"	stxr	w1, w0, [%[sptr]]\n"
+	"	cbnz	w1, 1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "w0", "w1", "cc");
+#else
+	asm volatile(
+	"1:	ldrex	r0, [%[sptr]]\n"
+	"	add     r0, r0, #0x1\n"
+	"	strexeq	r1, r0, [%[sptr]]\n"
+	"	cmpeq	r1, #0\n"
+	"	bne	1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+static void increment_shared_with_acqrel(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+	asm volatile(
+	"	ldar	w0, [%[sptr]]\n"
+	"	add     w0, w0, #0x1\n"
+	"	str	w0, [%[sptr]]\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "w0");
+#else
+	/* ARMv7 has no acquire/release semantics but we
+	 * can ensure the results of the write are propagated
+	 * with the use of barriers.
+	 */
+	asm volatile(
+	"1:	ldrex	r0, [%[sptr]]\n"
+	"	add     r0, r0, #0x1\n"
+	"	strexeq	r1, r0, [%[sptr]]\n"
+	"	cmpeq	r1, #0\n"
+	"	bne	1b\n"
+	"	dmb\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+
+}
+
+/* The idea of this is just to generate some random load/store
+ * activity which may or may not race with an un-barried incremented
+ * of the shared counter
+ */
+static void shuffle_memory(int cpu)
+{
+	int i;
+	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
+	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
+	int count = seq & 0x1f;
+	uint32_t val=0;
+
+	seq >>= 5;
+
+	for (i=0; i<count; i++) {
+		int index = seq & ~PAGE_MASK;
+		if (lspat & 1) {
+			val ^= memory_array[index];
+		} else {
+			memory_array[index] = val;
+		}
+		seq >>= PAGE_SHIFT;
+		seq ^= lspat;
+		lspat >>= 1;
+	}
+
+}
+
+static void do_increment(void)
+{
+	int i;
+	int cpu = smp_processor_id();
+
+	printf("CPU%d online\n", cpu);
+
+	for (i=0; i < increment_count; i++) {
+		per_cpu_value[cpu]++;
+		inc_fn();
+
+		shuffle_memory(cpu);
+	}
+
+	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+int main(int argc, char **argv)
+{
+	int cpu;
+	unsigned int i, sum = 0;
+	static const unsigned char seed[] = "myseed";
+
+	inc_fn = &increment_shared;
+
+	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+
+		if (strcmp(arg, "lock") == 0) {
+			inc_fn = &increment_shared_with_lock;
+			report_prefix_push("lock");
+		} else if (strcmp(arg, "excl") == 0) {
+			inc_fn = &increment_shared_with_excl;
+			report_prefix_push("excl");
+		} else if (strcmp(arg, "acqrel") == 0) {
+			inc_fn = &increment_shared_with_acqrel;
+			report_prefix_push("acqrel");
+		} else {
+			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
+		}
+	}
+
+	/* fill our random page */
+	for (i=0; i<PAGE_SIZE; i++) {
+		memory_array[i] = isaac_next_uint32(&prng_context[0]);
+	}
+
+	for_each_present_cpu(cpu) {
+		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
+		if (cpu == 0)
+			continue;
+
+		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
+		smp_boot_secondary(cpu, do_increment);
+	}
+
+	do_increment();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	/* All CPUs done, do we add up */
+	for_each_present_cpu(cpu) {
+		sum += per_cpu_value[cpu];
+	}
+	report("total incs %d", sum == shared_value, shared_value);
+
+	return report_summary();
+}
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 67a9dda..af628e6 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -12,6 +12,7 @@ endif
 tests-common = $(TEST_DIR)/selftest.flat
 tests-common += $(TEST_DIR)/spinlock-test.flat
 tests-common += $(TEST_DIR)/tlbflush-test.flat
+tests-common += $(TEST_DIR)/barrier-test.flat
 
 utils-common = $(TEST_DIR)/utils/kvm-query
 
@@ -80,3 +81,4 @@ utils: $(utils-common)
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
 $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
+$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
@ 2015-07-31 19:30   ` Andrew Jones
  2015-08-03 10:02   ` alvise rigo
  1 sibling, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-07-31 19:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Alex Bennée, claudio.fontana, kvm,
	a.spyridakis, mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:54:01PM +0100, Alex Bennée wrote:
> From: Alex Bennée <alex@bennee.com>
> 
> This test has been written mainly to stress multi-threaded TCG behaviour
> but will demonstrate failure by default on real hardware. The test takes
> the following parameters:
> 
>   - "lock" use GCC's locking semantics
>   - "excl" use load/store exclusive semantics
>   - "acqrel" use acquire/release semantics
> 
> Currently excl/acqrel lock up on MTTCG
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
>  config/config-arm-common.mak |   2 +
>  2 files changed, 208 insertions(+)
>  create mode 100644 arm/barrier-test.c
> 
> diff --git a/arm/barrier-test.c b/arm/barrier-test.c
> new file mode 100644
> index 0000000..53d690b
> --- /dev/null
> +++ b/arm/barrier-test.c
> @@ -0,0 +1,206 @@
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#include <prng.h>
> +
> +#define MAX_CPUS 4
> +
> +/* How many increments to do */
> +static int increment_count = 10000000;
> +
> +
> +/* shared value, we use the alignment to ensure the global_lock value
> + * doesn't share a page */
> +static unsigned int shared_value;
> +
> +/* PAGE_SIZE * uint32_t means we span several pages */
> +static uint32_t memory_array[PAGE_SIZE];
> +
> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
> +__attribute__((aligned(PAGE_SIZE))) static int global_lock;
> +
> +struct isaac_ctx prng_context[MAX_CPUS];
> +
> +void (*inc_fn)(void);
> +
> +static void lock(int *lock_var)
> +{
> +	while (__sync_lock_test_and_set(lock_var, 1));
> +}
> +static void unlock(int *lock_var)
> +{
> +	__sync_lock_release(lock_var);
> +}
> +
> +static void increment_shared(void)
> +{
> +	shared_value++;
> +}
> +
> +static void increment_shared_with_lock(void)
> +{
> +	lock(&global_lock);
> +	shared_value++;
> +	unlock(&global_lock);
> +}
> +
> +static void increment_shared_with_excl(void)
> +{
> +#if defined (__LP64__) || defined (_LP64)
> +	asm volatile(
> +	"1:	ldxr	w0, [%[sptr]]\n"
> +	"	add     w0, w0, #0x1\n"
> +	"	stxr	w1, w0, [%[sptr]]\n"
> +	"	cbnz	w1, 1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "w0", "w1", "cc");
> +#else
> +	asm volatile(
> +	"1:	ldrex	r0, [%[sptr]]\n"
> +	"	add     r0, r0, #0x1\n"
> +	"	strexeq	r1, r0, [%[sptr]]\n"
> +	"	cmpeq	r1, #0\n"
> +	"	bne	1b\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +}
> +
> +static void increment_shared_with_acqrel(void)
> +{
> +#if defined (__LP64__) || defined (_LP64)
> +	asm volatile(
> +	"	ldar	w0, [%[sptr]]\n"
> +	"	add     w0, w0, #0x1\n"
> +	"	str	w0, [%[sptr]]\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "w0");
> +#else
> +	/* ARMv7 has no acquire/release semantics but we
> +	 * can ensure the results of the write are propagated
> +	 * with the use of barriers.
> +	 */
> +	asm volatile(
> +	"1:	ldrex	r0, [%[sptr]]\n"
> +	"	add     r0, r0, #0x1\n"
> +	"	strexeq	r1, r0, [%[sptr]]\n"
> +	"	cmpeq	r1, #0\n"
> +	"	bne	1b\n"
> +	"	dmb\n"
> +	: /* out */
> +	: [sptr] "r" (&shared_value) /* in */
> +	: "r0", "r1", "cc");
> +#endif
> +
> +}
> +
> +/* The idea of this is just to generate some random load/store
> + * activity which may or may not race with an un-barried incremented
> + * of the shared counter
> + */
> +static void shuffle_memory(int cpu)
> +{
> +	int i;
> +	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
> +	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
> +	int count = seq & 0x1f;
> +	uint32_t val=0;
> +
> +	seq >>= 5;
> +
> +	for (i=0; i<count; i++) {
> +		int index = seq & ~PAGE_MASK;
> +		if (lspat & 1) {
> +			val ^= memory_array[index];
> +		} else {
> +			memory_array[index] = val;
> +		}
> +		seq >>= PAGE_SHIFT;
> +		seq ^= lspat;
> +		lspat >>= 1;
> +	}
> +
> +}
> +
> +static void do_increment(void)
> +{
> +	int i;
> +	int cpu = smp_processor_id();
> +
> +	printf("CPU%d online\n", cpu);
> +
> +	for (i=0; i < increment_count; i++) {
> +		per_cpu_value[cpu]++;
> +		inc_fn();
> +
> +		shuffle_memory(cpu);
> +	}
> +
> +	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
> +
> +	cpumask_set_cpu(cpu, &smp_test_complete);
> +	if (cpu != 0)
> +		halt();
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	int cpu;
> +	unsigned int i, sum = 0;
> +	static const unsigned char seed[] = "myseed";
> +
> +	inc_fn = &increment_shared;
> +
> +	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
> +
> +	for (i=0; i<argc; i++) {
> +		char *arg = argv[i];
> +
> +		if (strcmp(arg, "lock") == 0) {
> +			inc_fn = &increment_shared_with_lock;
> +			report_prefix_push("lock");
> +		} else if (strcmp(arg, "excl") == 0) {
> +			inc_fn = &increment_shared_with_excl;
> +			report_prefix_push("excl");
> +		} else if (strcmp(arg, "acqrel") == 0) {
> +			inc_fn = &increment_shared_with_acqrel;
> +			report_prefix_push("acqrel");
> +		} else {
> +			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
> +		}
> +	}
> +
> +	/* fill our random page */
> +	for (i=0; i<PAGE_SIZE; i++) {
> +		memory_array[i] = isaac_next_uint32(&prng_context[0]);
> +	}
> +
> +	for_each_present_cpu(cpu) {
> +		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
> +		if (cpu == 0)
> +			continue;
> +
> +		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
> +		smp_boot_secondary(cpu, do_increment);
> +	}
> +
> +	do_increment();
> +
> +	while (!cpumask_full(&smp_test_complete))
> +		cpu_relax();
> +
> +	/* All CPUs done, do we add up */
> +	for_each_present_cpu(cpu) {
> +		sum += per_cpu_value[cpu];
> +	}
> +	report("total incs %d", sum == shared_value, shared_value);
> +
> +	return report_summary();
> +}
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 67a9dda..af628e6 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -12,6 +12,7 @@ endif
>  tests-common = $(TEST_DIR)/selftest.flat
>  tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/tlbflush-test.flat
> +tests-common += $(TEST_DIR)/barrier-test.flat
>  
>  utils-common = $(TEST_DIR)/utils/kvm-query
>  
> @@ -80,3 +81,4 @@ utils: $(utils-common)
>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
>  $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
> +$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
> -- 
> 2.5.0
> 
>

Same nits on missing header and style as for tlbflush-test. I'll look at
the test closer next week. I guess it's time I install mttcg too...

drew

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
  2015-07-31 19:30   ` Andrew Jones
@ 2015-08-03 10:02   ` alvise rigo
  2015-08-03 10:30     ` Alex Bennée
  1 sibling, 1 reply; 28+ messages in thread
From: alvise rigo @ 2015-08-03 10:02 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, Peter Maydell, Andrew Jones, Alex Bennée,
	Claudio Fontana, kvm, Alexander Spyridakis, Mark Burton,
	QEMU Developers, KONRAD Frédéric

Hi Alex,

Nice set of tests, they are proving to be helpful.
One question below.

On Fri, Jul 31, 2015 at 5:54 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
> From: Alex Bennée <alex@bennee.com>
>
> This test has been written mainly to stress multi-threaded TCG behaviour
> but will demonstrate failure by default on real hardware. The test takes
> the following parameters:
>
>   - "lock" use GCC's locking semantics
>   - "excl" use load/store exclusive semantics
>   - "acqrel" use acquire/release semantics
>
> Currently excl/acqrel lock up on MTTCG
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
>  config/config-arm-common.mak |   2 +
>  2 files changed, 208 insertions(+)
>  create mode 100644 arm/barrier-test.c
>
> diff --git a/arm/barrier-test.c b/arm/barrier-test.c
> new file mode 100644
> index 0000000..53d690b
> --- /dev/null
> +++ b/arm/barrier-test.c
> @@ -0,0 +1,206 @@
> +#include <libcflat.h>
> +#include <asm/smp.h>
> +#include <asm/cpumask.h>
> +#include <asm/barrier.h>
> +#include <asm/mmu.h>
> +
> +#include <prng.h>
> +
> +#define MAX_CPUS 4
> +
> +/* How many increments to do */
> +static int increment_count = 10000000;
> +
> +
> +/* shared value, we use the alignment to ensure the global_lock value
> + * doesn't share a page */
> +static unsigned int shared_value;
> +
> +/* PAGE_SIZE * uint32_t means we span several pages */
> +static uint32_t memory_array[PAGE_SIZE];
> +
> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
> +__attribute__((aligned(PAGE_SIZE))) static int global_lock;
> +
> +struct isaac_ctx prng_context[MAX_CPUS];
> +
> +void (*inc_fn)(void);
> +
> +static void lock(int *lock_var)
> +{
> +       while (__sync_lock_test_and_set(lock_var, 1));
> +}
> +static void unlock(int *lock_var)
> +{
> +       __sync_lock_release(lock_var);
> +}
> +
> +static void increment_shared(void)
> +{
> +       shared_value++;
> +}
> +
> +static void increment_shared_with_lock(void)
> +{
> +       lock(&global_lock);
> +       shared_value++;
> +       unlock(&global_lock);
> +}
> +
> +static void increment_shared_with_excl(void)
> +{
> +#if defined (__LP64__) || defined (_LP64)
> +       asm volatile(
> +       "1:     ldxr    w0, [%[sptr]]\n"
> +       "       add     w0, w0, #0x1\n"
> +       "       stxr    w1, w0, [%[sptr]]\n"
> +       "       cbnz    w1, 1b\n"
> +       : /* out */
> +       : [sptr] "r" (&shared_value) /* in */
> +       : "w0", "w1", "cc");
> +#else
> +       asm volatile(
> +       "1:     ldrex   r0, [%[sptr]]\n"
> +       "       add     r0, r0, #0x1\n"
> +       "       strexeq r1, r0, [%[sptr]]\n"
> +       "       cmpeq   r1, #0\n"

Why are we calling these last two instructions with the 'eq' suffix?
Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?

Thank you,
alvise

> +       "       bne     1b\n"
> +       : /* out */
> +       : [sptr] "r" (&shared_value) /* in */
> +       : "r0", "r1", "cc");
> +#endif
> +}
> +
> +static void increment_shared_with_acqrel(void)
> +{
> +#if defined (__LP64__) || defined (_LP64)
> +       asm volatile(
> +       "       ldar    w0, [%[sptr]]\n"
> +       "       add     w0, w0, #0x1\n"
> +       "       str     w0, [%[sptr]]\n"
> +       : /* out */
> +       : [sptr] "r" (&shared_value) /* in */
> +       : "w0");
> +#else
> +       /* ARMv7 has no acquire/release semantics but we
> +        * can ensure the results of the write are propagated
> +        * with the use of barriers.
> +        */
> +       asm volatile(
> +       "1:     ldrex   r0, [%[sptr]]\n"
> +       "       add     r0, r0, #0x1\n"
> +       "       strexeq r1, r0, [%[sptr]]\n"
> +       "       cmpeq   r1, #0\n"
> +       "       bne     1b\n"
> +       "       dmb\n"
> +       : /* out */
> +       : [sptr] "r" (&shared_value) /* in */
> +       : "r0", "r1", "cc");
> +#endif
> +
> +}
> +
> +/* The idea of this is just to generate some random load/store
> + * activity which may or may not race with an un-barried incremented
> + * of the shared counter
> + */
> +static void shuffle_memory(int cpu)
> +{
> +       int i;
> +       uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
> +       uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
> +       int count = seq & 0x1f;
> +       uint32_t val=0;
> +
> +       seq >>= 5;
> +
> +       for (i=0; i<count; i++) {
> +               int index = seq & ~PAGE_MASK;
> +               if (lspat & 1) {
> +                       val ^= memory_array[index];
> +               } else {
> +                       memory_array[index] = val;
> +               }
> +               seq >>= PAGE_SHIFT;
> +               seq ^= lspat;
> +               lspat >>= 1;
> +       }
> +
> +}
> +
> +static void do_increment(void)
> +{
> +       int i;
> +       int cpu = smp_processor_id();
> +
> +       printf("CPU%d online\n", cpu);
> +
> +       for (i=0; i < increment_count; i++) {
> +               per_cpu_value[cpu]++;
> +               inc_fn();
> +
> +               shuffle_memory(cpu);
> +       }
> +
> +       printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
> +
> +       cpumask_set_cpu(cpu, &smp_test_complete);
> +       if (cpu != 0)
> +               halt();
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       int cpu;
> +       unsigned int i, sum = 0;
> +       static const unsigned char seed[] = "myseed";
> +
> +       inc_fn = &increment_shared;
> +
> +       isaac_init(&prng_context[0], &seed[0], sizeof(seed));
> +
> +       for (i=0; i<argc; i++) {
> +               char *arg = argv[i];
> +
> +               if (strcmp(arg, "lock") == 0) {
> +                       inc_fn = &increment_shared_with_lock;
> +                       report_prefix_push("lock");
> +               } else if (strcmp(arg, "excl") == 0) {
> +                       inc_fn = &increment_shared_with_excl;
> +                       report_prefix_push("excl");
> +               } else if (strcmp(arg, "acqrel") == 0) {
> +                       inc_fn = &increment_shared_with_acqrel;
> +                       report_prefix_push("acqrel");
> +               } else {
> +                       isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
> +               }
> +       }
> +
> +       /* fill our random page */
> +       for (i=0; i<PAGE_SIZE; i++) {
> +               memory_array[i] = isaac_next_uint32(&prng_context[0]);
> +       }
> +
> +       for_each_present_cpu(cpu) {
> +               uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
> +               if (cpu == 0)
> +                       continue;
> +
> +               isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
> +               smp_boot_secondary(cpu, do_increment);
> +       }
> +
> +       do_increment();
> +
> +       while (!cpumask_full(&smp_test_complete))
> +               cpu_relax();
> +
> +       /* All CPUs done, do we add up */
> +       for_each_present_cpu(cpu) {
> +               sum += per_cpu_value[cpu];
> +       }
> +       report("total incs %d", sum == shared_value, shared_value);
> +
> +       return report_summary();
> +}
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 67a9dda..af628e6 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -12,6 +12,7 @@ endif
>  tests-common = $(TEST_DIR)/selftest.flat
>  tests-common += $(TEST_DIR)/spinlock-test.flat
>  tests-common += $(TEST_DIR)/tlbflush-test.flat
> +tests-common += $(TEST_DIR)/barrier-test.flat
>
>  utils-common = $(TEST_DIR)/utils/kvm-query
>
> @@ -80,3 +81,4 @@ utils: $(utils-common)
>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
>  $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
> +$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-08-03 10:02   ` alvise rigo
@ 2015-08-03 10:30     ` Alex Bennée
  2015-08-03 10:34       ` alvise rigo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-08-03 10:30 UTC (permalink / raw)
  To: alvise rigo
  Cc: mttcg, Peter Maydell, Andrew Jones, Claudio Fontana, kvm,
	Alexander Spyridakis, Mark Burton, QEMU Developers,
	KONRAD Frédéric


alvise rigo <a.rigo@virtualopensystems.com> writes:

> Hi Alex,
>
> Nice set of tests, they are proving to be helpful.
> One question below.
>
> On Fri, Jul 31, 2015 at 5:54 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>> From: Alex Bennée <alex@bennee.com>
>>
>> This test has been written mainly to stress multi-threaded TCG behaviour
>> but will demonstrate failure by default on real hardware. The test takes
>> the following parameters:
>>
>>   - "lock" use GCC's locking semantics
>>   - "excl" use load/store exclusive semantics
>>   - "acqrel" use acquire/release semantics
>>
>> Currently excl/acqrel lock up on MTTCG
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
>>  config/config-arm-common.mak |   2 +
>>  2 files changed, 208 insertions(+)
>>  create mode 100644 arm/barrier-test.c
>>
>> diff --git a/arm/barrier-test.c b/arm/barrier-test.c
>> new file mode 100644
>> index 0000000..53d690b
>> --- /dev/null
>> +++ b/arm/barrier-test.c
>> @@ -0,0 +1,206 @@
>> +#include <libcflat.h>
>> +#include <asm/smp.h>
>> +#include <asm/cpumask.h>
>> +#include <asm/barrier.h>
>> +#include <asm/mmu.h>
>> +
>> +#include <prng.h>
>> +
>> +#define MAX_CPUS 4
>> +
>> +/* How many increments to do */
>> +static int increment_count = 10000000;
>> +
>> +
>> +/* shared value, we use the alignment to ensure the global_lock value
>> + * doesn't share a page */
>> +static unsigned int shared_value;
>> +
>> +/* PAGE_SIZE * uint32_t means we span several pages */
>> +static uint32_t memory_array[PAGE_SIZE];
>> +
>> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
>> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
>> +__attribute__((aligned(PAGE_SIZE))) static int global_lock;
>> +
>> +struct isaac_ctx prng_context[MAX_CPUS];
>> +
>> +void (*inc_fn)(void);
>> +
>> +static void lock(int *lock_var)
>> +{
>> +       while (__sync_lock_test_and_set(lock_var, 1));
>> +}
>> +static void unlock(int *lock_var)
>> +{
>> +       __sync_lock_release(lock_var);
>> +}
>> +
>> +static void increment_shared(void)
>> +{
>> +       shared_value++;
>> +}
>> +
>> +static void increment_shared_with_lock(void)
>> +{
>> +       lock(&global_lock);
>> +       shared_value++;
>> +       unlock(&global_lock);
>> +}
>> +
>> +static void increment_shared_with_excl(void)
>> +{
>> +#if defined (__LP64__) || defined (_LP64)
>> +       asm volatile(
>> +       "1:     ldxr    w0, [%[sptr]]\n"
>> +       "       add     w0, w0, #0x1\n"
>> +       "       stxr    w1, w0, [%[sptr]]\n"
>> +       "       cbnz    w1, 1b\n"
>> +       : /* out */
>> +       : [sptr] "r" (&shared_value) /* in */
>> +       : "w0", "w1", "cc");
>> +#else
>> +       asm volatile(
>> +       "1:     ldrex   r0, [%[sptr]]\n"
>> +       "       add     r0, r0, #0x1\n"
>> +       "       strexeq r1, r0, [%[sptr]]\n"
>> +       "       cmpeq   r1, #0\n"
>
> Why are we calling these last two instructions with the 'eq' suffix?
> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?

Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
test now so I'll try and clean that up.

>
> Thank you,
> alvise
>
>> +       "       bne     1b\n"
>> +       : /* out */
>> +       : [sptr] "r" (&shared_value) /* in */
>> +       : "r0", "r1", "cc");
>> +#endif
>> +}
>> +
>> +static void increment_shared_with_acqrel(void)
>> +{
>> +#if defined (__LP64__) || defined (_LP64)
>> +       asm volatile(
>> +       "       ldar    w0, [%[sptr]]\n"
>> +       "       add     w0, w0, #0x1\n"
>> +       "       str     w0, [%[sptr]]\n"
>> +       : /* out */
>> +       : [sptr] "r" (&shared_value) /* in */
>> +       : "w0");
>> +#else
>> +       /* ARMv7 has no acquire/release semantics but we
>> +        * can ensure the results of the write are propagated
>> +        * with the use of barriers.
>> +        */
>> +       asm volatile(
>> +       "1:     ldrex   r0, [%[sptr]]\n"
>> +       "       add     r0, r0, #0x1\n"
>> +       "       strexeq r1, r0, [%[sptr]]\n"
>> +       "       cmpeq   r1, #0\n"
>> +       "       bne     1b\n"
>> +       "       dmb\n"
>> +       : /* out */
>> +       : [sptr] "r" (&shared_value) /* in */
>> +       : "r0", "r1", "cc");
>> +#endif
>> +
>> +}
>> +
>> +/* The idea of this is just to generate some random load/store
>> + * activity which may or may not race with an un-barried incremented
>> + * of the shared counter
>> + */
>> +static void shuffle_memory(int cpu)
>> +{
>> +       int i;
>> +       uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
>> +       uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
>> +       int count = seq & 0x1f;
>> +       uint32_t val=0;
>> +
>> +       seq >>= 5;
>> +
>> +       for (i=0; i<count; i++) {
>> +               int index = seq & ~PAGE_MASK;
>> +               if (lspat & 1) {
>> +                       val ^= memory_array[index];
>> +               } else {
>> +                       memory_array[index] = val;
>> +               }
>> +               seq >>= PAGE_SHIFT;
>> +               seq ^= lspat;
>> +               lspat >>= 1;
>> +       }
>> +
>> +}
>> +
>> +static void do_increment(void)
>> +{
>> +       int i;
>> +       int cpu = smp_processor_id();
>> +
>> +       printf("CPU%d online\n", cpu);
>> +
>> +       for (i=0; i < increment_count; i++) {
>> +               per_cpu_value[cpu]++;
>> +               inc_fn();
>> +
>> +               shuffle_memory(cpu);
>> +       }
>> +
>> +       printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
>> +
>> +       cpumask_set_cpu(cpu, &smp_test_complete);
>> +       if (cpu != 0)
>> +               halt();
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +       int cpu;
>> +       unsigned int i, sum = 0;
>> +       static const unsigned char seed[] = "myseed";
>> +
>> +       inc_fn = &increment_shared;
>> +
>> +       isaac_init(&prng_context[0], &seed[0], sizeof(seed));
>> +
>> +       for (i=0; i<argc; i++) {
>> +               char *arg = argv[i];
>> +
>> +               if (strcmp(arg, "lock") == 0) {
>> +                       inc_fn = &increment_shared_with_lock;
>> +                       report_prefix_push("lock");
>> +               } else if (strcmp(arg, "excl") == 0) {
>> +                       inc_fn = &increment_shared_with_excl;
>> +                       report_prefix_push("excl");
>> +               } else if (strcmp(arg, "acqrel") == 0) {
>> +                       inc_fn = &increment_shared_with_acqrel;
>> +                       report_prefix_push("acqrel");
>> +               } else {
>> +                       isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
>> +               }
>> +       }
>> +
>> +       /* fill our random page */
>> +       for (i=0; i<PAGE_SIZE; i++) {
>> +               memory_array[i] = isaac_next_uint32(&prng_context[0]);
>> +       }
>> +
>> +       for_each_present_cpu(cpu) {
>> +               uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
>> +               if (cpu == 0)
>> +                       continue;
>> +
>> +               isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
>> +               smp_boot_secondary(cpu, do_increment);
>> +       }
>> +
>> +       do_increment();
>> +
>> +       while (!cpumask_full(&smp_test_complete))
>> +               cpu_relax();
>> +
>> +       /* All CPUs done, do we add up */
>> +       for_each_present_cpu(cpu) {
>> +               sum += per_cpu_value[cpu];
>> +       }
>> +       report("total incs %d", sum == shared_value, shared_value);
>> +
>> +       return report_summary();
>> +}
>> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
>> index 67a9dda..af628e6 100644
>> --- a/config/config-arm-common.mak
>> +++ b/config/config-arm-common.mak
>> @@ -12,6 +12,7 @@ endif
>>  tests-common = $(TEST_DIR)/selftest.flat
>>  tests-common += $(TEST_DIR)/spinlock-test.flat
>>  tests-common += $(TEST_DIR)/tlbflush-test.flat
>> +tests-common += $(TEST_DIR)/barrier-test.flat
>>
>>  utils-common = $(TEST_DIR)/utils/kvm-query
>>
>> @@ -80,3 +81,4 @@ utils: $(utils-common)
>>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
>>  $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
>> +$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
>> --
>> 2.5.0
>>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-08-03 10:30     ` Alex Bennée
@ 2015-08-03 10:34       ` alvise rigo
  2015-08-03 16:06         ` Alex Bennée
  0 siblings, 1 reply; 28+ messages in thread
From: alvise rigo @ 2015-08-03 10:34 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, Peter Maydell, Andrew Jones, Claudio Fontana, kvm,
	Alexander Spyridakis, Mark Burton, QEMU Developers,
	KONRAD Frédéric

On Mon, Aug 3, 2015 at 12:30 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> alvise rigo <a.rigo@virtualopensystems.com> writes:
>
>> Hi Alex,
>>
>> Nice set of tests, they are proving to be helpful.
>> One question below.
>>
>> On Fri, Jul 31, 2015 at 5:54 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>> From: Alex Bennée <alex@bennee.com>
>>>
>>> This test has been written mainly to stress multi-threaded TCG behaviour
>>> but will demonstrate failure by default on real hardware. The test takes
>>> the following parameters:
>>>
>>>   - "lock" use GCC's locking semantics
>>>   - "excl" use load/store exclusive semantics
>>>   - "acqrel" use acquire/release semantics
>>>
>>> Currently excl/acqrel lock up on MTTCG
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> ---
>>>  arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
>>>  config/config-arm-common.mak |   2 +
>>>  2 files changed, 208 insertions(+)
>>>  create mode 100644 arm/barrier-test.c
>>>
>>> diff --git a/arm/barrier-test.c b/arm/barrier-test.c
>>> new file mode 100644
>>> index 0000000..53d690b
>>> --- /dev/null
>>> +++ b/arm/barrier-test.c
>>> @@ -0,0 +1,206 @@
>>> +#include <libcflat.h>
>>> +#include <asm/smp.h>
>>> +#include <asm/cpumask.h>
>>> +#include <asm/barrier.h>
>>> +#include <asm/mmu.h>
>>> +
>>> +#include <prng.h>
>>> +
>>> +#define MAX_CPUS 4
>>> +
>>> +/* How many increments to do */
>>> +static int increment_count = 10000000;
>>> +
>>> +
>>> +/* shared value, we use the alignment to ensure the global_lock value
>>> + * doesn't share a page */
>>> +static unsigned int shared_value;
>>> +
>>> +/* PAGE_SIZE * uint32_t means we span several pages */
>>> +static uint32_t memory_array[PAGE_SIZE];
>>> +
>>> +__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
>>> +__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
>>> +__attribute__((aligned(PAGE_SIZE))) static int global_lock;
>>> +
>>> +struct isaac_ctx prng_context[MAX_CPUS];
>>> +
>>> +void (*inc_fn)(void);
>>> +
>>> +static void lock(int *lock_var)
>>> +{
>>> +       while (__sync_lock_test_and_set(lock_var, 1));
>>> +}
>>> +static void unlock(int *lock_var)
>>> +{
>>> +       __sync_lock_release(lock_var);
>>> +}
>>> +
>>> +static void increment_shared(void)
>>> +{
>>> +       shared_value++;
>>> +}
>>> +
>>> +static void increment_shared_with_lock(void)
>>> +{
>>> +       lock(&global_lock);
>>> +       shared_value++;
>>> +       unlock(&global_lock);
>>> +}
>>> +
>>> +static void increment_shared_with_excl(void)
>>> +{
>>> +#if defined (__LP64__) || defined (_LP64)
>>> +       asm volatile(
>>> +       "1:     ldxr    w0, [%[sptr]]\n"
>>> +       "       add     w0, w0, #0x1\n"
>>> +       "       stxr    w1, w0, [%[sptr]]\n"
>>> +       "       cbnz    w1, 1b\n"
>>> +       : /* out */
>>> +       : [sptr] "r" (&shared_value) /* in */
>>> +       : "w0", "w1", "cc");
>>> +#else
>>> +       asm volatile(
>>> +       "1:     ldrex   r0, [%[sptr]]\n"
>>> +       "       add     r0, r0, #0x1\n"
>>> +       "       strexeq r1, r0, [%[sptr]]\n"
>>> +       "       cmpeq   r1, #0\n"
>>
>> Why are we calling these last two instructions with the 'eq' suffix?
>> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?
>
> Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
> test now so I'll try and clean that up.
>
>>
>> Thank you,
>> alvise
>>
>>> +       "       bne     1b\n"
>>> +       : /* out */
>>> +       : [sptr] "r" (&shared_value) /* in */
>>> +       : "r0", "r1", "cc");
>>> +#endif
>>> +}
>>> +
>>> +static void increment_shared_with_acqrel(void)
>>> +{
>>> +#if defined (__LP64__) || defined (_LP64)
>>> +       asm volatile(
>>> +       "       ldar    w0, [%[sptr]]\n"
>>> +       "       add     w0, w0, #0x1\n"
>>> +       "       str     w0, [%[sptr]]\n"
>>> +       : /* out */
>>> +       : [sptr] "r" (&shared_value) /* in */
>>> +       : "w0");
>>> +#else
>>> +       /* ARMv7 has no acquire/release semantics but we
>>> +        * can ensure the results of the write are propagated
>>> +        * with the use of barriers.
>>> +        */
>>> +       asm volatile(
>>> +       "1:     ldrex   r0, [%[sptr]]\n"
>>> +       "       add     r0, r0, #0x1\n"
>>> +       "       strexeq r1, r0, [%[sptr]]\n"
>>> +       "       cmpeq   r1, #0\n"

I have not tested it, but also this one looks wrong.

Regards,
alvise

>>> +       "       bne     1b\n"
>>> +       "       dmb\n"
>>> +       : /* out */
>>> +       : [sptr] "r" (&shared_value) /* in */
>>> +       : "r0", "r1", "cc");
>>> +#endif
>>> +
>>> +}
>>> +
>>> +/* The idea of this is just to generate some random load/store
>>> + * activity which may or may not race with an un-barried incremented
>>> + * of the shared counter
>>> + */
>>> +static void shuffle_memory(int cpu)
>>> +{
>>> +       int i;
>>> +       uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
>>> +       uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
>>> +       int count = seq & 0x1f;
>>> +       uint32_t val=0;
>>> +
>>> +       seq >>= 5;
>>> +
>>> +       for (i=0; i<count; i++) {
>>> +               int index = seq & ~PAGE_MASK;
>>> +               if (lspat & 1) {
>>> +                       val ^= memory_array[index];
>>> +               } else {
>>> +                       memory_array[index] = val;
>>> +               }
>>> +               seq >>= PAGE_SHIFT;
>>> +               seq ^= lspat;
>>> +               lspat >>= 1;
>>> +       }
>>> +
>>> +}
>>> +
>>> +static void do_increment(void)
>>> +{
>>> +       int i;
>>> +       int cpu = smp_processor_id();
>>> +
>>> +       printf("CPU%d online\n", cpu);
>>> +
>>> +       for (i=0; i < increment_count; i++) {
>>> +               per_cpu_value[cpu]++;
>>> +               inc_fn();
>>> +
>>> +               shuffle_memory(cpu);
>>> +       }
>>> +
>>> +       printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
>>> +
>>> +       cpumask_set_cpu(cpu, &smp_test_complete);
>>> +       if (cpu != 0)
>>> +               halt();
>>> +}
>>> +
>>> +int main(int argc, char **argv)
>>> +{
>>> +       int cpu;
>>> +       unsigned int i, sum = 0;
>>> +       static const unsigned char seed[] = "myseed";
>>> +
>>> +       inc_fn = &increment_shared;
>>> +
>>> +       isaac_init(&prng_context[0], &seed[0], sizeof(seed));
>>> +
>>> +       for (i=0; i<argc; i++) {
>>> +               char *arg = argv[i];
>>> +
>>> +               if (strcmp(arg, "lock") == 0) {
>>> +                       inc_fn = &increment_shared_with_lock;
>>> +                       report_prefix_push("lock");
>>> +               } else if (strcmp(arg, "excl") == 0) {
>>> +                       inc_fn = &increment_shared_with_excl;
>>> +                       report_prefix_push("excl");
>>> +               } else if (strcmp(arg, "acqrel") == 0) {
>>> +                       inc_fn = &increment_shared_with_acqrel;
>>> +                       report_prefix_push("acqrel");
>>> +               } else {
>>> +                       isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
>>> +               }
>>> +       }
>>> +
>>> +       /* fill our random page */
>>> +       for (i=0; i<PAGE_SIZE; i++) {
>>> +               memory_array[i] = isaac_next_uint32(&prng_context[0]);
>>> +       }
>>> +
>>> +       for_each_present_cpu(cpu) {
>>> +               uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
>>> +               if (cpu == 0)
>>> +                       continue;
>>> +
>>> +               isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
>>> +               smp_boot_secondary(cpu, do_increment);
>>> +       }
>>> +
>>> +       do_increment();
>>> +
>>> +       while (!cpumask_full(&smp_test_complete))
>>> +               cpu_relax();
>>> +
>>> +       /* All CPUs done, do we add up */
>>> +       for_each_present_cpu(cpu) {
>>> +               sum += per_cpu_value[cpu];
>>> +       }
>>> +       report("total incs %d", sum == shared_value, shared_value);
>>> +
>>> +       return report_summary();
>>> +}
>>> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
>>> index 67a9dda..af628e6 100644
>>> --- a/config/config-arm-common.mak
>>> +++ b/config/config-arm-common.mak
>>> @@ -12,6 +12,7 @@ endif
>>>  tests-common = $(TEST_DIR)/selftest.flat
>>>  tests-common += $(TEST_DIR)/spinlock-test.flat
>>>  tests-common += $(TEST_DIR)/tlbflush-test.flat
>>> +tests-common += $(TEST_DIR)/barrier-test.flat
>>>
>>>  utils-common = $(TEST_DIR)/utils/kvm-query
>>>
>>> @@ -80,3 +81,4 @@ utils: $(utils-common)
>>>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>>>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
>>>  $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
>>> +$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
>>> --
>>> 2.5.0
>>>
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-08-03 10:34       ` alvise rigo
@ 2015-08-03 16:06         ` Alex Bennée
  2015-08-03 16:46           ` alvise rigo
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Bennée @ 2015-08-03 16:06 UTC (permalink / raw)
  To: alvise rigo
  Cc: mttcg, Peter Maydell, Andrew Jones, Claudio Fontana, kvm,
	Alexander Spyridakis, Mark Burton, QEMU Developers,
	KONRAD Frédéric

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Mon, Aug 3, 2015 at 12:30 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> alvise rigo <a.rigo@virtualopensystems.com> writes:
>>
>>> Hi Alex,
>>>
>>> Nice set of tests, they are proving to be helpful.
>>> One question below.
>>>
<snip>
>>>
>>> Why are we calling these last two instructions with the 'eq' suffix?
>>> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?
>>
>> Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
>> test now so I'll try and clean that up.

Please find the updated test attached. I've also included some new test
modes. In theory the barrier test by itself should still fail but it
passes on real ARMv7 as well as TCG. I'm trying to run the test on a
heavier core-ed ARMv7 to check. I suspect we get away with it on
ARMv7-on-x86_64 due to the strong ordering of the x86.

The "excl" and "acqrel" tests now run without issue (although again
plain acqrel semantics shouldn't stop a race corrupting shared_value).

I'll tweak the v8 versions of the test tomorrow.

-- 
Alex Bennée


[-- Attachment #2: updated version of the test --]
[-- Type: text/x-diff, Size: 8958 bytes --]

>From 0953549985134268bf9079a7a01b2631d8a4fdee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex@bennee.com>
Date: Thu, 30 Jul 2015 15:13:33 +0000
Subject: [kvm-unit-tests PATCH] arm/barrier-test: add memory barrier tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This test has been written mainly to stress multi-threaded TCG behaviour
but will demonstrate failure by default on real hardware. The test takes
the following parameters:

  - "lock" use GCC's locking semantics
  - "atomic" use GCC's __atomic primitives
  - "barrier" use plain dmb() barriers
  - "wfelock" use WaitForEvent sleep
  - "excl" use load/store exclusive semantics
  - "acqrel" use acquire/release semantics

Also two more options allow the test to be tweaked

  - "noshuffle" disables the memory shuffling
  - "count=%ld" set your own per-CPU increment count

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - Don't use thumb style strexeq stuff
  - Add atomic, barrier and wfelock tests
  - Add count/noshuffle test controls
---
 arm/barrier-test.c           | 284 +++++++++++++++++++++++++++++++++++++++++++
 config/config-arm-common.mak |   2 +
 2 files changed, 286 insertions(+)
 create mode 100644 arm/barrier-test.c

diff --git a/arm/barrier-test.c b/arm/barrier-test.c
new file mode 100644
index 0000000..765f8f6
--- /dev/null
+++ b/arm/barrier-test.c
@@ -0,0 +1,284 @@
+#include <libcflat.h>
+#include <asm/smp.h>
+#include <asm/cpumask.h>
+#include <asm/barrier.h>
+#include <asm/mmu.h>
+
+#include <prng.h>
+
+#define MAX_CPUS 8
+
+/* How many increments to do */
+static int increment_count = 10000000;
+static int do_shuffle = 1;
+
+
+/* shared value all the tests attempt to safely increment */
+static unsigned int shared_value;
+
+/* PAGE_SIZE * uint32_t means we span several pages */
+static uint32_t memory_array[PAGE_SIZE];
+
+/* We use the alignment of the following to ensure accesses to locking
+ * and synchronisation primatives don't interfere with the page of the
+ * shared value
+ */
+__attribute__((aligned(PAGE_SIZE))) static unsigned int per_cpu_value[MAX_CPUS];
+__attribute__((aligned(PAGE_SIZE))) static cpumask_t smp_test_complete;
+__attribute__((aligned(PAGE_SIZE))) static int global_lock;
+
+struct isaac_ctx prng_context[MAX_CPUS];
+
+void (*inc_fn)(void);
+
+
+/* In any SMP setting this *should* fail due to cores stepping on
+ * each other updating the shared variable
+ */
+static void increment_shared(void)
+{
+	shared_value++;
+}
+
+/* GCC __sync primitives are deprecated in favour of __atomic */
+static void increment_shared_with_lock(void)
+{
+	while (__sync_lock_test_and_set(&global_lock, 1));
+	shared_value++;
+	__sync_lock_release(&global_lock);
+}
+
+/* In practice even __ATOMIC_RELAXED uses ARM's ldxr/stex exclusive
+ * semantics */
+static void increment_shared_with_atomic(void)
+{
+	__atomic_add_fetch(&shared_value, 1, __ATOMIC_SEQ_CST);
+}
+
+
+/* By themselves barriers do not gaurentee atomicity */
+static void increment_shared_with_barrier(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+#else
+	asm volatile(
+	"	ldr	r0, [%[sptr]]\n"
+	"	dmb\n"
+	"	add     r0, r0, #0x1\n"
+	"	str	r1, r0, [%[sptr]]\n"
+	"	dmb\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+/*
+ * Load/store exclusive with WFE (wait-for-event)
+ */
+
+static void increment_shared_with_wfelock(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+#else
+	asm volatile(
+	"	mov     r1, #1\n"
+	"1:	ldrex	r0, [%[lock]]\n"
+	"	cmp     r0, #0\n"
+	"	wfene\n"
+	"	strexeq r0, r1, [%[lock]]\n"
+	"	cmpeq	r0, #0\n"
+	"	bne	1b\n"
+	"	dmb\n"
+	/* lock held */
+	"	ldr	r0, [%[sptr]]\n"
+	"	add	r0, r0, #0x1\n"
+	"	str	r0, [%[sptr]]\n"
+	/* now release */
+	"	mov	r0, #0\n"
+	"	dmb\n"
+	"	str	r0, [%[lock]]\n"
+	"	dsb\n"
+	"	sev\n"
+	: /* out */
+	: [lock] "r" (&global_lock), [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+
+/*
+ * Hand-written version of the load/store exclusive
+ */
+static void increment_shared_with_excl(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+	asm volatile(
+	"1:	ldxr	w0, [%[sptr]]\n"
+	"	add     w0, w0, #0x1\n"
+	"	stxr	w1, w0, [%[sptr]]\n"
+	"	cbnz	w1, 1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "w0", "w1", "cc");
+#else
+	asm volatile(
+	"1:	ldrex	r0, [%[sptr]]\n"
+	"	add     r0, r0, #0x1\n"
+	"	strex	r1, r0, [%[sptr]]\n"
+	"	cmp	r1, #0\n"
+	"	bne	1b\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+}
+
+static void increment_shared_with_acqrel(void)
+{
+#if defined (__LP64__) || defined (_LP64)
+	asm volatile(
+	"	ldar	w0, [%[sptr]]\n"
+	"	add     w0, w0, #0x1\n"
+	"	str	w0, [%[sptr]]\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "w0");
+#else
+	/* ARMv7 has no acquire/release semantics but we
+	 * can ensure the results of the write are propagated
+	 * with the use of barriers.
+	 */
+	asm volatile(
+	"1:	ldrex	r0, [%[sptr]]\n"
+	"	add     r0, r0, #0x1\n"
+	"	strex	r1, r0, [%[sptr]]\n"
+	"	cmp	r1, #0\n"
+	"	bne	1b\n"
+	"	dmb\n"
+	: /* out */
+	: [sptr] "r" (&shared_value) /* in */
+	: "r0", "r1", "cc");
+#endif
+
+}
+
+/* The idea of this is just to generate some random load/store
+ * activity which may or may not race with an un-barried incremented
+ * of the shared counter
+ */
+static void shuffle_memory(int cpu)
+{
+	int i;
+	uint32_t lspat = isaac_next_uint32(&prng_context[cpu]);
+	uint32_t seq = isaac_next_uint32(&prng_context[cpu]);
+	int count = seq & 0x1f;
+	uint32_t val=0;
+
+	seq >>= 5;
+
+	for (i=0; i<count; i++) {
+		int index = seq & ~PAGE_MASK;
+		if (lspat & 1) {
+			val ^= memory_array[index];
+		} else {
+			memory_array[index] = val;
+		}
+		seq >>= PAGE_SHIFT;
+		seq ^= lspat;
+		lspat >>= 1;
+	}
+
+}
+
+static void do_increment(void)
+{
+	int i;
+	int cpu = smp_processor_id();
+
+	printf("CPU%d online\n", cpu);
+
+	for (i=0; i < increment_count; i++) {
+		per_cpu_value[cpu]++;
+		inc_fn();
+
+		if (do_shuffle)
+			shuffle_memory(cpu);
+	}
+
+	printf("CPU%d: Done, %d incs\n", cpu, per_cpu_value[cpu]);
+
+	cpumask_set_cpu(cpu, &smp_test_complete);
+	if (cpu != 0)
+		halt();
+}
+
+int main(int argc, char **argv)
+{
+	int cpu;
+	unsigned int i, sum = 0;
+	static const unsigned char seed[] = "myseed";
+
+	inc_fn = &increment_shared;
+
+	isaac_init(&prng_context[0], &seed[0], sizeof(seed));
+
+	for (i=0; i<argc; i++) {
+		char *arg = argv[i];
+
+		if (strcmp(arg, "lock") == 0) {
+			inc_fn = &increment_shared_with_lock;
+			report_prefix_push("lock");
+		} else if (strcmp(arg, "atomic") == 0) {
+			inc_fn = &increment_shared_with_atomic;
+			report_prefix_push("atomic");
+		} else if (strcmp(arg, "barrier") == 0) {
+			inc_fn = &increment_shared_with_atomic;
+			report_prefix_push("barrier");
+		} else if (strcmp(arg, "wfelock") == 0) {
+			inc_fn = &increment_shared_with_wfelock;
+			report_prefix_push("wfelock");
+		} else if (strcmp(arg, "excl") == 0) {
+			inc_fn = &increment_shared_with_excl;
+			report_prefix_push("excl");
+		} else if (strcmp(arg, "acqrel") == 0) {
+			inc_fn = &increment_shared_with_acqrel;
+			report_prefix_push("acqrel");
+		} else if (strcmp(arg, "noshuffle") == 0) {
+			do_shuffle = 0;
+			report_prefix_push("noshuffle");
+		} else if (strstr(arg, "count=") != NULL) {
+			char *p = strstr(arg, "=");
+			increment_count = atol(p+1);
+		} else {
+			isaac_reseed(&prng_context[0], (unsigned char *) arg, strlen(arg));
+		}
+	}
+
+	/* fill our random page */
+	for (i=0; i<PAGE_SIZE; i++) {
+		memory_array[i] = isaac_next_uint32(&prng_context[0]);
+	}
+
+	for_each_present_cpu(cpu) {
+		uint32_t seed2 = isaac_next_uint32(&prng_context[0]);
+		if (cpu == 0)
+			continue;
+
+		isaac_init(&prng_context[cpu], (unsigned char *) &seed2, sizeof(seed2));
+		smp_boot_secondary(cpu, do_increment);
+	}
+
+	do_increment();
+
+	while (!cpumask_full(&smp_test_complete))
+		cpu_relax();
+
+	/* All CPUs done, do we add up */
+	for_each_present_cpu(cpu) {
+		sum += per_cpu_value[cpu];
+	}
+	report("total incs %d", sum == shared_value, shared_value);
+
+	return report_summary();
+}
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index de2c4a3..4c5abb6 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -12,6 +12,7 @@ endif
 tests-common = $(TEST_DIR)/selftest.flat
 tests-common += $(TEST_DIR)/spinlock-test.flat
 tests-common += $(TEST_DIR)/tlbflush-test.flat
+tests-common += $(TEST_DIR)/barrier-test.flat
 
 ifneq ($(TEST),)
 	tests = $(TEST_DIR)/$(TEST).flat
@@ -78,3 +79,4 @@ $(TEST_DIR)/$(TEST).elf: $(cstart.o) $(TEST_DIR)/$(TEST).o
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
 $(TEST_DIR)/tlbflush-test.elf: $(cstart.o) $(TEST_DIR)/tlbflush-test.o
+$(TEST_DIR)/barrier-test.elf: $(cstart.o) $(TEST_DIR)/barrier-test.o
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-08-03 16:06         ` Alex Bennée
@ 2015-08-03 16:46           ` alvise rigo
  2015-08-04  7:30             ` Alex Bennée
  0 siblings, 1 reply; 28+ messages in thread
From: alvise rigo @ 2015-08-03 16:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, Peter Maydell, Andrew Jones, Claudio Fontana, kvm,
	Alexander Spyridakis, Mark Burton, QEMU Developers,
	KONRAD Frédéric

[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]

On Mon, Aug 3, 2015 at 6:06 PM, Alex Bennée <alex.bennee@linaro.org> wrote:

>
> alvise rigo <a.rigo@virtualopensystems.com> writes:
>
> > On Mon, Aug 3, 2015 at 12:30 PM, Alex Bennée <alex.bennee@linaro.org>
> wrote:
> >>
> >> alvise rigo <a.rigo@virtualopensystems.com> writes:
> >>
> >>> Hi Alex,
> >>>
> >>> Nice set of tests, they are proving to be helpful.
> >>> One question below.
> >>>
> <snip>
> >>>
> >>> Why are we calling these last two instructions with the 'eq' suffix?
> >>> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?
> >>
> >> Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
> >> test now so I'll try and clean that up.
>
> Please find the updated test attached. I've also included some new test
> modes. In theory the barrier test by itself should still fail but it
>

Thanks, I will check them out.


> passes on real ARMv7 as well as TCG. I'm trying to run the test on a
> heavier core-ed ARMv7 to check. I suspect we get away with it on
> ARMv7-on-x86_64 due to the strong ordering of the x86.


> The "excl" and "acqrel" tests now run without issue (although again
> plain acqrel semantics shouldn't stop a race corrupting shared_value).



I suppose that, in order to have some race conditions due to a lack of a
proper emulation of barriers and acqrel instructions, we need a test that
does not involve atomic instructions at all, to reduce the emulation
overhead as much as possible.
Does this sound reasonable?


>
> I'll tweak the v8 versions of the test tomorrow.
>
> --
> Alex Bennée
>
>

[-- Attachment #2: Type: text/html, Size: 2842 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers
  2015-08-03 16:46           ` alvise rigo
@ 2015-08-04  7:30             ` Alex Bennée
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2015-08-04  7:30 UTC (permalink / raw)
  To: alvise rigo
  Cc: mttcg, Peter Maydell, Andrew Jones, Claudio Fontana, kvm,
	Alexander Spyridakis, Mark Burton, QEMU Developers,
	KONRAD Frédéric


alvise rigo <a.rigo@virtualopensystems.com> writes:

> On Mon, Aug 3, 2015 at 6:06 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>
>> alvise rigo <a.rigo@virtualopensystems.com> writes:
>>
>> > On Mon, Aug 3, 2015 at 12:30 PM, Alex Bennée <alex.bennee@linaro.org>
>> wrote:
>> >>
>> >> alvise rigo <a.rigo@virtualopensystems.com> writes:
>> >>
>> >>> Hi Alex,
>> >>>
>> >>> Nice set of tests, they are proving to be helpful.
>> >>> One question below.
>> >>>
>> <snip>
>> >>>
>> >>> Why are we calling these last two instructions with the 'eq' suffix?
>> >>> Shouldn't we just strex r1, r0, [sptr] and then cmp r1, #0?
>> >>
>> >> Possibly, my armv7 is a little rusty. I'm just looking at tweaking this
>> >> test now so I'll try and clean that up.
>>
>> Please find the updated test attached. I've also included some new test
>> modes. In theory the barrier test by itself should still fail but it
>>
>
> Thanks, I will check them out.
>
>
>> passes on real ARMv7 as well as TCG. I'm trying to run the test on a
>> heavier core-ed ARMv7 to check. I suspect we get away with it on
>> ARMv7-on-x86_64 due to the strong ordering of the x86.
>
>
>> The "excl" and "acqrel" tests now run without issue (although again
>> plain acqrel semantics shouldn't stop a race corrupting shared_value).
>
>
>
> I suppose that, in order to have some race conditions due to a lack of a
> proper emulation of barriers and acqrel instructions, we need a test that
> does not involve atomic instructions at all, to reduce the emulation
> overhead as much as possible.
> Does this sound reasonable?

I'm writing a "lockless" test now which uses just barriers and a postbox
style signal. But as I say I need to understand why the pure "barrier"
tests still works when it really shouldn't.

>
>
>>
>> I'll tweak the v8 versions of the test tomorrow.
>>
>> --
>> Alex Bennée
>>
>>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests
  2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
                   ` (10 preceding siblings ...)
  2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
@ 2015-08-02 16:44 ` Andrew Jones
  11 siblings, 0 replies; 28+ messages in thread
From: Andrew Jones @ 2015-08-02 16:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, kvm, a.spyridakis,
	mark.burton, a.rigo, qemu-devel, fred.konrad

On Fri, Jul 31, 2015 at 04:53:50PM +0100, Alex Bennée wrote:
> Hi,
> 
> This is the current state of my MTTCG tests based on the KVM's unit
> testing framework. The earlier patches in the series have already been
> reviewed and will (with the exception of the emacs patch) be making
> their way upstream.
> 
> There are a couple of addition to library functions:
>   - printf %u suppport
>   - flush_tlb_page for arm and arm64
>   - a generic prng from CCAN
> 
> The two actual tests are:
>   - tlbflush-test
>   - barrier-test
> 
> The latter barrier test hangs the current -v6 MTTCG patch set in both
> "excl" and "acqrel" modes and will make a good torture test for
> Alvise's atomic patch set. I suspect the load/store ordering issues
> will show up better once tested on a weak-ordered backend. I'm open to
> suggestions for other tests worth adding to show up the issues.
> 
> The github tree can be found at:
> 
> https://github.com/stsquad/kvm-unit-tests/tree/current-mttcg-tests
> 
> 
> Alex Bennée (11):
>   arm/run: set indentation defaults for emacs
>   README: add some CONTRIBUTING notes
>   configure: emit HOST=$host to config.mak
>   arm/run: introduce usingkvm var and use it
>   lib/printf: support the %u unsigned fmt field
>   lib/arm: add flush_tlb_page mmu function
>   new arm/tlbflush-test: TLB torture test
>   arm/unittests.cfg: add the tlbflush tests
>   arm: query /dev/kvm for maximum vcpus
>   new: add isaac prng library from CCAN
>   new: arm/barrier-test for memory barriers

General comment; please remove 'new' from your patch summaries.
The lib/arm prefix is OK, but I've been using 'arm/arm64:' for
all arm/arm64 patches, whether they're lib or tests.

Thanks,
drew

> 
>  README                       |  26 ++++++
>  arm/barrier-test.c           | 206 +++++++++++++++++++++++++++++++++++++++++++
>  arm/run                      |  19 +++-
>  arm/tlbflush-test.c          | 194 ++++++++++++++++++++++++++++++++++++++++
>  arm/unittests.cfg            |  26 +++++-
>  arm/utils/kvm-query.c        |  41 +++++++++
>  config/config-arm-common.mak |  18 +++-
>  configure                    |   2 +
>  lib/arm/asm/mmu.h            |  11 +++
>  lib/arm64/asm/mmu.h          |   8 ++
>  lib/printf.c                 |  13 +++
>  lib/prng.c                   | 162 ++++++++++++++++++++++++++++++++++
>  lib/prng.h                   |  82 +++++++++++++++++
>  13 files changed, 801 insertions(+), 7 deletions(-)
>  create mode 100644 arm/barrier-test.c
>  create mode 100644 arm/tlbflush-test.c
>  create mode 100644 arm/utils/kvm-query.c
>  create mode 100644 lib/prng.c
>  create mode 100644 lib/prng.h
> 
> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-08-04  7:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-31 15:53 [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 01/11] arm/run: set indentation defaults for emacs Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 02/11] README: add some CONTRIBUTING notes Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 03/11] configure: emit HOST=$host to config.mak Alex Bennée
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 04/11] arm/run: introduce usingkvm var and use it Alex Bennée
2015-08-02 16:36   ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 05/11] lib/printf: support the %u unsigned fmt field Alex Bennée
2015-07-31 18:25   ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 06/11] lib/arm: add flush_tlb_page mmu function Alex Bennée
2015-07-31 18:35   ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 07/11] new arm/tlbflush-test: TLB torture test Alex Bennée
2015-07-31 18:51   ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 08/11] arm/unittests.cfg: add the tlbflush tests Alex Bennée
2015-07-31 18:53   ` Andrew Jones
2015-07-31 15:53 ` [Qemu-devel] [kvm-unit-tests PATCH v5 09/11] arm: query /dev/kvm for maximum vcpus Alex Bennée
2015-07-31 19:17   ` Andrew Jones
2015-08-02 16:40     ` Andrew Jones
2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 10/11] new: add isaac prng library from CCAN Alex Bennée
2015-07-31 19:22   ` Andrew Jones
2015-07-31 15:54 ` [Qemu-devel] [kvm-unit-tests PATCH v5 11/11] new: arm/barrier-test for memory barriers Alex Bennée
2015-07-31 19:30   ` Andrew Jones
2015-08-03 10:02   ` alvise rigo
2015-08-03 10:30     ` Alex Bennée
2015-08-03 10:34       ` alvise rigo
2015-08-03 16:06         ` Alex Bennée
2015-08-03 16:46           ` alvise rigo
2015-08-04  7:30             ` Alex Bennée
2015-08-02 16:44 ` [Qemu-devel] [kvm-unit-tests PATCH v5 00/11] My current MTTCG tests Andrew Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).