qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Richard Henderson @ 2015-08-25  4:19 UTC
  To: qemu-devel; +Cc: aurelien, mark.cave-ayland, dl.soluz, atar4qemu

Doing this instead of saving the raw PS_PRIV and TL.  This means
that all nucleus mode TBs (TL > 0) can be shared.  This fixes a
bug in that we didn't include HS_PRIV in the TB flags, and so could
produce incorrect TB matches for hypervisor state.

The LSU and DMMU states were unused by the translator.  Including
them in TB flags meant unnecessary mismatches from tb_find_fast.

Signed-off-by: Richard Henderson <rth@twiddle.net>

---
Artyom and Dennis, I'm hoping that this will help with some of your
translation performance problems.  I don't currently have a sparc64
kernel set up for booting, but I did smoke test this with openbios,
and even there it reduced the number of TBs created.


r~
---
 target-sparc/cpu.h       | 26 ++++++++++++--------------
 target-sparc/translate.c |  2 +-
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 0522b65..23773f4 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -694,34 +694,32 @@ void cpu_tick_set_limit(CPUTimer *timer, uint64_t limit);
 trap_state* cpu_tsptr(CPUSPARCState* env);
 #endif
 
-#define TB_FLAG_FPU_ENABLED (1 << 4)
-#define TB_FLAG_AM_ENABLED (1 << 5)
+#define TB_FLAG_MMU_MASK     7
+#define TB_FLAG_FPU_ENABLED  (1 << 4)
+#define TB_FLAG_AM_ENABLED   (1 << 5)
 
 static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
-                                        target_ulong *cs_base, int *flags)
+                                        target_ulong *cs_base, int *pflags)
 {
+    int flags;
     *pc = env->pc;
     *cs_base = env->npc;
+    flags = cpu_mmu_index(env);
 #ifdef TARGET_SPARC64
-    // AM . Combined FPU enable bits . PRIV . DMMU enabled . IMMU enabled
-    *flags = (env->pstate & PS_PRIV)               /* 2 */
-        | ((env->lsu & (DMMU_E | IMMU_E)) >> 2)    /* 1, 0 */
-        | ((env->tl & 0xff) << 8)
-        | (env->dmmu.mmu_primary_context << 16);   /* 16... */
     if (env->pstate & PS_AM) {
-        *flags |= TB_FLAG_AM_ENABLED;
+        flags |= TB_FLAG_AM_ENABLED;
     }
-    if ((env->def->features & CPU_FEATURE_FLOAT) && (env->pstate & PS_PEF)
+    if ((env->def->features & CPU_FEATURE_FLOAT)
+        && (env->pstate & PS_PEF)
         && (env->fprs & FPRS_FEF)) {
-        *flags |= TB_FLAG_FPU_ENABLED;
+        flags |= TB_FLAG_FPU_ENABLED;
     }
 #else
-    // FPU enable . Supervisor
-    *flags = env->psrs;
     if ((env->def->features & CPU_FEATURE_FLOAT) && env->psref) {
-        *flags |= TB_FLAG_FPU_ENABLED;
+        flags |= TB_FLAG_FPU_ENABLED;
     }
 #endif
+    *pflags = flags;
 }
 
 static inline bool tb_fpu_enabled(int tb_flags)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 48fc2ab..8254a30 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -5234,7 +5234,7 @@ static inline void gen_intermediate_code_internal(SPARCCPU *cpu,
     last_pc = dc->pc;
     dc->npc = (target_ulong) tb->cs_base;
     dc->cc_op = CC_OP_DYNAMIC;
-    dc->mem_idx = cpu_mmu_index(env);
+    dc->mem_idx = tb->flags & TB_FLAG_MMU_MASK;
     dc->def = env->def;
     dc->fpu_enabled = tb_fpu_enabled(tb->flags);
     dc->address_mask_32bit = tb_am_enabled(tb->flags);
-- 
2.4.3
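To make the cache-key argument in the commit message concrete, here is a minimal C sketch comparing the old and new flag encodings on a toy CPU state. Only the `TB_FLAG_*` bit layout and the raw PRIV/TL/primary-context packing come from the patch above; `struct toy_state`, `toy_mmu_index()` and its index values are hypothetical simplifications, not QEMU's `CPUSPARCState` or the real `cpu_mmu_index()`.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the patch: bits 0-2 hold the MMU index. */
#define TB_FLAG_MMU_MASK     7
#define TB_FLAG_FPU_ENABLED  (1 << 4)
#define TB_FLAG_AM_ENABLED   (1 << 5)

/* Hypothetical, heavily simplified CPU state (not QEMU's CPUSPARCState). */
struct toy_state {
    int priv;                 /* PSTATE.PRIV: supervisor mode */
    int hpriv;                /* HPSTATE.PRIV: hypervisor mode (sun4v) */
    unsigned tl;              /* trap level */
    uint32_t primary_context; /* DMMU primary context */
    int fpu_enabled;
    int am_enabled;
};

/* Toy MMU index values, assumed for illustration only. */
enum { TOY_USER_IDX = 0, TOY_KERNEL_IDX = 2, TOY_NUCLEUS_IDX = 4, TOY_HYPV_IDX = 5 };

/* Hypothetical stand-in for cpu_mmu_index(): TL > 0 selects nucleus. */
static int toy_mmu_index(const struct toy_state *s)
{
    if (s->hpriv) {
        return TOY_HYPV_IDX;
    }
    if (s->tl > 0) {
        return TOY_NUCLEUS_IDX;
    }
    return s->priv ? TOY_KERNEL_IDX : TOY_USER_IDX;
}

/* FPU/AM bits are encoded identically in both schemes. */
static int common_bits(const struct toy_state *s)
{
    return (s->fpu_enabled ? TB_FLAG_FPU_ENABLED : 0)
         | (s->am_enabled ? TB_FLAG_AM_ENABLED : 0);
}

/* Old-style key: raw PRIV bit, TL and primary context; HPRIV is missing. */
static int old_tb_flags(const struct toy_state *s)
{
    return (s->priv ? 4 : 0)
         | (int)((s->tl & 0xff) << 8)
         | (int)(s->primary_context << 16)
         | common_bits(s);
}

/* New-style key: only the MMU index plus the FPU/AM bits. */
static int new_tb_flags(const struct toy_state *s)
{
    int idx = toy_mmu_index(s);
    assert(idx <= TB_FLAG_MMU_MASK);  /* index must fit in the mask */
    return idx | common_bits(s);
}

/* What the translator recovers, mirroring `tb->flags & TB_FLAG_MMU_MASK`. */
static int tb_mem_idx(int tb_flags)
{
    return tb_flags & TB_FLAG_MMU_MASK;
}
```

In this model, two states that differ only in primary context, or only in a nonzero TL, map to the same new flags and can share TBs, while a kernel state and a hypervisor state, which the old key could not tell apart, get different keys.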

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Dennis Luehring @ 2015-08-25  5:45 UTC
  To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu

On 25.08.2015 at 06:19, Richard Henderson wrote:
> Artyom and Dennis, I'm hoping that this will help with some of your
> translation performance problems.  I don't currently have a sparc64
> kernel set up for booting, but I did smoke test this with openbios,
> and even there it reduced the number of TBs created.

I can't really say that it improves anything, but maybe I'm just not the
right person to interpret the numbers.

Your patch gives the worst result in the stream benchmark but nearly the
best in pugixml compile time and prime.c runtime. Every patch or branch I
tried nearly halves the speed of the stream benchmark compared to
qemu-git-master.

legend:
tcg-indirect => git://github.com/rth7680/qemu.git  tcg-indirect
without-optimization => qemu-git-master with USE_TCG_OPTIMIZATIONS undefined
gcc build flags are irrelevant because I always use the same ones

system:
Ubuntu 15.04 x64 host, NetBSD SPARC64 guest running from a ramdisk (to reduce I/O noise)

pugixml.cpp build time:
build: g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP

results:
tcg-indirect: ~2:46.5
qemu.org-git: ~2:51.2 !!WORST!!
without-optimization: ~2:14.1 !!BEST!!
patch-Store-mmu-index-in-TB-flags: 2:38.4

prime.c runtime (integer prime test extracted from sysbench by Aurelien Jarno)
build: gcc prime.c -o prime.out -lm

results:
tcg-indirect: ~9.3 sec !!BEST!!
qemu-git-master: ~11 sec !!WORST!!
without-optimization: ~9.9 sec
patch-Store-mmu-index-in-TB-flags: ~9.7 sec
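As a rough way to read these timings, the small C helper below converts the m:ss readings to seconds and computes the fraction of run time saved relative to the worst-ranked baseline (qemu.org-git for pugixml, qemu-git-master for prime.c). The function names are illustrative; the inputs are the numbers reported above, some of them approximate. With them, the patch saves about 7.5% on the pugixml build (171.2 s to 158.4 s) and about 12% on prime.c (~11 s to ~9.7 s).

```c
#include <assert.h>

/* Convert a "minutes:seconds" reading such as 2:38.4 into seconds. */
static double mmss_to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Fraction of run time saved by candidate_s versus baseline_s;
 * positive means the candidate ran faster. */
static double time_saved(double baseline_s, double candidate_s)
{
    assert(baseline_s > 0.0);
    return (baseline_s - candidate_s) / baseline_s;
}
```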

stream (https://www.cs.virginia.edu/stream/)

build: gcc stream.c -o stream.out -lm

tcg-indirect:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             320.8     0.511297     0.498785     0.590214
Scale:            187.0     0.858693     0.855465     0.863527
Add:              218.2     1.104654     1.099698     1.110341
Triad:            169.5     1.433273     1.416321     1.502248

qemu-git-master: !!BEST!!
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563

without-optimization:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             316.6     0.524065     0.505313     0.580103
Scale:            200.5     0.813356     0.798024     0.840986
Add:              243.9     1.010247     0.984025     1.119149
Triad:            182.9     1.345601     1.312236     1.427459

patch-Store-mmu-index-in-TB-flags: !!WORST!!
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             276.2     0.585113     0.579193     0.595607
Scale:            181.3     0.895991     0.882744     0.917396
Add:              215.9     1.126226     1.111750     1.174236
Triad:            167.0     1.469888     1.436790     1.523211
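To quantify the "nearly halves" remark, a small C sketch can divide the patched build's Best Rate values by those of qemu-git-master. The arrays are copied from the two tables above; the names are illustrative. With these inputs the printed ratios come out at roughly 0.36 (Copy), 0.63 (Scale), 0.51 (Add) and 0.69 (Triad).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "Best Rate MB/s" columns copied from the stream tables above. */
static const char *const kernel_name[] = { "Copy", "Scale", "Add", "Triad" };
static const double master_rate[] = { 771.5, 288.1, 423.5, 242.9 };
static const double patched_rate[] = { 276.2, 181.3, 215.9, 167.0 };
#define N_KERNELS (sizeof master_rate / sizeof master_rate[0])

/* Bandwidth of the patched build relative to qemu-git-master:
 * 1.0 means unchanged, 0.5 means half the bandwidth. */
static double relative_rate(size_t i)
{
    assert(i < N_KERNELS);
    return patched_rate[i] / master_rate[i];
}

/* Print one "kernel ratio" line per stream kernel. */
static void print_relative_rates(void)
{
    for (size_t i = 0; i < N_KERNELS; i++) {
        printf("%-6s %.2f\n", kernel_name[i], relative_rate(i));
    }
}
```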

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Artyom Tarasenko @ 2015-08-25  6:35 UTC
  To: Richard Henderson
  Cc: Aurelien Jarno, Mark Cave-Ayland, qemu-devel, Dennis Luehring

Hi Richard,

On Tue, Aug 25, 2015 at 6:19 AM, Richard Henderson <rth@twiddle.net> wrote:
> Doing this instead of saving the raw PS_PRIV and TL.  This means
> that all nucleus mode TBs (TL > 0) can be shared.  This fixes a
> bug in that we didn't include HS_PRIV in the TB flags, and so could
> produce incorrect TB matches for hypervisor state.
>
> The LSU and DMMU states were unused by the translator.  Including
> them in TB flags meant unnecessary mismatches from tb_find_fast.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
>
> ---
> Artyom and Dennis, I'm hoping that this will help with some of your
> translation performance problems.  I don't currently have a sparc64
> kernel set up for booting, but I did smoke test this with openbios,
> and even there it reduced the number of TBs created.

This patch indeed fixes a bug in sun4v emulation, and we'll need it once
we have a working sun4v machine (currently qemu doesn't implement
a minimal sun4v machine, for instance there is no sun4v mmu).

I haven't tried it, but it's unlikely it would impact the
sun4u emulation we are currently using for the tests:
the sun4u machine doesn't have a hypervisor.

Artyom

-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Artyom Tarasenko @ 2015-08-25  6:44 UTC
  To: Dennis Luehring
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno, Richard Henderson

Hi Dennis,


On Tue, Aug 25, 2015 at 7:45 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> On 25.08.2015 at 06:19, Richard Henderson wrote:
>>
>> Artyom and Dennis, I'm hoping that this will help with some of your
>> translation performance problems.  I don't currently have a sparc64
>> kernel set up for booting, but I did smoke test this with openbios,
>> and even there it reduced the number of TBs created.
>
>
> I can't really say that it improves anything, but maybe I'm just not the
> right person to interpret the numbers.
>
> Your patch gives the worst result in the stream benchmark but nearly the best in
> pugixml compile time and prime.c runtime. Every patch or branch I tried nearly
> halves the speed of the stream benchmark compared to qemu-git-master.

This is very surprising: the patch should have no effect on a sun4u machine.
Have you applied it to the master or some other branch?
Have you pulled the master branch recently? Maybe there was another
change affecting the performance?

Artyom


* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Dennis Luehring @ 2015-08-25  7:46 UTC
  To: Artyom Tarasenko
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno, Richard Henderson

On 25.08.2015 at 08:44, Artyom Tarasenko wrote:
>> Your patch gives the worst result in the stream benchmark but nearly the best
>> in pugixml compile time and prime.c runtime. Every patch or branch I tried
>> nearly halves the speed of the stream benchmark compared to qemu-git-master.
> This is very surprising: the patch should have no effect on a sun4u machine.
> Have you applied it to the master or some other branch?
> Have you pulled the master branch recently? Maybe there was another
> change affecting the performance?

I completely removed my qemu git folder, freshly cloned qemu master, applied
the patch and rechecked that it was applied, and these are my numbers.
I always remove my qemu checkout (I always use master, another branch, or a
clean master plus a patch), build everything from scratch, and always use the
same settings, ramdisk, etc. for compilation and benchmarking.

And it's not really surprising: there are about 5 people in this discussion,
each with different ideas about where the slowness comes from, and all using
different or non-formalized "benchmark suites" (like your combination or my
3 tests). Each test I've made seems to give weird or surprising results, so
my conclusion is: no one really knows what it is and where it comes from, and
as long as there is no common benchmark suite (for example NetBSD + the 3
tests), the results I post will keep looking surprising or weird.

Example:

At first it was "your RAM is full", "your system is swapping", "your hard
disk is slow", etc. (discussions with Artyom Tarasenko, Aurelien Jarno and
some others). None of these is a problem: I have more than enough RAM and CPU
power in my host, plenty free in the guest, and using a ramdisk for the image
makes I/O less noisy.

Aurelien Jarno said it could be the 32-bit userland in my Debian 7.8 SPARC64
system, and showed prime.c numbers that prove it. I rechecked that, got the
same results, and switched over to NetBSD SPARC64 (a pure 64-bit system),
which makes prime.c the fastest. But that does not really reduce the pugixml
compile times (my host needs 3 seconds, NetBSD takes ~3 minutes, and
building cmake takes ~10 hours or longer).

Then someone said it could be I/O, so I put the NetBSD image on a ramdisk;
that helped a little.

Then Karel Gardas had the idea that the compilation process is primarily
memory-bound, so he asked me to use the stream benchmark, and I've been
posting results for every change. I still don't know whether the numbers I'm
getting from the benchmark are relevant in any way (no one really replies to
them), but they seem to be very relevant.

Then I tested the tcg-indirect branch: prime.c gets a little better, stream
gets slower.

The latest patch from Richard Henderson still gives unclear results: prime.c
gets a little better, stream gets the slowest.

The next thing I will do is a completely script-based qemu compilation and
benchmark run in my NetBSD image; then the human factor is down to 0% and
the only source of surprising/weird results is my host hardware.

Is there any interest in my NetBSD image (or the installation process), so
that others have a chance to reproduce similar differences? Should I add some
other tests? What is usually used for performance tests? There is still no
answer to that question.

I'm ready and happy to compile/run anything you've got or want :)

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Richard Henderson @ 2015-08-25 14:25 UTC
  To: Artyom Tarasenko, Dennis Luehring
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno

On 08/24/2015 11:44 PM, Artyom Tarasenko wrote:
> This is very surprising: the patch should have no effect on a sun4u machine.

Er, no, it should.  The primary vector by which I expect improvement is via not 
encoding dmmu.mmu_primary_context into the TB flags.  I.e. ASI_DMMU, which 
sun4u certainly uses.

The fact that the patch _also_ fixes a sun4v problem is secondary.


r~

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Dennis Luehring @ 2015-08-25 14:37 UTC
  To: Richard Henderson, Artyom Tarasenko
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno

On 25.08.2015 at 16:25, Richard Henderson wrote:
> Er, no, it should.  The primary vector by which I expect improvement is via not
> encoding dmmu.mmu_primary_context into the TB flags.  I.e. ASI_DMMU, which
> sun4u certainly uses.
>
> The fact that the patch _also_ fixes a sun4v problem is secondary.

Please, can you (or someone else) give me feedback about my tests/numbers
and their relevance? The stream benchmark results seem to be worse than
before and the compile speed is just a little bit better, so I don't
understand (at all) what problems are fixed or what is improved now. The
compilation test is still 180 times slower than on my host.

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Artyom Tarasenko @ 2015-08-25 16:53 UTC
  To: Richard Henderson
  Cc: Aurelien Jarno, Mark Cave-Ayland, qemu-devel, Dennis Luehring

On Tue, Aug 25, 2015 at 4:25 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/24/2015 11:44 PM, Artyom Tarasenko wrote:
>>
>> This is very surprising: the patch should have no effect on a sun4u
>> machine.
>
>
> Er, no, it should.  The primary vector by which I expect improvement is via
> not encoding dmmu.mmu_primary_context into the TB flags.  I.e. ASI_DMMU,
> which sun4u certainly uses.
>
> The fact that the patch _also_ fixes a sun4v problem is secondary.

Sorry, my bad, I haven't noticed that.

Applied it on top of the tcg-indirect branch, but see no measurable impact:
my reference g++ run still takes ~17 minutes.

Artyom

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Richard Henderson @ 2015-08-25 18:09 UTC
  To: Dennis Luehring, Artyom Tarasenko
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno

On 08/25/2015 07:37 AM, Dennis Luehring wrote:
> On 25.08.2015 at 16:25, Richard Henderson wrote:
>> Er, no, it should.  The primary vector by which I expect improvement is via not
>> encoding dmmu.mmu_primary_context into the TB flags.  I.e. ASI_DMMU, which
>> sun4u certainly uses.
>>
>> The fact that the patch _also_ fixes a sun4v problem is secondary.
>
> Please, can you (or someone else) give me feedback about my tests/numbers
> and their relevance? The stream benchmark results seem to be worse than
> before and the compile speed is just a little bit better, so I don't
> understand (at all) what problems are fixed or what is improved now.

The fact that stream degraded means that stream is unreliable as a benchmark. 
I suspect that if you simply run it N times with the exact same setup you'll 
see a very large variance in its runtime.

This particular patch cannot possibly have degraded performance, as it could 
only result in a reduction, not expansion, of the number of TBs created.
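The argument can be illustrated with a toy model of the TB lookup: if TBs are keyed on (pc, flags), a flags encoding that folds several old keys into one can only reduce the number of translations. The sketch below is hypothetical throughout: a linear-search array stands in for QEMU's real hash table and `tb_find_fast`, cs_base is ignored, and a 4-block nucleus-mode handler at made-up PCs is executed once per primary context, using the old PRIV/TL/context packing from the patch versus an assumed MMU index value.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TBS 64

/* Hypothetical TB cache key: guest PC plus TB flags (cs_base omitted). */
struct tb_key {
    uint64_t pc;
    int flags;
};

static struct tb_key tb_cache[MAX_TBS];
static int n_tbs;

/* Returns 1 if a new TB had to be "translated", 0 on a cache hit. */
static int lookup_or_translate(uint64_t pc, int flags)
{
    for (int i = 0; i < n_tbs; i++) {
        if (tb_cache[i].pc == pc && tb_cache[i].flags == flags) {
            return 0;
        }
    }
    assert(n_tbs < MAX_TBS);
    tb_cache[n_tbs].pc = pc;
    tb_cache[n_tbs].flags = flags;
    n_tbs++;
    return 1;
}

/* Run the same 4-block trap handler (TL = 1) once in each of
 * n_contexts primary contexts and count the translations. */
static int count_translations(int old_scheme, int n_contexts)
{
    const int toy_nucleus_idx = 4;  /* assumed MMU index value */
    int created = 0;

    n_tbs = 0;  /* start from an empty cache */
    for (int ctx = 0; ctx < n_contexts; ctx++) {
        int flags = old_scheme
            ? (4 | (1 << 8) | (ctx << 16))  /* PRIV | TL=1 | context */
            : toy_nucleus_idx;              /* MMU index only */
        for (uint64_t pc = 0x1000; pc < 0x1040; pc += 0x10) {
            created += lookup_or_translate(pc, flags);
        }
    }
    return created;
}
```

With eight contexts the old key yields 32 translations and the new key 4; with a single context both yield 4. The model only counts translations; it does not predict how much wall-clock time a real guest saves.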

As to why stream should be unreliable, I have no clue.

> The compilation test is still 180 times slower than on my host.

I'll have to compare that test vs an Alpha guest and see what I get.  I only 
remember one factor of 10, not two...

But you're right, it would be nice to put together a coherent set of 
benchmarks.  Ideally, a guest kernel plus minimal ramdisk with the tests 
pre-loaded so that we can boot and run ./benchmark at the prompt.  That's
the sort of thing we can easily upload to the wiki and share.


r~

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Dennis Luehring @ 2015-08-25 19:03 UTC
  To: Richard Henderson, Artyom Tarasenko
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno

On 25.08.2015 at 20:09, Richard Henderson wrote:
> On 08/25/2015 07:37 AM, Dennis Luehring wrote:
> > On 25.08.2015 at 16:25, Richard Henderson wrote:
> >> Er, no, it should.  The primary vector by which I expect improvement is via not
> >> encoding dmmu.mmu_primary_context into the TB flags.  I.e. ASI_DMMU, which
> >> sun4u certainly uses.
> >>
> >> The fact that the patch _also_ fixes a sun4v problem is secondary.
> >
> > Please, can you (or someone else) give me feedback about my tests/numbers
> > and their relevance? The stream benchmark results seem to be worse than
> > before and the compile speed is just a little bit better, so I don't
> > understand (at all) what problems are fixed or what is improved now.
>
> The fact that stream degraded means that stream is unreliable as a benchmark.
> I suspect that if you simply run it N times with the exact same setup you'll
> see a very large variance in its runtime.
>
> This particular patch cannot possibly have degraded performance, as it could
> only result in a reduction, not expansion, of the number of TBs created.
>
> As to why stream should be unreliable, I have no clue.

Six runs, six times nearly the same result (and the stream benchmark itself
does not seem to be an obscure one, https://www.cs.virginia.edu/stream/; it
measures sustainable memory bandwidth vs. FPU performance):

run #1:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             278.3     0.576045     0.574946     0.581186
Scale:            181.5     0.888582     0.881669     0.900648
Add:              217.6     1.109354     1.102955     1.123495
Triad:            167.7     1.440939     1.430755     1.463517
run #2:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             277.8     0.577607     0.575970     0.582532
Scale:            181.4     0.909480     0.882134     1.058552
Add:              217.5     1.110417     1.103327     1.122539
Triad:            167.5     1.444383     1.432864     1.477904
run #3:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             278.3     0.586721     0.574839     0.655187
Scale:            181.7     0.889060     0.880544     0.898155
Add:              217.3     1.115113     1.104248     1.146618
Triad:            167.6     1.480999     1.432066     1.748302
run #4:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             276.7     0.580837     0.578262     0.585253
Scale:            180.6     0.891853     0.885707     0.895370
Add:              216.5     1.116623     1.108630     1.126520
Triad:            167.1     1.444834     1.435996     1.451557
run #5:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             278.3     0.593767     0.574839     0.689366
Scale:            182.0     0.897183     0.879005     0.938262
Add:              217.7     1.132244     1.102195     1.203082
Triad:            167.4     1.444530     1.434112     1.487601
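The variance hypothesis can be checked against these runs. The C sketch below computes the mean and the relative range ((max - min) / mean) of the Copy best rates from runs 1 to 5; the relative range is a deliberately crude spread measure chosen so the example needs no libm, and the function names are illustrative. For these numbers the mean is about 277.9 MB/s and the spread is under 0.6%, far smaller than the roughly 2.8x gap to the earlier qemu-git-master Copy result of 771.5 MB/s.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean_of(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* (max - min) / mean: a crude, libm-free measure of run-to-run spread. */
static double relative_range(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) {
            lo = x[i];
        }
        if (x[i] > hi) {
            hi = x[i];
        }
    }
    return (hi - lo) / mean_of(x, n);
}

/* Copy "Best Rate MB/s" from runs 1-5 above. */
static const double copy_best[] = { 278.3, 277.8, 278.3, 276.7, 278.3 };
#define N_RUNS (sizeof copy_best / sizeof copy_best[0])
```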


> > The compilation test is still 180 times slower than on my host.
>
> I'll have to compare that test vs an Alpha guest and see what I get.  I only
> remember one factor of 10, not two...
>
> But you're right, it would be nice to put together a coherent set of
> benchmarks.  Ideally, a guest kernel plus minimal ramdisk with the tests
> pre-loaded so that we can boot and run ./benchmark at the prompt.  That's
> the sort of thing we can easily upload to the wiki and share.

Any idea what memory bandwidth benchmark I could use? Something from this
list: http://lbs.sourceforge.net/ ?

* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
From: Dennis Luehring @ 2015-08-25 19:17 UTC
  To: Richard Henderson, Artyom Tarasenko
  Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno

On 25.08.2015 at 20:09, Richard Henderson wrote:
> But you're right, it would be nice to put together a coherent set of
> benchmarks.  Ideally, a guest kernel plus minimal ramdisk with the tests
> pre-loaded so that we can boot and run ./benchmark at the prompt.  That's
> the sort of thing we can easily upload to the wiki and share.

I've found these benchmarks in NetBSD's benchmarks packages:

http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/README.html

hint.serial-98.06.12 <http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/hint/README.html>:
  Scalable benchmark for testing CPU and memory performance
nbench-2.2.3 <http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/nbench/README.html>:
  Benchmark tool for CPU, FPU and memory
ramspeed-2.6.0 <http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/ramspeed/README.html>:
  RAMspeed, a cache and memory benchmarking tool

These would allow benchmarking with NetBSD 6.1.5 under native x64,
qemu-amd64, qemu-sparc64 and qemu-alpha, based on more or less the "same"
OS/source.
