* [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
@ 2015-08-25 4:19 Richard Henderson
2015-08-25 5:45 ` Dennis Luehring
2015-08-25 6:35 ` Artyom Tarasenko
0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2015-08-25 4:19 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien, mark.cave-ayland, dl.soluz, atar4qemu
Doing this instead of saving the raw PS_PRIV and TL. This means
that all nucleus mode TBs (TL > 0) can be shared. This fixes a
bug in that we didn't include HS_PRIV in the TB flags, and so could
produce incorrect TB matches for hypervisor state.
The LSU and DMMU states were unused by the translator. Including
them in TB flags meant unnecessary mismatches from tb_find_fast.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
Artyom and Dennis, I'm hoping that this will help with some of your
translation performance problems. I don't currently have a sparc64
kernel set up for booting, but I did smoke test this with openbios,
and even there it reduced the number of TBs created.
r~
---
target-sparc/cpu.h | 26 ++++++++++++--------------
target-sparc/translate.c | 2 +-
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 0522b65..23773f4 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -694,34 +694,32 @@ void cpu_tick_set_limit(CPUTimer *timer, uint64_t limit);
 trap_state* cpu_tsptr(CPUSPARCState* env);
 #endif
-#define TB_FLAG_FPU_ENABLED (1 << 4)
-#define TB_FLAG_AM_ENABLED (1 << 5)
+#define TB_FLAG_MMU_MASK 7
+#define TB_FLAG_FPU_ENABLED (1 << 4)
+#define TB_FLAG_AM_ENABLED (1 << 5)
 static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
-                                        target_ulong *cs_base, int *flags)
+                                        target_ulong *cs_base, int *pflags)
 {
+    int flags;
     *pc = env->pc;
     *cs_base = env->npc;
+    flags = cpu_mmu_index(env);
 #ifdef TARGET_SPARC64
-    // AM . Combined FPU enable bits . PRIV . DMMU enabled . IMMU enabled
-    *flags = (env->pstate & PS_PRIV) /* 2 */
-        | ((env->lsu & (DMMU_E | IMMU_E)) >> 2) /* 1, 0 */
-        | ((env->tl & 0xff) << 8)
-        | (env->dmmu.mmu_primary_context << 16); /* 16... */
     if (env->pstate & PS_AM) {
-        *flags |= TB_FLAG_AM_ENABLED;
+        flags |= TB_FLAG_AM_ENABLED;
     }
-    if ((env->def->features & CPU_FEATURE_FLOAT) && (env->pstate & PS_PEF)
+    if ((env->def->features & CPU_FEATURE_FLOAT)
+        && (env->pstate & PS_PEF)
         && (env->fprs & FPRS_FEF)) {
-        *flags |= TB_FLAG_FPU_ENABLED;
+        flags |= TB_FLAG_FPU_ENABLED;
     }
 #else
-    // FPU enable . Supervisor
-    *flags = env->psrs;
     if ((env->def->features & CPU_FEATURE_FLOAT) && env->psref) {
-        *flags |= TB_FLAG_FPU_ENABLED;
+        flags |= TB_FLAG_FPU_ENABLED;
     }
 #endif
+    *pflags = flags;
 }
 static inline bool tb_fpu_enabled(int tb_flags)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 48fc2ab..8254a30 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -5234,7 +5234,7 @@ static inline void gen_intermediate_code_internal(SPARCCPU *cpu,
     last_pc = dc->pc;
     dc->npc = (target_ulong) tb->cs_base;
     dc->cc_op = CC_OP_DYNAMIC;
-    dc->mem_idx = cpu_mmu_index(env);
+    dc->mem_idx = tb->flags & TB_FLAG_MMU_MASK;
     dc->def = env->def;
     dc->fpu_enabled = tb_fpu_enabled(tb->flags);
     dc->address_mask_32bit = tb_am_enabled(tb->flags);
--
2.4.3
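For readers following the patch, the flag layout it introduces can be sketched in isolation. The three macro values are copied from the diff above; `pack_tb_flags` and `tb_flags_mmu_idx` are hypothetical helper names used only for illustration, and the sketch assumes the MMU index always fits in the low three bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the patch. */
#define TB_FLAG_MMU_MASK    7
#define TB_FLAG_FPU_ENABLED (1 << 4)
#define TB_FLAG_AM_ENABLED  (1 << 5)

/* Hypothetical helper: build a flags word the way the patched
 * cpu_get_tb_cpu_state() does, starting from the MMU index. */
static int pack_tb_flags(int mmu_idx, bool fpu_on, bool am_on)
{
    /* Assumption: the index must not spill into the FPU/AM bits. */
    assert((mmu_idx & ~TB_FLAG_MMU_MASK) == 0);

    int flags = mmu_idx;
    if (am_on) {
        flags |= TB_FLAG_AM_ENABLED;
    }
    if (fpu_on) {
        flags |= TB_FLAG_FPU_ENABLED;
    }
    return flags;
}

/* Hypothetical helper: recover the MMU index, as translate.c now does
 * with tb->flags & TB_FLAG_MMU_MASK. */
static int tb_flags_mmu_idx(int flags)
{
    return flags & TB_FLAG_MMU_MASK;
}
```

Because the index is read back from the flags word rather than from the live CPU state, the translator sees exactly the value the TB was looked up with.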
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 4:19 [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags Richard Henderson
@ 2015-08-25 5:45 ` Dennis Luehring
2015-08-25 6:44 ` Artyom Tarasenko
2015-08-25 6:35 ` Artyom Tarasenko
1 sibling, 1 reply; 11+ messages in thread
From: Dennis Luehring @ 2015-08-25 5:45 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
On 25.08.2015 at 06:19, Richard Henderson wrote:
> Artyom and Dennis, I'm hoping that this will help with some of your
> translation performance problems. I don't currently have a sparc64
> kernel set up for booting, but I did smoke test this with openbios,
> and even there it reduced the number of TBs created.
I can't really say that it improves anything, but maybe I'm just not the
right person to interpret the numbers.
Your patch gives the worst result in the stream benchmark, but nearly the
best in pugixml compile time and prime.c runtime.
Every patch or branch I have tried nearly halves the speed of the stream
benchmark compared to qemu-git-master.
Legend:
tcg-indirect => git://github.com/rth7680/qemu.git tcg-indirect
without-optimization => qemu-git-master + undefine USE_TCG_OPTIMIZATIONS
The gcc build flags are irrelevant because I always use the same ones.
System:
Ubuntu 15.04 x64 host, NetBSD SPARC64 guest running from a ramdisk (to
reduce I/O noise)
pugixml.cpp buildtime:
build: g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x
-c -MMD -MP
results:
tcg-indirect: ~2:46.5
qemu.org-git: ~2:51.2 !!WORST!!
without-optimization: ~2:14.1 !!BEST!!
patch-Store-mmu-index-in-TB-flags: 2:38.4
prime.c runtime (Aurelien Jarno's integer prime test extracted from sysbench)
build: gcc prime.c -o prime.out -lm
results:
tcg-indirect: ~9.3 sec !!BEST!!
qemu-git-master: ~11 sec !!WORST!!
without-optimization: ~9.9 sec
patch-Store-mmu-index-in-TB-flags: ~9.7sec
stream (https://www.cs.virginia.edu/stream/)
build: gcc stream.c -o stream.out -lm
tcg-indirect:
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              320.8  0.511297  0.498785  0.590214
Scale:             187.0  0.858693  0.855465  0.863527
Add:               218.2  1.104654  1.099698  1.110341
Triad:             169.5  1.433273  1.416321  1.502248
qemu-git-master: !!BEST!!
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              771.5  0.214717  0.207377  0.244214
Scale:             288.1  0.573320  0.555401  0.660161
Add:               423.5  0.633523  0.566661  1.092067
Triad:             242.9  1.053032  0.987970  1.499563
without-optimization:
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              316.6  0.524065  0.505313  0.580103
Scale:             200.5  0.813356  0.798024  0.840986
Add:               243.9  1.010247  0.984025  1.119149
Triad:             182.9  1.345601  1.312236  1.427459
patch-Store-mmu-index-in-TB-flags: !!WORST!!
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              276.2  0.585113  0.579193  0.595607
Scale:             181.3  0.895991  0.882744  0.917396
Add:               215.9  1.126226  1.111750  1.174236
Triad:             167.0  1.469888  1.436790  1.523211
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 4:19 [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags Richard Henderson
2015-08-25 5:45 ` Dennis Luehring
@ 2015-08-25 6:35 ` Artyom Tarasenko
1 sibling, 0 replies; 11+ messages in thread
From: Artyom Tarasenko @ 2015-08-25 6:35 UTC (permalink / raw)
To: Richard Henderson
Cc: Aurelien Jarno, Mark Cave-Ayland, qemu-devel, Dennis Luehring
Hi Richard,
On Tue, Aug 25, 2015 at 6:19 AM, Richard Henderson <rth@twiddle.net> wrote:
> Doing this instead of saving the raw PS_PRIV and TL. This means
> that all nucleus mode TBs (TL > 0) can be shared. This fixes a
> bug in that we didn't include HS_PRIV in the TB flags, and so could
> produce incorrect TB matches for hypervisor state.
>
> The LSU and DMMU states were unused by the translator. Including
> them in TB flags meant unnecessary mismatches from tb_find_fast.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
>
> ---
> Artyom and Dennis, I'm hoping that this will help with some of your
> translation performance problems. I don't currently have a sparc64
> kernel set up for booting, but I did smoke test this with openbios,
> and even there it reduced the number of TBs created.
This patch indeed fixes a bug in sun4v emulation, and we'll need it once
we have a working sun4v machine (currently qemu doesn't implement
a minimal sun4v machine, for instance there is no sun4v mmu).
I haven't tried it, but it's unlikely it would impact the
sun4u emulation we are currently using for the tests:
the sun4u machine doesn't have a hypervisor.
Artyom
--
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 5:45 ` Dennis Luehring
@ 2015-08-25 6:44 ` Artyom Tarasenko
2015-08-25 7:46 ` Dennis Luehring
2015-08-25 14:25 ` Richard Henderson
0 siblings, 2 replies; 11+ messages in thread
From: Artyom Tarasenko @ 2015-08-25 6:44 UTC (permalink / raw)
To: Dennis Luehring
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno, Richard Henderson
Hi Dennis,
On Tue, Aug 25, 2015 at 7:45 AM, Dennis Luehring <dl.soluz@gmx.net> wrote:
> On 25.08.2015 at 06:19, Richard Henderson wrote:
>>
>> Artyom and Dennis, I'm hoping that this will help with some of your
>> translation performance problems. I don't currently have a sparc64
>> kernel set up for booting, but I did smoke test this with openbios,
>> and even there it reduced the number of TBs created.
>
>
> I can't really say that it improves anything, but maybe I'm just not the
> right person to interpret the numbers.
>
> Your patch gives the worst result in the stream benchmark, but nearly the
> best in pugixml compile time and prime.c runtime.
> Every patch or branch I have tried nearly halves the speed of the stream
> benchmark compared to qemu-git-master.
This is very surprising: the patch should have no effect on a sun4u machine.
Have you applied it to the master or some other branch?
Have you pulled the master branch recently? Maybe there was another
change affecting the performance?
Artyom
--
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 6:44 ` Artyom Tarasenko
@ 2015-08-25 7:46 ` Dennis Luehring
2015-08-25 14:25 ` Richard Henderson
1 sibling, 0 replies; 11+ messages in thread
From: Dennis Luehring @ 2015-08-25 7:46 UTC (permalink / raw)
To: Artyom Tarasenko
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno, Richard Henderson
On 25.08.2015 at 08:44, Artyom Tarasenko wrote:
>> > Your patch gives the worst result in the stream benchmark, but nearly the
>> > best in pugixml compile time and prime.c runtime.
>> > Every patch or branch I have tried nearly halves the speed of the stream
>> > benchmark compared to qemu-git-master.
> This is very surprising: the patch should have no effect on a sun4u machine.
> Have you applied it to the master or some other branch?
> Have you pulled the master branch recently? Maybe there was another
> change affecting the performance?
I completely removed my qemu git folder, freshly cloned qemu master, applied
the patch, and checked that it was applied. These are the numbers I got.
I always remove my qemu checkout first (I always use either master, another
branch, or clean master + patch), rebuild completely, and use the same
settings, ramdisk etc. for compilation and benchmarking.
And it's not really surprising: there are ~5 people in this discussion, each
with different ideas about where the slowness comes from, and all using
different or non-formalized "benchmark suites" (like your combination or my
3 tests). Every test I have made seems to give weird or surprising results.
So my conclusion is: no one really knows what it is and where it comes from,
and as long as there is no common benchmark suite (for example NetBSD + the
3 tests), the results I post will keep being surprising or weird.
Example:
At first it was "your RAM is full, your system is swapping, your hard disk
is slow" etc., in talks with Artyom Tarasenko, Aurelien Jarno and some others.
None of these is the problem: I have more than enough RAM and CPU power on my
host and free in the guest, and using a ramdisk for the image makes I/O less
noisy.
Aurelien Jarno said it could be the 32-bit userland in my Debian 7.8 SPARC64
system, and showed numbers with prime.c that prove it. I rechecked that, came
to the same results, and switched over to NetBSD SPARC64 (a pure 64-bit
system), which makes prime.c the fastest. But that does not really reduce the
pugixml compile time (my host needs 3 sec, NetBSD takes ~3 minutes, building
cmake needs ~10 hours or longer).
Then someone said it could be I/O, so I put the NetBSD image on a ramdisk,
which helped a little.
Then Karel Gardas had the idea that the compilation process is primarily
memory bound, so he asked me to use the stream benchmark. I have been posting
results on every change, and I still don't know if the numbers I'm getting
from the benchmark are relevant in any way (no one really replies to them),
but they seem to be very relevant.
Then I tested the tcg-indirect branch: prime.c gets a little better, stream
gets slower.
The last patch from Richard Henderson still gives unclear results: prime.c
gets a little better, stream gets the slowest.
The next thing I will do is a completely script-based qemu compilation and
benchmark run in my NetBSD image. Then the human factor is down to 0% and the
only source of surprising/weird results is my host hardware.
Is there any interest in my NetBSD image (or the installation process), to
have a chance of getting similar results for the differences?
Should I add some other tests?
What is usually used for performance tests? I still have no answer to that
question.
I'm ready and happy to compile/run whatever you've got/want :)
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 6:44 ` Artyom Tarasenko
2015-08-25 7:46 ` Dennis Luehring
@ 2015-08-25 14:25 ` Richard Henderson
2015-08-25 14:37 ` Dennis Luehring
2015-08-25 16:53 ` Artyom Tarasenko
1 sibling, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2015-08-25 14:25 UTC (permalink / raw)
To: Artyom Tarasenko, Dennis Luehring
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno
On 08/24/2015 11:44 PM, Artyom Tarasenko wrote:
> This is very surprising: the patch should have no effect on a sun4u machine.
Er, no, it should. The primary vector by which I expect improvement is via not
encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU, which
sun4u certainly uses.
The fact that the patch _also_ fixes a sun4v problem is secondary.
r~
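The effect described above can be illustrated with a simplified model of the two encodings. This is only a sketch: the old layout follows the lines the patch removes (with the LSU bits left out), the privilege bit is passed in as a plain number rather than taken from QEMU's headers, and `old_style_flags`, `new_style_flags` and `flags_match` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the flags removed by the patch (LSU bits omitted):
 * privilege bit, trap level in bits 8..15, primary context from bit 16 up. */
static uint32_t old_style_flags(uint32_t priv_bit, uint32_t tl,
                                uint32_t primary_ctx)
{
    return priv_bit | ((tl & 0xff) << 8) | (primary_ctx << 16);
}

/* Model of the new flags: only the MMU index (low three bits) is kept. */
static uint32_t new_style_flags(uint32_t mmu_idx)
{
    assert(mmu_idx <= 7);
    return mmu_idx & 7;
}

/* Hypothetical helper: would two CPU states select the same TB,
 * assuming pc, npc and everything else are already equal? */
static int flags_match(uint32_t a, uint32_t b)
{
    return a == b;
}
```

In this model, two guest processes whose primary contexts are 1 and 2 get different old-style flag words even though nothing the translator uses differs, so a lookup for one cannot reuse a TB built for the other. The same holds for trap levels 1 and 2. With the new encoding, both cases collapse to one MMU index and one flags value.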
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 14:25 ` Richard Henderson
@ 2015-08-25 14:37 ` Dennis Luehring
2015-08-25 18:09 ` Richard Henderson
2015-08-25 16:53 ` Artyom Tarasenko
1 sibling, 1 reply; 11+ messages in thread
From: Dennis Luehring @ 2015-08-25 14:37 UTC (permalink / raw)
To: Richard Henderson, Artyom Tarasenko
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno
On 25.08.2015 at 16:25, Richard Henderson wrote:
> Er, no, it should. The primary vector by which I expect improvement is via not
> encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU, which
> sun4u certainly uses.
>
> The fact that the patch _also_ fixes a sun4v problem is secondary.
Please, can you (or someone else) give me feedback about my tests/numbers
and their relevance? The stream benchmark results seem to be worse than
before and the compile speed is just a little bit better, so I don't
understand (at all) what problems are fixed or what is improved now. The
compilation test is still 180 times slower than on my host.
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 14:25 ` Richard Henderson
2015-08-25 14:37 ` Dennis Luehring
@ 2015-08-25 16:53 ` Artyom Tarasenko
1 sibling, 0 replies; 11+ messages in thread
From: Artyom Tarasenko @ 2015-08-25 16:53 UTC (permalink / raw)
To: Richard Henderson
Cc: Aurelien Jarno, Mark Cave-Ayland, qemu-devel, Dennis Luehring
On Tue, Aug 25, 2015 at 4:25 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 08/24/2015 11:44 PM, Artyom Tarasenko wrote:
>>
>> This is very surprising: the patch should have no effect on a sun4u
>> machine.
>
>
> Er, no, it should. The primary vector by which I expect improvement is via
> not encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU,
> which sun4u certainly uses.
>
> The fact that the patch _also_ fixes a sun4v problem is secondary.
Sorry, my bad, I hadn't noticed that.
Applied it on top of the tcg-indirect branch, but see no measurable impact:
my reference g++ run still takes ~ 17 minutes.
Artyom
--
Regards,
Artyom Tarasenko
SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 14:37 ` Dennis Luehring
@ 2015-08-25 18:09 ` Richard Henderson
2015-08-25 19:03 ` Dennis Luehring
2015-08-25 19:17 ` Dennis Luehring
0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2015-08-25 18:09 UTC (permalink / raw)
To: Dennis Luehring, Artyom Tarasenko
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno
On 08/25/2015 07:37 AM, Dennis Luehring wrote:
> On 25.08.2015 at 16:25, Richard Henderson wrote:
>> Er, no, it should. The primary vector by which I expect improvement is via not
>> encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU, which
>> sun4u certainly uses.
>>
>> The fact that the patch _also_ fixes a sun4v problem is secondary.
>
> Please, can you (or someone else) give me feedback about my tests/numbers
> and their relevance? The stream benchmark results seem to be worse than
> before and the compile speed is just a little bit better, so I don't
> understand (at all) what problems are fixed or what is improved now.
The fact that stream degraded means that stream is unreliable as a benchmark.
I suspect that if you simply run it N times with the exact same setup you'll
see a very large variance in its runtime.
This particular patch cannot possibly have degraded performance, as it could
only result in a reduction, not expansion, of the number of TBs created.
As to why stream should be unreliable, I have no clue.
> The compilation test is still 180 times slower than on my host.
I'll have to compare that test vs an Alpha guest and see what I get. I only
remember one factor of 10, not two...
But you're right, it would be nice to put together a coherent set of
benchmarks. Ideally, a guest kernel plus minimal ramdisk with the tests
pre-loaded so that we can boot and run ./benchmark at the prompt. That's
the sort of thing we can easily upload to the wiki and share.
r~
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 18:09 ` Richard Henderson
@ 2015-08-25 19:03 ` Dennis Luehring
2015-08-25 19:17 ` Dennis Luehring
1 sibling, 0 replies; 11+ messages in thread
From: Dennis Luehring @ 2015-08-25 19:03 UTC (permalink / raw)
To: Richard Henderson, Artyom Tarasenko
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno
On 25.08.2015 at 20:09, Richard Henderson wrote:
> On 08/25/2015 07:37 AM, Dennis Luehring wrote:
> > On 25.08.2015 at 16:25, Richard Henderson wrote:
> >> Er, no, it should. The primary vector by which I expect improvement is via not
> >> encoding dmmu.mmu_primary_context into the TB flags. I.e. ASI_DMMU, which
> >> sun4u certainly uses.
> >>
> >> The fact that the patch _also_ fixes a sun4v problem is secondary.
> >
> > Please, can you (or someone else) give me feedback about my tests/numbers
> > and their relevance? The stream benchmark results seem to be worse than
> > before and the compile speed is just a little bit better, so I don't
> > understand (at all) what problems are fixed or what is improved now.
>
> The fact that stream degraded means that stream is unreliable as a benchmark.
> I suspect that if you simply run it N times with the exact same setup you'll
> see a very large variance in its runtime.
>
> This particular patch cannot possibly have degraded performance, as it could
> only result in a reduction, not expansion, of the number of TBs created.
>
> As to why stream should be unreliable, I have no clue.
6 runs, 6 times nearly the same result (and the stream benchmark itself
does not seem to be an obscure one: https://www.cs.virginia.edu/stream/ -
it measures sustainable memory bandwidth vs. FPU performance)
run #1
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              278.3  0.576045  0.574946  0.581186
Scale:             181.5  0.888582  0.881669  0.900648
Add:               217.6  1.109354  1.102955  1.123495
Triad:             167.7  1.440939  1.430755  1.463517
run #2
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              277.8  0.577607  0.575970  0.582532
Scale:             181.4  0.909480  0.882134  1.058552
Add:               217.5  1.110417  1.103327  1.122539
Triad:             167.5  1.444383  1.432864  1.477904
run #3
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              278.3  0.586721  0.574839  0.655187
Scale:             181.7  0.889060  0.880544  0.898155
Add:               217.3  1.115113  1.104248  1.146618
Triad:             167.6  1.480999  1.432066  1.748302
run #4
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              276.7  0.580837  0.578262  0.585253
Scale:             180.6  0.891853  0.885707  0.895370
Add:               216.5  1.116623  1.108630  1.126520
Triad:             167.1  1.444834  1.435996  1.451557
run #5
Function  Best Rate MB/s  Avg time  Min time  Max time
Copy:              278.3  0.593767  0.574839  0.689366
Scale:             182.0  0.897183  0.879005  0.938262
Add:               217.7  1.132244  1.102195  1.203082
Triad:             167.4  1.444530  1.434112  1.487601
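One way to put a number on "nearly the same result" is to compute the mean and the relative spread of a single row across runs. The sketch below does this for the Copy rates of the five runs listed above; `mean` and `relative_range` are hypothetical helpers, and (max - min) / mean is just one possible spread measure, chosen here because it needs no math library.

```c
#include <assert.h>
#include <stddef.h>

/* Copy "Best Rate MB/s" values copied from the five runs listed above. */
static const double copy_rates[] = { 278.3, 277.8, 278.3, 276.7, 278.3 };
#define N_RUNS (sizeof(copy_rates) / sizeof(copy_rates[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Spread of the samples relative to their mean: (max - min) / mean. */
static double relative_range(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0];
    double hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) {
            lo = x[i];
        }
        if (x[i] > hi) {
            hi = x[i];
        }
    }
    return (hi - lo) / mean(x, n);
}
```

Calling `relative_range(copy_rates, N_RUNS)` on these values gives a spread of under one percent, so for this row the run-to-run variation is small compared with the gap to the qemu-git-master figure reported earlier in the thread.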
> > The compilation test is still 180 times slower than on my host.
>
> I'll have to compare that test vs an Alpha guest and see what I get. I only
> remember one factor of 10, not two...
>
> But you're right, it would be nice to put together a coherent set of
> benchmarks. Ideally, a guest kernel plus minimal ramdisk with the tests
> pre-loaded so that we can boot and run ./benchmark at the prompt. That's
> the sort of thing we can easily upload to the wiki and share.
Any idea what memory bandwidth benchmark I could use?
Something on this list: http://lbs.sourceforge.net/ ?
* Re: [Qemu-devel] [PATCH] target-sparc: Store mmu index in TB flags
2015-08-25 18:09 ` Richard Henderson
2015-08-25 19:03 ` Dennis Luehring
@ 2015-08-25 19:17 ` Dennis Luehring
1 sibling, 0 replies; 11+ messages in thread
From: Dennis Luehring @ 2015-08-25 19:17 UTC (permalink / raw)
To: Richard Henderson, Artyom Tarasenko
Cc: Mark Cave-Ayland, qemu-devel, Aurelien Jarno
On 25.08.2015 at 20:09, Richard Henderson wrote:
> But you're right, it would be nice to put together a coherent set of
> benchmarks. Ideally, a guest kernel plus minimal ramdisk with the tests
> pre-loaded so that we can boot and run ./benchmark at the prompt. That's
> the sort of thing we can easily upload to the wiki and share.
I've found these benchmarks in NetBSD's benchmarks packages:
http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/README.html
hint.serial-98.06.12
<http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/hint/README.html>:
Scalable benchmark for testing CPU and memory performance
nbench-2.2.3
<http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/nbench/README.html>:
Benchmark tool for CPU, FPU and memory
ramspeed-2.6.0
<http://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/benchmarks/ramspeed/README.html>:
RAMspeed, a cache and memory benchmarking tool
These would allow benchmarking with NetBSD 6.1.5 under native x64,
qemu-amd64, qemu-sparc64 and qemu-alpha,
based on more or less the "same" OS/source.