qemu-devel.nongnu.org archive mirror
* [Qemu-devel] tests with simulated memory
@ 2003-06-19 15:02 Johan Rydberg
  2003-06-19 16:33 ` Fabrice Bellard
  0 siblings, 1 reply; 4+ messages in thread
From: Johan Rydberg @ 2003-06-19 15:02 UTC (permalink / raw)
  To: qemu-devel

Hi,

I've hacked a bit on QEMU and added simulated memory using a 
translation cache as we discussed earlier.  These tests are 
mostly for my own interest, but you might find them interesting
as well.

Below is the result of the BYTEmark [1] benchmark (I do not have 
access to any of the SPEC benchmarks) using simulated memory:

NUMERIC SORT:  Iterations/sec.: 10.385957       Index: 0.268406 
STRING SORT:   Iterations/sec.: 1.807970        Index: 0.794712
BITFIELD:      Iterations/sec.: 1913364.293577  Index: 0.328203
FP EMULATION:  Iterations/sec.: 0.647669        Index: 0.311379
FOURIER:       Iterations/sec.: 619.696756      Index: 0.701681
ASSIGNMENT:    Iterations/sec.: 0.151103        Index: 0.575697
IDEA:          Iterations/sec.: 21.734410       Index: 0.332534
HUFFMAN:       Iterations/sec.: 13.068599       Index: 0.363168

Same test without the simulated memory (original QEMU):

NUMERIC SORT:  Iterations/sec.: 20.327522       Index: 0.525327
STRING SORT:   Iterations/sec.: 2.919430        Index: 1.283266
BITFIELD:      Iterations/sec.: 3086647.786244  Index: 0.529458
FP EMULATION:  Iterations/sec.: 1.112348        Index: 0.534783
FOURIER:       Iterations/sec.: 717.791439      Index: 0.812754
ASSIGNMENT:    Iterations/sec.: 0.208943        Index: 0.796063
IDEA:          Iterations/sec.: 39.651108       Index: 0.606657
HUFFMAN:       Iterations/sec.: 19.098677       Index: 0.530740

Slowdown rates (calculated as the Index field from original 
QEMU divided by the Index from QEMU w/ simulated memory):

NUMERIC SORT:  1.96
STRING SORT:   1.61
BITFIELD:      1.61
FP EMULATION:  1.72
FOURIER:       1.16
ASSIGNMENT:    1.38
IDEA:          1.82
HUFFMAN:       1.46
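
The slowdown figures can be reproduced from the two Index columns. A minimal
Python sketch, assuming only the numbers copied from the tables above (the
names `simulated`, `original` and `slowdown` are illustrative, not from the
benchmark or from QEMU):

```python
# Index values copied from the two BYTEmark runs above.
simulated = {
    "NUMERIC SORT": 0.268406,
    "STRING SORT": 0.794712,
    "BITFIELD": 0.328203,
    "FP EMULATION": 0.311379,
    "FOURIER": 0.701681,
    "ASSIGNMENT": 0.575697,
    "IDEA": 0.332534,
    "HUFFMAN": 0.363168,
}
original = {
    "NUMERIC SORT": 0.525327,
    "STRING SORT": 1.283266,
    "BITFIELD": 0.529458,
    "FP EMULATION": 0.534783,
    "FOURIER": 0.812754,
    "ASSIGNMENT": 0.796063,
    "IDEA": 0.606657,
    "HUFFMAN": 0.530740,
}


def slowdown(orig_index, sim_index):
    """Return original/simulated; a value above 1 means simulated memory is slower."""
    return orig_index / sim_index


# Round to two decimals, matching the precision used in the list above.
slowdowns = {
    name: round(slowdown(original[name], simulated[name]), 2)
    for name in original
}

for name, ratio in slowdowns.items():
    print(f"{name + ':':<14} {ratio:.2f}")
```

Because the Index is a higher-is-better score, dividing the original by the
simulated value gives a factor that grows as the simulated run gets slower.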

The slowdown would be greater if any processing had to be
done on every cache miss.  The current hack just adds the page
to the cache, does the memory transaction, and returns.
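
To illustrate what "adds the page to the cache and does the memory
transaction" could look like, here is a rough Python sketch of a
direct-mapped page translation cache. It is not the code from the patch: the
4 KiB page size, the 256-entry cache, and every name in it are assumptions
made for the example.

```python
PAGE_BITS = 12               # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_BITS
CACHE_ENTRIES = 256          # assumed number of direct-mapped slots


class SimulatedMemory:
    """Byte-addressable memory reached through a small page cache."""

    def __init__(self):
        self.pages = {}                      # backing store: page number -> bytearray
        self.tags = [None] * CACHE_ENTRIES   # page number cached in each slot
        self.data = [None] * CACHE_ENTRIES   # page buffer cached in each slot
        self.hits = 0
        self.misses = 0

    def _lookup(self, addr):
        page = addr >> PAGE_BITS
        slot = page % CACHE_ENTRIES
        if self.tags[slot] == page:
            self.hits += 1
        else:
            # Miss: install the page in the slot and carry on, with no
            # further processing (as described in the message).
            self.misses += 1
            self.tags[slot] = page
            self.data[slot] = self.pages.setdefault(page, bytearray(PAGE_SIZE))
        return self.data[slot], addr & (PAGE_SIZE - 1)

    def load8(self, addr):
        buf, offset = self._lookup(addr)
        return buf[offset]

    def store8(self, addr, value):
        buf, offset = self._lookup(addr)
        buf[offset] = value & 0xFF


mem = SimulatedMemory()
mem.store8(0x1234, 0xAB)     # first touch of the page: one miss
print(hex(mem.load8(0x1234)), mem.hits, mem.misses)
```

Any extra work on the miss path (permission checks, page-table walks,
device dispatch) would be added in the `else` branch, which is where the
additional slowdown mentioned above would come from.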

A slowdown between 1.16x and ~2x is pretty good I think.

As a reference, below are the results for Valgrind (CVS version,
running the none skin):

NUMERIC SORT:  Iterations/sec.: 36.541455       Index: 0.944346
STRING SORT:   Iterations/sec.: 2.181686        Index: 0.958983
BITFIELD:      Iterations/sec.: 6294984.678336  Index: 1.079789
FP EMULATION:  Iterations/sec.: 2.232148        Index: 1.073148
FOURIER:       Iterations/sec.: 746.055296      Index: 0.844757
ASSIGNMENT:    Iterations/sec.: 0.386720        Index: 1.473388
IDEA:          Iterations/sec.: 80.463770       Index: 1.231086
HUFFMAN:       Iterations/sec.: 43.592067       Index: 1.211396

 [1] http://www.byte.com/bmark/bmark.htm

best regards,
johan

-- 
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/

Playing Track No09

* Re: [Qemu-devel] tests with simulated memory
  2003-06-19 15:02 [Qemu-devel] tests with simulated memory Johan Rydberg
@ 2003-06-19 16:33 ` Fabrice Bellard
  2003-06-19 17:31   ` Johan Rydberg
  0 siblings, 1 reply; 4+ messages in thread
From: Fabrice Bellard @ 2003-06-19 16:33 UTC (permalink / raw)
  To: qemu-devel

Some days ago I looked for a benchmark to publish objective QEMU results 
(gzip is not enough!). The BYTEmark you mention seems like a good start. SPEC 
benchmarks are less interesting since their source code cannot be easily 
distributed.

I would have expected the slowdown to be larger. In the code you 
submitted you did not include alignment tests. Hopefully, by just ANDing 
the address with '0xffff0003', you do both address translation _and_ 
unaligned access handling...
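
One way to read the masking remark: if the cached tag is page-aligned, a
single AND plus compare rejects both addresses on another page and addresses
that are not 4-byte aligned, because the low two bits survive the mask and
can never match a tag whose low bits are zero. A small Python sketch of that
reading; only the mask value comes from the message, while the tag layout
and the function name are assumptions:

```python
MASK = 0xFFFF0003   # value quoted in the message: page bits plus low two bits


def fast_path_hit(addr, cached_tag):
    """Return True if a 32-bit access at addr may take the fast path.

    cached_tag is assumed to be page-aligned (low 16 bits zero), so the
    comparison fails for a different page and for an unaligned address.
    """
    return (addr & MASK) == cached_tag


tag = 0x12340000
print(fast_path_hit(0x12340010, tag))   # aligned, same page
print(fast_path_hit(0x12340011, tag))   # same page, but unaligned
print(fast_path_hit(0x56780010, tag))   # aligned, different page
```

Both failure cases fall through to the same slow path, which can then tell
a genuine miss apart from an unaligned access.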

It seems that Valgrind is twice as fast on most tests. Some 
optimisations will be needed in qemu to correct that :-)
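
The "twice as fast" observation can be checked against the posted Index
values. A short Python sketch, assuming the Valgrind and original QEMU
figures from the first message (the variable names and the 1.75 threshold
are arbitrary choices for the example):

```python
# Index values from the original-QEMU and Valgrind runs in the first message.
qemu = {
    "NUMERIC SORT": 0.525327, "STRING SORT": 1.283266,
    "BITFIELD": 0.529458, "FP EMULATION": 0.534783,
    "FOURIER": 0.812754, "ASSIGNMENT": 0.796063,
    "IDEA": 0.606657, "HUFFMAN": 0.530740,
}
valgrind = {
    "NUMERIC SORT": 0.944346, "STRING SORT": 0.958983,
    "BITFIELD": 1.079789, "FP EMULATION": 1.073148,
    "FOURIER": 0.844757, "ASSIGNMENT": 1.473388,
    "IDEA": 1.231086, "HUFFMAN": 1.211396,
}

# Ratio above 1 means Valgrind scored higher than original QEMU on that test.
ratios = {name: valgrind[name] / qemu[name] for name in qemu}

# Tests where Valgrind is roughly twice as fast (threshold is arbitrary).
roughly_double = [name for name, r in ratios.items() if r >= 1.75]

for name, r in ratios.items():
    print(f"{name + ':':<14} {r:.2f}")
print(len(roughly_double), "of", len(ratios), "tests at 1.75x or more")
```

STRING SORT is the one test where original QEMU scores higher than Valgrind,
and FOURIER is close to parity.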

Fabrice.

* Re: [Qemu-devel] tests with simulated memory
  2003-06-19 16:33 ` Fabrice Bellard
@ 2003-06-19 17:31   ` Johan Rydberg
  2003-06-20 19:16     ` Gwenole Beauchesne
  0 siblings, 1 reply; 4+ messages in thread
From: Johan Rydberg @ 2003-06-19 17:31 UTC (permalink / raw)
  To: qemu-devel

Fabrice Bellard <fabrice.bellard@free.fr> wrote:

: Some days ago I looked for a benchmark to publish objective QEMU results 
: (gzip is not enough!). The BYTEmark you mention seems like a good start. SPEC 
: benchmarks are less interesting since their source code cannot be easily 
: distributed.

Yes.  And the Linux nbench test suite is based on BYTEmark.  I used the
sources from BYTE's website.  Maybe you should try nbench instead.

: I would have expected the slowdown to be larger. In the code you 
: submitted you did not include alignment tests. Hopefully, by just ANDing 
: the address with '0xffff0003', you do both address translation _and_ 
: unaligned access handling...

I was a bit surprised as well.  I thought you would get a slowdown of 
something like 2x to 8x, especially since x86 programs do a lot of memory
accesses.

Regarding alignment checks: I'm not sure they are needed for the x86 platform, 
though, since it supports unaligned memory accesses (but there is some bit 
that enables alignment checks, isn't there?).

: It seems that Valgrind is twice as fast on most tests. Some 
: optimisations will be needed in qemu to correct that :-)

Hehe.  I was a bit surprised by the result.  I thought QEMU would perform 
better, especially since it must spend less time decoding and more time executing 
than Valgrind (doing the register allocation + improvements of the micro 
operations isn't cheap).

-- 
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/

Playing Track No09

* Re: [Qemu-devel] tests with simulated memory
  2003-06-19 17:31   ` Johan Rydberg
@ 2003-06-20 19:16     ` Gwenole Beauchesne
  0 siblings, 0 replies; 4+ messages in thread
From: Gwenole Beauchesne @ 2003-06-20 19:16 UTC (permalink / raw)
  To: qemu-devel

Hi,

> Yes.  And the Linux nbench test suite is based on BYTEmark.  I used the
> sources from BYTE's website.  Maybe you should try nbench instead.

FWIW, I am using SSBENCH for my PPC emulator.

Bye,
Gwenole
