* Re: [Qemu-devel] Translation cache sizes
2006-04-08 3:13 [Qemu-devel] Translation cache sizes Julian Seward
@ 2006-04-08 6:30 ` Mulyadi Santosa
2006-04-08 10:27 ` Johannes Schindelin
2006-04-08 12:43 ` Gwenole Beauchesne
2006-04-08 13:16 ` Paul Brook
From: Mulyadi Santosa @ 2006-04-08 6:30 UTC
To: qemu-devel, Julian Seward
Hi Julian...
> Using qemu from cvs simulating x86-softmmu (no kqemu) on x86,
> booting SuSE 9.1 and getting to the xdm (kdm?) graphical login
> screen, requires making about 1088000 translations, and the
> translation cache is flushed 17 times. Booting is not too bad,
> but once user-mode starts to run the translation cache is pretty
> much hammered.
This reminds me of when I booted the default FC2 kernel (4G/4G VM split).
Maybe I suffered from the same thing, that is, tons of translations and
cache flushes inside qemu.
Anyway, would you mind sharing how you got those numbers? Posting such
info on the qemu forum (http://qemu.dad-answers.com) would be great too,
since it would encourage other (casual) users to tweak qemu.
> I made 2 changes:
>
> * increase CODE_GEN_BUFFER_SIZE from 16*1024*1024
> to 64*1024*1024,
IMHO, if speed is what the user really needs, he/she won't mind the
extra 48 MB. Run qemu in a non-X environment with serial output or a
curses-based display, and the extra 48 MB won't hog too much.
> * observe that CODE_GEN_AVG_BLOCK_SIZE of 128
> for the softmmu case is too low; my measurements put it
> at about 247. So I changed it to 256.
So you doubled it... does that mean the memory cost is more than just
the extra 48 MB?
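A rough model of what the two constants control, under two assumptions not stated in the thread: that the number of translation-block descriptors is fixed at CODE_GEN_BUFFER_SIZE / CODE_GEN_AVG_BLOCK_SIZE, and that a full flush happens when either the code buffer or the descriptor array runs out. The helper names are hypothetical; 247 is Julian's measured average.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers modelling the two limits assumed to trigger a
 * full translation cache flush: the generated-code buffer filling up,
 * or the array of translation-block descriptors filling up. */

/* Descriptor slots available: buffer size divided by the estimated
 * average block size (assumed to mirror CODE_GEN_MAX_BLOCKS). */
static size_t max_blocks(size_t buffer_size, size_t avg_estimate)
{
    assert(avg_estimate > 0);
    return buffer_size / avg_estimate;
}

/* Blocks that fit before a flush, given the real average size of the
 * generated code per block: whichever limit is reached first wins. */
static size_t blocks_before_flush(size_t buffer_size, size_t avg_estimate,
                                  size_t avg_actual)
{
    size_t by_buffer, by_array;

    assert(avg_actual > 0);
    by_buffer = buffer_size / avg_actual;
    by_array = max_blocks(buffer_size, avg_estimate);
    return by_buffer < by_array ? by_buffer : by_array;
}
```

Under those assumptions, 16 MB with an estimate of 128 gives 131072 descriptor slots, but at 247 bytes per block the buffer fills after 67923 blocks, so about half of the slots are never used. With 64 MB and 256 there are 262144 slots, which run out slightly before the buffer would (271695 blocks). The descriptor array therefore doubles rather than quadruples, so there is some memory cost beyond the 48 MB, proportional to the size of one descriptor, which the thread does not give.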
> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen. Of course the cost is an extra
> 48M of memory use.
Good to hear! Wow! Maybe we should make those constants configurable
(using the ./configure script)?
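One way to make them configurable at build time, as a hedged sketch: guard each default with #ifndef so that a configure script (or plain CFLAGS) can override it, e.g. -DCODE_GEN_BUFFER_SIZE='(64*1024*1024)' -DCODE_GEN_AVG_BLOCK_SIZE=256. The defaults are the values quoted in this thread; deriving CODE_GEN_MAX_BLOCKS from them is an assumption about how the block array is sized.

```c
#include <assert.h>

/* Defaults as quoted in this thread; overridable with -D at build time. */
#ifndef CODE_GEN_BUFFER_SIZE
#define CODE_GEN_BUFFER_SIZE (16 * 1024 * 1024)
#endif

#ifndef CODE_GEN_AVG_BLOCK_SIZE
#define CODE_GEN_AVG_BLOCK_SIZE 128
#endif

/* Derived limit (assumed): how many translated blocks fit in the buffer. */
#define CODE_GEN_MAX_BLOCKS (CODE_GEN_BUFFER_SIZE / CODE_GEN_AVG_BLOCK_SIZE)

/* Reject nonsensical overrides at compile time (C11 static_assert). */
static_assert(CODE_GEN_AVG_BLOCK_SIZE > 0,
              "CODE_GEN_AVG_BLOCK_SIZE must be positive");
static_assert(CODE_GEN_BUFFER_SIZE >= CODE_GEN_AVG_BLOCK_SIZE,
              "buffer must hold at least one average block");
```

This keeps the current behaviour when nothing is passed in, at the price of still needing one binary per configuration.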
regards
Mulyadi
* Re: [Qemu-devel] Translation cache sizes
2006-04-08 6:30 ` Mulyadi Santosa
@ 2006-04-08 10:27 ` Johannes Schindelin
From: Johannes Schindelin @ 2006-04-08 10:27 UTC
To: a_mulyadi, qemu-devel
Hi,
On Sat, 8 Apr 2006, Mulyadi Santosa wrote:
> > With those changes in place, the same boot-to-kdm process
> > requires only about 570000 translations to be made, and 2
> > cache flushes to happen. Of course the cost is an extra
> > 48M of memory use.
>
> Good to hear! Wow! Maybe we should make those constants configurable
> (using the ./configure script)?
It might be an even better idea to make command line options out of them. I
know, I know, these are #define'd, and thus the buffer would have to be
malloc()ed, but it'd be nice to have the same binary running on several
computers (even those which cannot afford another 48M).
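A minimal sketch of the run-time variant, assuming a hypothetical "-tb-size N" option (N in megabytes) and hypothetical helper names; it only shows parsing the size, allocating the buffer at startup, and deriving the block limit from it. One caveat that is independent of qemu: on hosts that enforce non-executable heap pages, generated code needs memory mapped executable (e.g. mmap with PROT_EXEC) rather than plain malloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical run-time replacements for the compile-time constants. */
static unsigned char *code_gen_buffer;
static size_t code_gen_buffer_size;
static size_t code_gen_max_blocks;

/* Parse the argument of a hypothetical "-tb-size N" option (N in MB).
 * Returns the size in bytes, or 0 if the argument is invalid. */
static size_t parse_tb_size_mb(const char *arg)
{
    char *end;
    unsigned long mb;

    if (arg[0] < '0' || arg[0] > '9')   /* rejects "", "-5", " 5" */
        return 0;
    errno = 0;
    mb = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0' || mb == 0)
        return 0;
    if (mb > SIZE_MAX / (1024 * 1024))  /* would overflow size_t */
        return 0;
    return (size_t)mb * 1024 * 1024;
}

/* Allocate the translation buffer at startup instead of using a static
 * array, and derive the block limit from the chosen size. A real
 * emulator would need executable memory here (see caveat above). */
static int code_gen_alloc(size_t size, size_t avg_block_size)
{
    assert(size > 0 && avg_block_size > 0);
    code_gen_buffer = malloc(size);
    if (code_gen_buffer == NULL)
        return -1;
    code_gen_buffer_size = size;
    code_gen_max_blocks = size / avg_block_size;
    return 0;
}
```

A main() would call parse_tb_size_mb() on the option's argument, fall back to a default when the option is absent, and call code_gen_alloc() once before any translation happens.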
Ciao,
Dscho
* Re: [Qemu-devel] Translation cache sizes
2006-04-08 3:13 [Qemu-devel] Translation cache sizes Julian Seward
2006-04-08 6:30 ` Mulyadi Santosa
@ 2006-04-08 12:43 ` Gwenole Beauchesne
2006-04-08 13:10 ` Paul Brook
2006-04-08 13:16 ` Paul Brook
From: Gwenole Beauchesne @ 2006-04-08 12:43 UTC
To: qemu-devel
Hi,
> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen. Of course the cost is an extra
> 48M of memory use.
I faced a similar problem in Basilisk II. MacOS 8.x had a tendency to
invalidate the code cache approx. 1000 times per second. My poor
K6-2/300 was suffering a lot. About 45% of the time was spent
compiling code, and the desktop experience was very sluggish. Then I
came up with a very simple idea I named "lazy cache flush". Performance
increased by 76%, compilation time dropped below 10%, and the desktop
experience was very smooth. I give more recent results below.
So what's lazy invalidation of the translation cache? Well, the goal is
simple: keep translated code as long as possible. In practice, you
invalidate the complete translation cache only when it is full. Other
explicit cache invalidations (CINV instructions on 68k, icbi on ppc,
etc.) are virtual: the code is kept, but it is put in a "dormant"
state. That is, the usual entry points (in the hash table, or
inter-block jumps) are redirected to check/recovery code where the
source block is checksummed again. If the checksum matches the
original's, the previously compiled code is brought back to life (entry
points in the hash table and inter-block links are restored).
Otherwise, the block is recompiled and the new code is used.
It's very simple and quite efficient. Since then, I have had no need to
increase the translation cache beyond 8MB.
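A minimal sketch of the scheme described above, reduced to a single block: it leaves out the hash table, the inter-block jump patching and the hard flush when the cache is full. The checksum (FNV-1a) and all names are hypothetical stand-ins, not Basilisk II code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Block {
    const uint8_t *guest_src;  /* guest code the block was compiled from */
    size_t len;                /* length of that guest code in bytes */
    uint32_t checksum;         /* checksum of guest_src at compile time */
    int dormant;               /* 1 after a soft (lazy) invalidation */
    unsigned compiles;         /* how many times the block was compiled */
} Block;

/* FNV-1a over the guest bytes; any sufficiently strong checksum works. */
static uint32_t checksum(const uint8_t *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Stand-in for the translator: remember the checksum of the source. */
static void compile_block(Block *b)
{
    b->checksum = checksum(b->guest_src, b->len);
    b->dormant = 0;
    b->compiles++;
}

/* Explicit guest invalidation (CINV, icbi, ...): keep the code but mark
 * it dormant. A real implementation would also redirect hash-table
 * entries and inter-block jumps to the check/recovery path. */
static void soft_invalidate(Block *b)
{
    b->dormant = 1;
}

/* Entry path: revive the block if its source is unchanged, otherwise
 * recompile. Returns 1 if the existing code was reused. */
static int enter_block(Block *b)
{
    assert(b->guest_src != NULL);
    if (!b->dormant)
        return 1;
    if (checksum(b->guest_src, b->len) == b->checksum) {
        b->dormant = 0;   /* restore entry points and links here */
        return 1;
    }
    compile_block(b);
    return 0;
}
```

Because revival trusts the checksum, a collision would resurrect stale code; a real design would choose a checksum strong enough for that risk to be acceptable, or compare saved source bytes directly.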
So, here are a few results on an Athlon64 3200+. The translation cache
is set to 8MB. The test consisted of booting MacOS 8, running all
Speedometer 4 tests, then shutting down the virtual Mac.
* Without lazy flush:
  Number of soft flushes          : 0
  Number of hard flushes          : 101387
  Number of checksums             : 0
  Number of calls to compile_block: 20244047
  Total emulation time            : 115.8 sec
  Total compilation time          : 59.4 sec (51.3%)
* With lazy flush:
  Number of soft flushes          : 405520
  Number of hard flushes          : 7
  Number of checksums             : 46545721
  Number of calls to compile_block: 340104
  Total emulation time            : 66.1 sec
  Total compilation time          : 1.8 sec (2.8%)
The results speak for themselves. ;-)
Speedometer 4 "Performance Rating" increased by 12%. More
interestingly, the Color QuickDraw tests improved by a factor of 12:
they scored 19.95 on average with lazy cache flush versus 1.67 without.
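The headline ratios, worked out only from the figures reported above (simple arithmetic, nothing newly measured):

```latex
\begin{aligned}
\text{total emulation time:}      &\quad 115.8\,\mathrm{s} / 66.1\,\mathrm{s} \approx 1.75 \\
\text{calls to compile\_block:}   &\quad 20\,244\,047 / 340\,104 \approx 59.5 \\
\text{total compilation time:}    &\quad 59.4\,\mathrm{s} / 1.8\,\mathrm{s} = 33 \\
\text{Color QuickDraw score:}     &\quad 19.95 / 1.67 \approx 11.9
\end{aligned}
```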
Bye,
Gwenolé.
* Re: [Qemu-devel] Translation cache sizes
2006-04-08 12:43 ` Gwenole Beauchesne
@ 2006-04-08 13:10 ` Paul Brook
From: Paul Brook @ 2006-04-08 13:10 UTC
To: qemu-devel
On Saturday 08 April 2006 13:43, Gwenole Beauchesne wrote:
> Hi,
>
> > With those changes in place, the same boot-to-kdm process
> > requires only about 570000 translations to be made, and 2
> > cache flushes to happen. Of course the cost is an extra
> > 48M of memory use.
>
> I faced a similar problem in Basilisk II. MacOS 8.x had a tendency to
> invalidate the code cache approx. 1000 times per second. My poor
> K6-2/300 was suffering a lot. About 45% of the time was spent
> compiling code, and the desktop experience was very sluggish. Then I
> came up with a very simple idea I named "lazy cache flush". Performance
> increased by 76%, compilation time dropped below 10%, and the desktop
> experience was very smooth. I give more recent results below.
Qemu already does this. Initially it does it on a per-page basis (writes to a
given physical memory page will invalidate all code on that page), and for
frequently contested pages it does more fine-grained locking.
x86 doesn't have explicit icache invalidate instructions; the icache is
architecturally defined to be coherent after every jump instruction.
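A simplified model of the per-page scheme described above, under stated assumptions: 4 KB pages, a fixed-size block list per page, and a made-up threshold of 10 writes before switching to a bitmap. All names are hypothetical and this is not QEMU's actual code. While a page holding translated code is written only occasionally, any write invalidates every block on it; once the page has been written often enough, a bitmap marking which bytes hold translated code lets writes that touch no code be ignored and limits invalidation to overlapping blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096        /* illustrative page size */
#define MAX_TBS_PER_PAGE 16   /* illustrative capacity */
#define BITMAP_THRESHOLD 10   /* illustrative: writes before using bitmap */

typedef struct {
    size_t start, len;  /* guest bytes [start, start+len) in the page */
    int valid;
} TB;

typedef struct {
    TB tbs[MAX_TBS_PER_PAGE];
    int ntbs;
    int write_count;
    int have_bitmap;
    unsigned char code_bitmap[PAGE_SIZE / 8]; /* 1 bit per guest byte */
} Page;

static void mark_code(Page *p, size_t start, size_t len)
{
    for (size_t b = start; b < start + len; b++)
        p->code_bitmap[b / 8] |= (unsigned char)(1u << (b % 8));
}

/* Rebuild the bitmap from the blocks that are still valid. */
static void build_bitmap(Page *p)
{
    memset(p->code_bitmap, 0, sizeof p->code_bitmap);
    for (int i = 0; i < p->ntbs; i++)
        if (p->tbs[i].valid)
            mark_code(p, p->tbs[i].start, p->tbs[i].len);
    p->have_bitmap = 1;
}

static int range_has_code(const Page *p, size_t start, size_t len)
{
    for (size_t b = start; b < start + len; b++)
        if (p->code_bitmap[b / 8] & (1u << (b % 8)))
            return 1;
    return 0;
}

/* Record a newly translated block; returns its index or -1 if full. */
static int add_tb(Page *p, size_t start, size_t len)
{
    assert(start + len <= PAGE_SIZE);
    if (p->ntbs == MAX_TBS_PER_PAGE)
        return -1;
    p->tbs[p->ntbs++] = (TB){start, len, 1};
    if (p->have_bitmap)
        mark_code(p, start, len);
    return p->ntbs - 1;
}

/* Called on a guest write to [off, off+len) of a page holding code. */
static void page_write(Page *p, size_t off, size_t len)
{
    assert(off + len <= PAGE_SIZE);
    if (!p->have_bitmap) {
        /* Coarse mode: any write invalidates every block on the page. */
        for (int i = 0; i < p->ntbs; i++)
            p->tbs[i].valid = 0;
        if (++p->write_count >= BITMAP_THRESHOLD)
            build_bitmap(p);
        return;
    }
    /* Fine-grained mode: ignore writes that touch no translated code. */
    if (!range_has_code(p, off, len))
        return;
    for (int i = 0; i < p->ntbs; i++) {
        TB *t = &p->tbs[i];
        if (t->valid && off < t->start + t->len && t->start < off + len)
            t->valid = 0;
    }
    build_bitmap(p);  /* drop the bits of blocks just invalidated */
}
```

The bitmap pays off for pages that mix code with frequently written data: without it, every data write would throw away all translations on that page.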
Paul
* Re: [Qemu-devel] Translation cache sizes
2006-04-08 3:13 [Qemu-devel] Translation cache sizes Julian Seward
2006-04-08 6:30 ` Mulyadi Santosa
2006-04-08 12:43 ` Gwenole Beauchesne
@ 2006-04-08 13:16 ` Paul Brook
From: Paul Brook @ 2006-04-08 13:16 UTC
To: qemu-devel
On Saturday 08 April 2006 04:13, Julian Seward wrote:
> Using qemu from cvs simulating x86-softmmu (no kqemu) on x86,
> booting SuSE 9.1 and getting to the xdm (kdm?) graphical login
> screen, requires making about 1088000 translations, and the
> translation cache is flushed 17 times. Booting is not too bad,
> but once user-mode starts to run the translation cache is pretty
> much hammered.
>
> I made 2 changes:
>
> * increase CODE_GEN_BUFFER_SIZE from 16*1024*1024
> to 64*1024*1024,
>
> * observe that CODE_GEN_AVG_BLOCK_SIZE of 128
> for the softmmu case is too low; my measurements put it
> at about 247. So I changed it to 256.
>
> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen. Of course the cost is an extra
> 48M of memory use.
Did you measure any actual speedup from these changes?
In a typical Linux boot there's a lot of new code that is run only once,
so I'd expect the TB cache to be hammered fairly heavily.
Paul