qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Translation cache sizes
@ 2006-04-08  3:13 Julian Seward
  2006-04-08  6:30 ` Mulyadi Santosa
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Julian Seward @ 2006-04-08  3:13 UTC
  To: qemu-devel


Using qemu from cvs simulating x86-softmmu (no kqemu) on x86,
booting SuSE 9.1 and getting to the xdm (kdm?) graphical login
screen, requires making about 1088000 translations, and the
translation cache is flushed 17 times.  Booting is not too bad,
but once user-mode starts to run the translation cache is pretty
much hammered.

I made 2 changes: 

* increase CODE_GEN_BUFFER_SIZE from 16*1024*1024
  to 64*1024*1024, 

* observe that CODE_GEN_AVG_BLOCK_SIZE of 128
  for the softmmu case is too low; my measurements put it
  at about 247.  So I changed it to 256.
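
To see why both constants matter, here is a small model of the two limits that force a full flush. It assumes, as a simplification, that the array of translation-block descriptors is sized as CODE_GEN_BUFFER_SIZE / CODE_GEN_AVG_BLOCK_SIZE and that a flush happens when either the code buffer or that array runs out; the safety margin at the end of the buffer is ignored. The function names are hypothetical, and 247 bytes is Julian's measured average.

```c
#include <stddef.h>

/* Hypothetical model: descriptors reserved for a buffer, assuming the
 * count is derived as buffer_size / assumed_avg. */
static size_t max_blocks(size_t buffer_size, size_t assumed_avg)
{
    return buffer_size / assumed_avg;
}

/* Blocks that fit before a full flush, given the real average block size.
 * Whichever runs out first (code space or descriptors) forces the flush. */
static size_t blocks_before_flush(size_t buffer_size, size_t assumed_avg,
                                  size_t real_avg)
{
    size_t by_space = buffer_size / real_avg;
    size_t by_descriptors = max_blocks(buffer_size, assumed_avg);

    return by_space < by_descriptors ? by_space : by_descriptors;
}
```

Under this model, with 16 MB and an assumed average of 128 bytes, the buffer fills after about 67,923 blocks of 247 bytes while roughly half of the 131,072 descriptors are never used. With 64 MB and 256 bytes, the descriptor array (262,144 entries) runs out slightly before the buffer would (about 271,695 blocks).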

With those changes in place, the same boot-to-kdm process 
requires only about 570000 translations to be made, and 2 
cache flushes to happen.  Of course the cost is an extra
48M of memory use.

J


* Re: [Qemu-devel] Translation cache sizes
  2006-04-08  3:13 [Qemu-devel] Translation cache sizes Julian Seward
@ 2006-04-08  6:30 ` Mulyadi Santosa
  2006-04-08 10:27   ` Johannes Schindelin
  2006-04-08 12:43 ` Gwenole Beauchesne
  2006-04-08 13:16 ` Paul Brook
  2 siblings, 1 reply; 6+ messages in thread
From: Mulyadi Santosa @ 2006-04-08  6:30 UTC
  To: qemu-devel, Julian Seward

Hi Julian...

> Using qemu from cvs simulating x86-softmmu (no kqemu) on x86,
> booting SuSE 9.1 and getting to the xdm (kdm?) graphical login
> screen, requires making about 1088000 translations, and the
> translation cache is flushed 17 times.  Booting is not too bad,
> but once user-mode starts to run the translation cache is pretty
> much hammered.

This reminds me of when I booted the default FC2 kernel (4G/4G VM split).
Maybe I suffered from the same thing, that is, tons of translations and
cache flushes inside qemu.

Anyway, would you mind sharing how you got those numbers? Putting such
info on the qemu forum (http://qemu.dad-answers.com) would be great too,
since it would encourage other (casual) users to tweak qemu.

> I made 2 changes:
>
> * increase CODE_GEN_BUFFER_SIZE from 16*1024*1024
>   to 64*1024*1024,

IMHO, if speed is what a user really needs, he/she won't mind the extra
48 MB. Run qemu in a non-X environment with serial output or a
curses-based display, and the extra 48 MB won't hog too much.

> * observe that CODE_GEN_AVG_BLOCK_SIZE of 128
>   for the softmmu case is too low; my measurements put it
>   at about 247.  So I changed it to 256.

So you doubled it... does that mean there is more than just the extra
48 MB?

> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen.  Of course the cost is an extra
> 48M of memory use.

Good to hear! Wow! Maybe we should make those constants configurable
(using the ./configure script)?

regards

Mulyadi


* Re: [Qemu-devel] Translation cache sizes
  2006-04-08  6:30 ` Mulyadi Santosa
@ 2006-04-08 10:27   ` Johannes Schindelin
  0 siblings, 0 replies; 6+ messages in thread
From: Johannes Schindelin @ 2006-04-08 10:27 UTC
  To: a_mulyadi, qemu-devel

Hi,

On Sat, 8 Apr 2006, Mulyadi Santosa wrote:

> > With those changes in place, the same boot-to-kdm process
> > requires only about 570000 translations to be made, and 2
> > cache flushes to happen.  Of course the cost is an extra
> > 48M of memory use.
> 
> Good to hear! Wow! Maybe we should make those constants configurable
> (using the ./configure script)?

It might be an even better idea to make command-line options out of them. I 
know, I know, these are #define'd, and thus the buffer would have to be 
malloc()ed, but it'd be nice to have the same binary running on several 
computers (even those which cannot afford another 48M).
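
A sketch of that suggestion: read the size from a hypothetical command-line value such as "64M" and allocate the buffer at run time instead of using a #define'd static array. parse_size, struct code_gen and code_gen_init are made-up names. A plain malloc() is only adequate where heap memory is executable; hosts that enforce non-executable data would need something like mmap() with PROT_EXEC instead.

```c
#include <stdlib.h>

/* Parse a size such as "64M", "16384K" or a plain byte count.
 * Returns 0 for malformed input. Overflow is not checked in this sketch. */
static size_t parse_size(const char *s)
{
    char *end;
    unsigned long long v;

    if (*s < '0' || *s > '9')
        return 0;                       /* reject "", "-1", " 64M", ... */
    v = strtoull(s, &end, 10);
    switch (*end) {
    case 'k': case 'K': v *= 1024ULL; end++; break;
    case 'm': case 'M': v *= 1024ULL * 1024ULL; end++; break;
    case '\0': break;
    default: return 0;                  /* unknown suffix */
    }
    if (*end != '\0')
        return 0;                       /* trailing garbage such as "64MB" */
    return (size_t)v;
}

/* Hypothetical run-time replacement for the compile-time constants. */
struct code_gen {
    unsigned char *buffer;  /* translated host code goes here */
    size_t size;            /* plays the role of CODE_GEN_BUFFER_SIZE */
    size_t max_blocks;      /* derived descriptor limit */
};

static int code_gen_init(struct code_gen *cg, size_t size, size_t avg_block)
{
    if (size == 0 || avg_block == 0)
        return -1;
    cg->buffer = malloc(size);
    if (cg->buffer == NULL)
        return -1;                      /* e.g. a machine that cannot spare 64M */
    cg->size = size;
    cg->max_blocks = size / avg_block;
    return 0;
}
```

The same binary could then run with "16M" on a small machine and "64M" on a large one.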

Ciao,
Dscho


* Re: [Qemu-devel] Translation cache sizes
  2006-04-08  3:13 [Qemu-devel] Translation cache sizes Julian Seward
  2006-04-08  6:30 ` Mulyadi Santosa
@ 2006-04-08 12:43 ` Gwenole Beauchesne
  2006-04-08 13:10   ` Paul Brook
  2006-04-08 13:16 ` Paul Brook
  2 siblings, 1 reply; 6+ messages in thread
From: Gwenole Beauchesne @ 2006-04-08 12:43 UTC
  To: qemu-devel

Hi,

> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen.  Of course the cost is an extra
> 48M of memory use.

I faced a similar problem in Basilisk II. MacOS 8.x tended to
invalidate the code cache approx. 1000 times per second. My poor
K6-2/300 was suffering a lot: about 45% of the time was spent
compiling code, and the desktop experience was very sluggish. Then I
came up with a very simple idea I named "lazy cache flush". Performance
increased by 76%, compilation time dropped below 10%, and the desktop
experience was very smooth. More contemporary results are given
below.

So what's lazy invalidation of the translation cache? Well, the goal is
simple: keep translated code as long as possible. In practice, you
invalidate the complete translation cache only when it is full. Other
explicit cache invalidations (CINV instructions on 68k, icbi on PPC,
etc.) are virtual: the code is kept but put in a "dormant" state. That
is, the usual entry points (in the hash table, or inter-block jumps)
are redirected to check/recovery code where the source block is
checksummed again. If the checksum matches the original's, the
previously compiled code is brought back to life (its entry points in
the hash table and its inter-block links are restored). Otherwise, the
block is recompiled and the new code is used.
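
A minimal sketch of the lazy flush described above. The block states, the function names and the use of FNV-1a as the checksum are all assumptions for illustration; unlinking hash-table entries and inter-block jumps is reduced to a single state flag, and the hard flush when the cache is full is not shown.

```c
#include <stddef.h>
#include <stdint.h>

enum tb_state { TB_ACTIVE, TB_DORMANT };

struct tblock {
    const uint8_t *guest_src;   /* guest code this block was translated from */
    size_t len;                 /* length of that guest code */
    uint32_t checksum;          /* checksum of guest_src at compile time */
    enum tb_state state;
    unsigned compiles;          /* how many times the block was (re)compiled */
};

/* FNV-1a, standing in for whatever checksum a real emulator uses. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Stand-in for the real translator: records the source checksum. */
static void compile_block(struct tblock *tb)
{
    tb->checksum = fnv1a(tb->guest_src, tb->len);
    tb->state = TB_ACTIVE;
    tb->compiles++;
}

/* CINV/icbi handler: keep the code, just make it dormant. */
static void soft_invalidate(struct tblock *tb)
{
    tb->state = TB_DORMANT;
}

/* Reached via the hash table or a redirected inter-block jump. */
static void enter_block(struct tblock *tb)
{
    if (tb->state == TB_DORMANT) {
        if (fnv1a(tb->guest_src, tb->len) == tb->checksum)
            tb->state = TB_ACTIVE;  /* guest code unchanged: revive old code */
        else
            compile_block(tb);      /* guest code modified: translate again */
    }
    /* ... a real emulator would now jump into the block's host code ... */
}
```

The trade-off shows up in the results that follow: every entry into a dormant block costs a checksum of its guest code, but a checksum is far cheaper than a recompilation.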

It's very simple and quite efficient. Since then, I have had no need to
increase the translation cache beyond 8 MB.

So, here are a few results on an Athlon64 3200+, with the translation
cache set to 8 MB. The test consisted of booting MacOS 8, running all
Speedometer 4 tests, then shutting down the virtual Mac.

* Without lazy flush:
Number of soft flushes          : 0
Number of hard flushes          : 101387
Number of checksums             : 0
Number of calls to compile_block: 20244047
Total emulation time            : 115.8 sec
Total compilation time          : 59.4 sec (51.3%)

* With lazy flush:
Number of soft flushes          : 405520
Number of hard flushes          : 7
Number of checksums             : 46545721
Number of calls to compile_block: 340104
Total emulation time            : 66.1 sec
Total compilation time          : 1.8 sec (2.8%)

The results speak for themselves. ;-)

Speedometer 4 "Performance Rating" increased by 12%. More
interestingly, the Color QuickDraw tests improved by a factor of 12,
scoring 19.95 on average with lazy cache flush versus 1.67 without.

Bye,
Gwenolé.


* Re: [Qemu-devel] Translation cache sizes
  2006-04-08 12:43 ` Gwenole Beauchesne
@ 2006-04-08 13:10   ` Paul Brook
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Brook @ 2006-04-08 13:10 UTC
  To: qemu-devel

On Saturday 08 April 2006 13:43, Gwenole Beauchesne wrote:
> Hi,
>
> > With those changes in place, the same boot-to-kdm process
> > requires only about 570000 translations to be made, and 2
> > cache flushes to happen.  Of course the cost is an extra
> > 48M of memory use.
>
> I faced a similar problem in Basilisk II. MacOS 8.x tended to
> invalidate the code cache approx. 1000 times per second. My poor
> K6-2/300 was suffering a lot: about 45% of the time was spent
> compiling code, and the desktop experience was very sluggish. Then I
> came up with a very simple idea I named "lazy cache flush". Performance
> increased by 76%, compilation time dropped below 10%, and the desktop
> experience was very smooth. More contemporary results are given
> below.

Qemu already does this. Initially it does it on a per-page basis (writes to a 
given physical memory page will invalidate all code on that page), and for 
frequently contested pages it does more fine-grained locking.
x86 doesn't have explicit icache invalidate instructions; the icache is
architecturally defined to be coherent after every jump instruction.
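
A rough sketch of the two levels described above, under stated assumptions: each guest page keeps a list of the translated blocks on it, a write to a page holding code drops every block on that page, and once a page has been written often enough (a hypothetical threshold of 10) it is treated as contested and only blocks overlapping the written byte are dropped. All names are hypothetical; this is not QEMU's actual code, which may track code bytes differently (for example with a bitmap).

```c
#include <stdint.h>

#define MAX_TBS_PER_PAGE    64
#define CONTESTED_THRESHOLD 10  /* hypothetical: writes before going fine-grained */

struct tb_range {
    uint16_t off;   /* offset of the block's guest code within the page */
    uint16_t len;   /* length of that guest code in bytes */
};

struct code_page {
    struct tb_range tb[MAX_TBS_PER_PAGE];
    unsigned ntbs;          /* blocks currently translated from this page */
    unsigned write_count;   /* writes seen while the page held code */
};

/* Record a newly translated block; returns 0 if the page's list is full. */
static int add_tb(struct code_page *p, uint16_t off, uint16_t len)
{
    if (p->ntbs == MAX_TBS_PER_PAGE)
        return 0;
    p->tb[p->ntbs].off = off;
    p->tb[p->ntbs].len = len;
    p->ntbs++;
    return 1;
}

/* Handle a guest store to byte `off` of the page.
 * Returns the number of translated blocks invalidated. */
static unsigned guest_write(struct code_page *p, unsigned off)
{
    unsigned i, kept = 0, dropped;

    if (p->ntbs == 0)
        return 0;                   /* no code on this page: ordinary data write */

    if (++p->write_count < CONTESTED_THRESHOLD) {
        dropped = p->ntbs;          /* coarse: any write kills all code on the page */
        p->ntbs = 0;
        return dropped;
    }

    /* Contested page: drop only blocks whose code covers the written byte. */
    for (i = 0; i < p->ntbs; i++) {
        const struct tb_range *r = &p->tb[i];
        if (off >= (unsigned)r->off && off < (unsigned)r->off + r->len)
            continue;               /* overlaps the write: invalidate */
        p->tb[kept++] = *r;         /* untouched: keep it */
    }
    dropped = p->ntbs - kept;
    p->ntbs = kept;
    return dropped;
}
```

The coarse path is cheap to track but throws away code on pages that mix code and frequently written data; the fine-grained path costs a scan per write but only after a page has proven to be contested.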

Paul


* Re: [Qemu-devel] Translation cache sizes
  2006-04-08  3:13 [Qemu-devel] Translation cache sizes Julian Seward
  2006-04-08  6:30 ` Mulyadi Santosa
  2006-04-08 12:43 ` Gwenole Beauchesne
@ 2006-04-08 13:16 ` Paul Brook
  2 siblings, 0 replies; 6+ messages in thread
From: Paul Brook @ 2006-04-08 13:16 UTC
  To: qemu-devel

On Saturday 08 April 2006 04:13, Julian Seward wrote:
> Using qemu from cvs simulating x86-softmmu (no kqemu) on x86,
> booting SuSE 9.1 and getting to the xdm (kdm?) graphical login
> screen, requires making about 1088000 translations, and the
> translation cache is flushed 17 times.  Booting is not too bad,
> but once user-mode starts to run the translation cache is pretty
> much hammered.
>
> I made 2 changes:
>
> * increase CODE_GEN_BUFFER_SIZE from 16*1024*1024
>   to 64*1024*1024,
>
> * observe that CODE_GEN_AVG_BLOCK_SIZE of 128
>   for the softmmu case is too low; my measurements put it
>   at about 247.  So I changed it to 256.
>
> With those changes in place, the same boot-to-kdm process
> requires only about 570000 translations to be made, and 2
> cache flushes to happen.  Of course the cost is an extra
> 48M of memory use.

Did you measure any actual speedup from these changes?

In a typical Linux boot there's a lot of new code run only once, so I'd expect 
the TB cache to be hammered fairly heavily.

Paul

