* PVnize instructions
@ 2010-06-21 15:10 Alexander Graf
2010-06-22 20:30 ` Alexander Graf
From: Alexander Graf @ 2010-06-21 15:10 UTC (permalink / raw)
To: kvm-ppc
I figured I'd go and find out what the distribution of emulated
instructions looks like in random use cases. The one I measured here was:
$ for i in `seq 1000`; do ls -la > /dev/null; done
inside the guest. This should give pretty good hints on process spawning
overhead. Below are the results on what is issued most often.
Number of invocations | Opcode (decimal) | OP | XOP | asm name | sprn parameter | sprn name
00488520 2101346470 OP: 31 XOP: 83 mfmsr
00487702 1275068452 OP: 19 XOP: 18 rfid
00244822 2108900006 OP: 31 XOP: 339 mfspr 275 SPRN_SPRG3
00244799 2107310758 OP: 31 XOP: 339 mfspr 27 SPR_SRR1
00243110 2103116710 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
00242910 2107245478 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
00242854 2105148070 OP: 31 XOP: 339 mfspr 26 SPR_SRR0
00206254 2101412196 OP: 31 XOP: 178 mtmsrd
00163540 2103509348 OP: 31 XOP: 178 mtmsrd
00162348 2108769190 OP: 31 XOP: 467 mtspr 273 SPRN_SPRG1
00158986 2100380326 OP: 31 XOP: 339 mfspr 273 SPRN_SPRG1
00142246 2080375332 OP: 31 XOP: 274 tlbiel
00122541 2107311014 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
00122527 2105148326 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
00089577 2102592166 OP: 31 XOP: 339 mfspr 19 SPR_DAR
00089562 2102526630 OP: 31 XOP: 339 mfspr 18 SPR_DSISR
00082629 2103443622 OP: 31 XOP: 83 mfmsr
00080937 2098922406 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
00080937 2096759718 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
00054759 2080393764 OP: 31 XOP: 274 tlbiel
00042033 2080440676 OP: 31 XOP: 178 mtmsrd
00042013 2080374950 OP: 31 XOP: 83 mfmsr
00040733 2099315044 OP: 31 XOP: 178 mtmsrd
00039939 2081817254 OP: 31 XOP: 339 mfspr 22 SPR_DECR
00039401 2088829284 OP: 31 XOP: 178 mtmsrd
00039386 2088436646 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
00039377 2088763558 OP: 31 XOP: 83 mfmsr
00039343 2086273958 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
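For anyone who wants to cross-check the OP, XOP and sprn columns against
the raw opcode column, here is a minimal Python sketch of the field
extraction. The function name `decode` and the dictionary layout are made
up for illustration; the shifts and masks follow the standard PowerPC
instruction layout (6-bit primary opcode on top, 10-bit extended opcode
above the lowest bit, SPR number stored with its two 5-bit halves swapped).

```python
def decode(inst: int) -> dict:
    """Split a 32-bit PowerPC instruction word into the table's fields."""
    op = (inst >> 26) & 0x3F      # primary opcode: top 6 bits
    xop = (inst >> 1) & 0x3FF     # extended opcode: 10 bits above the lowest bit
    rt = (inst >> 21) & 0x1F      # target/source general purpose register
    # The 10-bit SPR field holds its two 5-bit halves in swapped order,
    # so they are put back together here. Only meaningful for mfspr/mtspr.
    sprn = ((inst >> 16) & 0x1F) | ((inst >> 6) & 0x3E0)
    return {"op": op, "xop": xop, "rt": rt, "sprn": sprn}


# The first three opcode values from the table, given as decimal numbers.
for word in (2101346470, 1275068452, 2108900006):
    print(f"{word:#010x}", decode(word))
```

The sprn value is only valid for rows whose XOP is 339 (mfspr) or 467
(mtspr); for everything else those bits mean something different.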
Obviously we could PV mfmsr. Most of the mfspr and mtspr instructions can
also be easily replaced by stda/lda to a negative address with a magic page.
Rfid is pretty much impossible, and mtmsrd is _very_ difficult without more
logic inside the guest. The only way around tlbiel would be a queuing
invalidation mechanism - and I doubt that's possible as the kernel
expects the page to be gone instantly.
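To make the magic page idea a bit more concrete, here is a rough Python
sketch of how an mfspr could be rewritten into a load from a negative
address. Everything about the page itself is an assumption made for
illustration: the base address `MAGIC_BASE`, the `MAGIC_OFFSETS` layout and
the function name `patch_mfspr` are hypothetical. The sketch uses a plain
64-bit `ld` (primary opcode 58) with rA = 0, so the sign-extended 16-bit
displacement is the whole address.

```python
LD_OPCODE = 58 << 26     # ld rT, DS(rA): 64-bit load, DS-form
MAGIC_BASE = -4096       # assumption: magic page sits in the last 4 KiB
# Hypothetical layout of the shared page: SPR number -> byte offset.
MAGIC_OFFSETS = {26: 0x00, 27: 0x08, 19: 0x10}   # SRR0, SRR1, DAR


def patch_mfspr(inst: int) -> int:
    """Return `ld rT, addr(0)` for a patchable mfspr, else inst unchanged."""
    op, xop = inst >> 26, (inst >> 1) & 0x3FF
    if op != 31 or xop != 339:          # not an mfspr
        return inst
    sprn = ((inst >> 16) & 0x1F) | ((inst >> 6) & 0x3E0)
    if sprn not in MAGIC_OFFSETS:       # e.g. SPR_DECR still needs emulation
        return inst
    rt = (inst >> 21) & 0x1F
    addr = MAGIC_BASE + MAGIC_OFFSETS[sprn]
    # rA = 0 means "no base register". The displacement is sign-extended,
    # so 0xF008 addresses -4088. DS-form needs the low two bits clear.
    return LD_OPCODE | (rt << 21) | (addr & 0xFFFC)


# mfspr r12, SRR1 from the table becomes a load from the magic page.
print(hex(patch_mfspr(2107310758)))
```

The mtspr direction would look the same with a store instead of a load.
The hard part this sketch skips is keeping the page contents in sync
between guest and host.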
Overall, this looks pretty promising though. Apparently > 60% of the
emulated instructions can be pretty easily patched to non-emulated ones.
So this is definitely the next low hanging performance fruit to get!
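As a quick sanity check of the "> 60%" figure, the counts from the table
can be summed per mnemonic. This is only a sketch: it covers the rows
listed above rather than every emulated instruction, and the `EASY` set
(mfmsr, mfspr, mtspr) is my reading of which instructions count as easily
patched. With those assumptions the share comes out at roughly 71%.

```python
# Invocation counts per mnemonic, copied from the table above.
COUNTS = {
    "mfmsr": [488520, 82629, 42013, 39377],
    "rfid": [487702],
    "mfspr": [244822, 244799, 242854, 158986, 89577, 89562, 39939],
    "mtspr": [243110, 242910, 162348, 122541, 122527,
              80937, 80937, 39386, 39343],
    "mtmsrd": [206254, 163540, 42033, 40733, 39401],
    "tlbiel": [142246, 54759],
}
# Assumption: these are the instructions that can be patched easily.
EASY = {"mfmsr", "mfspr", "mtspr"}

totals = {name: sum(values) for name, values in COUNTS.items()}
grand_total = sum(totals.values())
easy_share = sum(totals[name] for name in EASY) / grand_total

# Print one line per mnemonic, most frequent first.
for name, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name:7s} {count:8d} {count / grand_total:6.1%}")
print(f"easily patched: {easy_share:.1%}")
```

Note that the mfspr total still includes the SPR_DECR reads. Leaving those
out lowers the share by about one percentage point.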
Alex
* Re: PVnize instructions
2010-06-21 15:10 PVnize instructions Alexander Graf
@ 2010-06-22 20:30 ` Alexander Graf
From: Alexander Graf @ 2010-06-22 20:30 UTC (permalink / raw)
To: kvm-ppc
Alexander Graf wrote:
> I figured I'd go and find out what the distribution of emulated
> instructions looks like in random use cases. The one I measured here was:
>
> $ for i in `seq 1000`; do ls -la > /dev/null; done
>
> inside the guest. This should give pretty good hints on process spawning
> overhead. Below are the results on what is issued most often.
>
> Number of invocations | Opcode (decimal) | OP | XOP | asm name | sprn parameter | sprn name
>
> 00488520 2101346470 OP: 31 XOP: 83 mfmsr
> 00487702 1275068452 OP: 19 XOP: 18 rfid
> 00244822 2108900006 OP: 31 XOP: 339 mfspr 275 SPRN_SPRG3
> 00244799 2107310758 OP: 31 XOP: 339 mfspr 27 SPR_SRR1
> 00243110 2103116710 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
> 00242910 2107245478 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
> 00242854 2105148070 OP: 31 XOP: 339 mfspr 26 SPR_SRR0
> 00206254 2101412196 OP: 31 XOP: 178 mtmsrd
> 00163540 2103509348 OP: 31 XOP: 178 mtmsrd
> 00162348 2108769190 OP: 31 XOP: 467 mtspr 273 SPRN_SPRG1
> 00158986 2100380326 OP: 31 XOP: 339 mfspr 273 SPRN_SPRG1
> 00142246 2080375332 OP: 31 XOP: 274 tlbiel
> 00122541 2107311014 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
> 00122527 2105148326 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
> 00089577 2102592166 OP: 31 XOP: 339 mfspr 19 SPR_DAR
> 00089562 2102526630 OP: 31 XOP: 339 mfspr 18 SPR_DSISR
> 00082629 2103443622 OP: 31 XOP: 83 mfmsr
> 00080937 2098922406 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
> 00080937 2096759718 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
> 00054759 2080393764 OP: 31 XOP: 274 tlbiel
> 00042033 2080440676 OP: 31 XOP: 178 mtmsrd
> 00042013 2080374950 OP: 31 XOP: 83 mfmsr
> 00040733 2099315044 OP: 31 XOP: 178 mtmsrd
> 00039939 2081817254 OP: 31 XOP: 339 mfspr 22 SPR_DECR
> 00039401 2088829284 OP: 31 XOP: 178 mtmsrd
> 00039386 2088436646 OP: 31 XOP: 467 mtspr 27 SPR_SRR1
> 00039377 2088763558 OP: 31 XOP: 83 mfmsr
> 00039343 2086273958 OP: 31 XOP: 467 mtspr 26 SPR_SRR0
>
>
> Obviously we could PV mfmsr. Most of the mfspr and mtspr instructions can
> also be easily replaced by stda/lda to a negative address with a magic page.
> Rfid is pretty much impossible, and mtmsrd is _very_ difficult without more
> logic inside the guest. The only way around tlbiel would be a queuing
> invalidation mechanism - and I doubt that's possible as the kernel
> expects the page to be gone instantly.
>
> Overall, this looks pretty promising though. Apparently > 60% of the
> emulated instructions can be pretty easily patched to non-emulated ones.
> So this is definitely the next low hanging performance fruit to get!
>
After optimizing the above instructions away, I end up with the
following results:
00682354 OP: 19 XOP: 18 rfid
00237114 OP: 31 XOP: 178 mtmsrd
00232302 OP: 31 XOP: 339 mfspr SPR_DECR
00231933 OP: 31 XOP: 178 mtmsrd
00231036 OP: 31 XOP: 178 mtmsrd
00217262 OP: 31 XOP: 274 tlbiel
00132854 OP: 31 XOP: 178 mtmsrd
00112533 OP: 31 XOP: 178 mtmsrd
00037159 OP: 31 XOP: 274 tlbiel
00036223 OP: 31 XOP: 178 mtmsrd
00019193 OP: 31 XOP: 566 tlbsync
00019171 OP: 31 XOP: 306 tlbie
00007915 OP: 31 XOP: 402 slbmte
00007738 OP: 31 XOP: 434 slbie
00004772 OP: 31 XOP: 467 mtspr SPR_DECR
I'm not sure how to properly deal with that. Rfid is usually called for
a good reason, so we need to exit the guest for it. All decrementer magic
needs to be handled by KVM too. I can certainly nop out tlbsync, so I guess
I'll just do that. Mtmsrd should be possible to squeeze into the VM, but
it's definitely not as easy as patching a single instruction somewhere.
It needs serious logic.
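Nopping out tlbsync really is a single-word replacement. Here is a minimal
Python sketch that models it on a list of instruction words. The function
name `nop_out_tlbsync` is made up, and a real guest would have to patch its
kernel text and flush the instruction cache rather than edit a list.

```python
TLBSYNC = 0x7C00046C   # OP 31, XOP 566
NOP = 0x60000000       # ori r0, r0, 0, the canonical PowerPC nop


def nop_out_tlbsync(text: list) -> int:
    """Replace every tlbsync word in `text` in place; return the patch count."""
    patched = 0
    for i, inst in enumerate(text):
        if inst == TLBSYNC:
            text[i] = NOP
            patched += 1
    return patched


# Tiny made-up instruction stream: mfmsr r10, tlbsync, rfid.
stream = [0x7D4000A6, TLBSYNC, 0x4C000024]
print(nop_out_tlbsync(stream), [hex(word) for word in stream])
```

Because tlbsync takes no operands, an exact match on the whole word is
enough. The other candidates need their register fields masked out first.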
So for now the only low hanging fruit I can see is tlbsync. Expect a
nice patchset soon!
Alex