From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Ehrhardt
Date: Tue, 07 Oct 2008 14:36:41 +0000
Subject: Re: exit timing analysis v1 - comments&discussions welcome
Message-Id: <48EB73F9.6010006@linux.vnet.ibm.com>
List-Id:
References: <48DA0747.3020004@linux.vnet.ibm.com>
In-Reply-To: <48DA0747.3020004@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
To: kvm-ppc@vger.kernel.org

As intended, I separated the time of one of our exits into its segments
to see where we lose the constant base time.
I looked at the syscall exit, which is pretty fast and does not deviate
very much.
A little tracing and debugging later, I can now show the rough breakdown
of where the time is spent for that exit:

avg time spent in the segments I checked:
a) 11.5% - exit, saving guest state (booke_interrupt.S)
b)  8.5% - reaching kvmppc_handle_exit
c) 10.0% - syscall exit is checked and an interrupt is queued using
           kvmppc_queue_exception
d) 30.5% - postprocessing for all exits =
           kvmppc_check_and_deliver_interrupts and updating some statistics
e)  5.5% - returning from kvmppc_handle_exit to booke_interrupt.S
f) 34.0% - restoring guest state tlb/regs (booke_interrupt.S)

Looking at these numbers, I think we have two areas to check in detail:
- On one hand, the postprocessing part (d) should not actually take that
  much time.
- On the other hand, we might be able to improve the last segment (f) by
  changing our tlb management (Liu made a suggestion on the list two
  weeks ago).

Thinking about this breakdown with the complete timing in mind (see
below) shows that the average time spent in one of our most frequent
exits is near the minimum duration (EMULINST). This means that if we can
reduce some of the constant overhead, it might really help us in those
frequent and hot paths (the virt tlb misses will benefit from that as
well).

I'll continue on that and keep you all updated.
Christian

Christian Ehrhardt wrote:
> Today I refactored my code in a way that it might eventually get
> upstream. That means I removed some debug blocks, made it configurable,
> and print output only on request per sysrq.
> Feel free to review that code now, otherwise wait for my separate rfc
> post that will follow ~next week.
>
> Additionally, I realized in a discussion that my host system was still
> configured in demo mode (X11+xfce, dhcpd, avahi, ... running in the
> host). I took some updated numbers without all those background
> activities - testcase again boot, login, ls, kill.
>
> I intend to look at the irq path in detail as suggested, and first of
> all I'll try to get some timing of the "segments" of such a call.
> This should show us where we lose our "constant base time" and give me
> some hints when looking at it in detail.
>
> FYI - new numbers without high background load (monospace recommended):
> NOPV:
> MMIO        count   10105 min 46 max   1534 sum  2952467 avg 292.1788 stddev 295.933 %  2.04
> DCR         count    5428 min 40 max    209 sum   246168 avg  45.3515 stddev   6.758 %  0.17
> SIGNAL      count     695 min 65 max   3755 sum    89767 avg 129.1612 stddev 314.421 %  0.06
> ITLBREAL    count   80045 min 13 max    108 sum  1063127 avg  13.2816 stddev   2.338 %  0.73
> ITLBVIRT    count 1000585 min 21 max 264827 sum 24300596 avg  24.2864 stddev 264.753 % 16.78
> DTLBREAL    count   91206 min 13 max     69 sum  1285214 avg  14.0913 stddev   2.225 %  0.89
> DTLBVIRT    count  977434 min 21 max   1446 sum 24007008 avg  24.5613 stddev   4.426 % 16.58
> SYSCALL     count   10460 min 11 max     55 sum   116447 avg  11.1326 stddev   1.929 %  0.08
> ISI         count   11724 min 11 max     61 sum   130007 avg  11.0890 stddev   1.929 %  0.09
> DSI         count   20737 min 11 max     57 sum   230009 avg  11.0917 stddev   1.914 %  0.16
> EMULINST    count 5683356 min 11 max   3778 sum 79339467 avg  13.9600 stddev  50.275 % 54.78
> DEC         count   13079 min 50 max    826 sum   732712 avg  56.0220 stddev  22.382 %  0.51
> EXTINT      count      55 min 30 max   1478 sum    10996 avg 199.9273 stddev 238.150 %  0.01
> FP_UNAVAIL  count     280 min 11 max     53 sum     3163 avg  11.2964 stddev   3.495 %  0.00
> TIMEINGUEST count 7905189 min  0 max   3688 sum 10330742 avg   1.3068 stddev   8.970 %  7.13
> csum    15810378
> sumsum 144837890 => ~2:24 runtime
> sumavg   7241894
>
> PV:
> MMIO        count   12505 min 46 max 3087 sum  3693782 avg 295.3844 stddev 260.788 %  4.01
> DCR         count    5595 min 40 max  706 sum   273578 avg  48.8969 stddev  31.305 %  0.30
> SIGNAL      count     654 min 65 max 4132 sum   300027 avg 458.7569 stddev 571.130 %  0.33
> ITLBREAL    count   71711 min 13 max  104 sum   943053 avg  13.1507 stddev   2.360 %  1.02
> ITLBVIRT    count  750649 min 21 max 1503 sum 18178245 avg  24.2167 stddev   7.335 % 19.71
> DTLBREAL    count   83356 min 13 max  102 sum  1146242 avg  13.7512 stddev   2.406 %  1.24
> DTLBPV      count   30086 min 20 max  237 sum   653556 avg  21.7229 stddev   4.639 %  0.71
> DTLBVIRT    count  772811 min 21 max  713 sum 19079477 avg  24.6884 stddev   6.593 % 20.69
> SYSCALL     count    7647 min 11 max   57 sum    84821 avg  11.0921 stddev   1.897 %  0.09
> HCALL       count       1 min 19 max   19 sum       19 avg  19.0000 stddev   0.000 %  0.00
> ISI         count    9895 min 11 max   73 sum   109667 avg  11.0831 stddev   1.904 %  0.12
> DSI         count   17974 min 10 max   57 sum   199504 avg  11.0996 stddev   2.046 %  0.22
> EMULINST    count 2567245 min 11 max 4212 sum 40501314 avg  15.7762 stddev  65.673 % 43.92
> DEC         count    7488 min 51 max  641 sum   426813 avg  56.9996 stddev  23.893 %  0.46
> EXTINT      count    2215 min 31 max 1677 sum   297495 avg 134.3093 stddev 116.219 %  0.32
> FP_UNAVAIL  count     258 min 11 max   11 sum     2838 avg  11.0000 stddev   0.000 %  0.00
> TIMEINGUEST count 4340090 min  0 max 3850 sum  6316079 avg   1.4553 stddev  12.599 %  6.85
> csum    8680180
> sumsum 92206510 => ~1:32 runtime
> sumavg  4610325
>
> Hollis Blanchard wrote:
>> On Thu, 2008-09-25 at 17:32 +0800, Liu Yu-B13201 wrote:
>>>> On Wed, 2008-09-24 at 11:24 +0200, Christian Ehrhardt
>>>> wrote:
>>>>>              count min     max        sum     avg  stddev time%
>>>>> EMULINST 6,423,699  12 247,582 91,732,705 14.2804 241.200 45.36
>>>>> ITLBVIRT 1,777,242  21 264,257 47,116,557 26.5111 286.040 23.30
>>>>> DTLBVIRT 1,544,241  22 263,947 41,765,447 27.0459 218.997 20.65
>>>> (The max here is of course when the guest was de-scheduled in favor
>>>> of another host process.)
>>>>
>>>> I think it's interesting that the min instruction emulation time is
>>>> 12 usecs. In fact, our "null" exits where we do almost no processing
>>>> (ISI, DSI, syscall) are 11 usecs, which suggests we have a problem
>>>> with interrupt handler overhead (for all exit types).
>>>>
>>> Will you consider moving the tlb manipulation out of the entering
>>> path? Examining the modify array may cost some time.
>>
>> Yes, definitely.
>>
>> That's about the only thing I can see in the 440 interrupt path that
>> might take significant time. Unfortunately we can't profile that code
>> because we have no performance counters, so finding the problem will
>> require some experimentation.
>

-- 
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization