From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Ehrhardt
Date: Thu, 09 Oct 2008 09:35:05 +0000
Subject: Re: exit timing analysis v1 - comments&discussions welcome
Message-Id: <48EDD049.1000709@linux.vnet.ibm.com>
List-Id:
References: <48DA0747.3020004@linux.vnet.ibm.com>
In-Reply-To: <48DA0747.3020004@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: kvm-ppc@vger.kernel.org

I modified the code according to your comments and my ideas; the new
values are shown in the column impISF (irq delivery, Stat, FindFirstBit).
I changed some code of the statistic updating and the interrupt delivery
and got this:

      base   - impirq(d3) - impstat(d5) - impboth - impISF
a)   12.57% -  11.13%    -  12.05%     -  11.03% - 12.28%  exit, saving guest state (booke_interrupt.S)
b)    7.37% -   9.38%    -   8.69%     -   8.07% - 10.13%  reaching kvmppc_handle_exit
c)    7.38% -   7.20%    -   7.49%     -   9.78% -  7.85%  syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)   2.49% -   3.39%    -   2.56%     -   3.30% -  3.70%  some checks for all exits
d2)   8.84% -   8.56%    -   9.28%     -   8.31% -  6.07%  finding first bit in kvmppc_check_and_deliver_interrupts
d3)   6.53% -   5.25%    -   6.63%     -   5.10% -  4.27%  can_deliver in kvmppc_check_and_deliver_interrupts
d4)  13.66% -  15.37%    -  14.12%     -  14.92% - 13.96%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
d5)   3.65% -   4.57%    -   2.68%     -   4.41% -  3.77%  updating kvm_stat statistics
e)    6.55% -   6.30%    -   6.30%     -   5.89% -  6.74%  returning from kvmppc_handle_exit to booke_interrupt.S
f1)  30.90% -  28.78%    -  30.16%     -  29.16% - 31.19%  restoring guest tlb
f2)   4.81% -   4.77%    -   5.06%     -   4.66% -  5.17%  restoring guest state ([s]regs)

We can all see the measurement inaccuracy, but the last column looks
good in the improved sections d2, d3 and d4. I'll remove this detailed
tracing soon and run a larger test, hoping that it will not show the
same inaccuracy.
But for now I still wonder about the ~14% for clear&deliver - it just
should not be "that" much. It seems worth looking into that section in
more detail first.

Christian Ehrhardt wrote:
> Hollis Blanchard wrote:
>> On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
>>
>>> Wondering about that 30.5% for postprocessing and
>>> kvmppc_check_and_deliver_interrupts I quickly checked that in detail
>>> - part d is now divided into 4 subparts.
>>> I also looked at the return-to-guest path to see whether the
>>> expected part (restoring the tlb) is really the main time eater
>>> there. The result shows clearly that it is.
>>>
>>> more detailed breakdown:
>>> a)  10.94% - exit, saving guest state (booke_interrupt.S)
>>> b)   8.12% - reaching kvmppc_handle_exit
>>> c)   7.59% - syscall exit is checked and an interrupt is queued
>>>              using kvmppc_queue_exception
>>> d1)  3.33% - some checks for all exits
>>> d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
>>> d3) 17.20% - can_deliver/clear&deliver exception in
>>>              kvmppc_check_and_deliver_interrupts
>>> d4)  4.47% - updating kvm_stat statistics
>>> e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
>>> f1) 29.18% - restoring guest tlb
>>> f2)  4.69% - restoring guest state ([s]regs)
>>>
>>> These fractions are % of our ~12µs syscall exit.
>>> => restoring the tlb on each reentry = 4µs constant overhead
>>> => looking a bit into irq delivery and other constant things like
>>>    kvm_stat updating
>>>
>> ...
>>
>>> Now I go for the TLB replacement in f1.
>>>
>> Hang on... does d3 make sense to you? It doesn't to me, and if
>> there's a bug there it will be easier to fix than rewriting the TLB
>> code. :)
>>
> I did not give up on improving that part too :-)
>> I think your core runs at 667MHz, right? So that's 1.5 ns/cycle.
>> 17.20% of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
>>
> I get the same results. 1% ~ 80 cycles.
>> Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure
>> where that time is going. We're assuming the find_first_bit() loop
>> usually executes once, for syscall. Does it actually execute more
>> than that? I don't expect any of kvmppc_can_deliver_interrupt(),
>> kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt()
>> to take lots of time.
>>
> You can see below that I already had a more detailed breakdown in my
> old mail:
> [...]
> d2)  8.84% -  8.56% -  9.28% -  8.31%  finding first bit in
>      kvmppc_check_and_deliver_interrupts
> d3)  6.53% -  5.25% -  6.63% -  5.10%  can_deliver in
>      kvmppc_check_and_deliver_interrupts
> d4) 13.66% - 15.37% - 14.12% - 14.92%  clear&deliver exception in
>      kvmppc_check_and_deliver_interrupts
> [...]
>> Could it be cache effects? exception_priority[] and
>> priority_exception[] are 16 bytes each, and our L1 cacheline is 32
>> bytes, so they should both fit into one... except they're not
>> aligned.
>>
> I would be so happy if I had hardware performance counters for things
> like cache misses :-)
>> Also, it looks like we use the generic find_first_bit(). That may be
>> more expensive than we'd like. However, since
>> vcpu->arch.pending_exceptions is a single long (not an arbitrarily
>> sized bitfield), we should be able to use ffs() instead, which has an
>> optimized PowerPC implementation. That might help a lot.
>>
> Good idea. I'll check this and some other small improvements I have
> in mind.
>
>> We might even be able to replace find_next_bit() too, by shifting a
>> mask over each loop, but I don't think we'll have to, since I expect
>> the common case to be that we can deliver the first pending
>> exception. (Worth checking? :)
>>
> I'm not sure. It's certainly worth checking how often that second
> find_next_bit is called. If that number is too small, it's not worth
> it.

-- 

Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization