From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: kvm-ppc@vger.kernel.org
Subject: Re: exit timing analysis v1 - comments&discussions welcome
Date: Thu, 09 Oct 2008 14:49:12 +0000
Message-ID: <48EE19E8.9080002@linux.vnet.ibm.com>
In-Reply-To: <48DA0747.3020004@linux.vnet.ibm.com>
OK, I looked at clear&deliver in detail and made some minor changes, but
in the end there is no single point to improve.
Most of the ~14% spent in clear&deliver is lost in
kvmppc_mmu_priv_switch (~8%); the rest is spread evenly across the code,
with nothing obvious standing out (maybe some cache misses, though).
This code has to be touched anyway when we change the
guest/host (shadow) TLB management.
Christian Ehrhardt wrote:
> I modified the code according to your comments and my ideas; the new
> values are shown in column impISF (irq delivery, Stat, FindFirstBit).
>
> I changed some code of the statistic updating and the interrupt
> delivery and got this:
>      base   - impirq (d3) - impstat (d5) - impboth - impISF
> a)  12.57% - 11.13% - 12.05% - 11.03% - 12.28%  exit, saving guest state (booke_interrupt.S)
> b)   7.37% -  9.38% -  8.69% -  8.07% - 10.13%  reaching kvmppc_handle_exit
> c)   7.38% -  7.20% -  7.49% -  9.78% -  7.85%  syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
> d1)  2.49% -  3.39% -  2.56% -  3.30% -  3.70%  some checks for all exits
> d2)  8.84% -  8.56% -  9.28% -  8.31% -  6.07%  finding first bit in kvmppc_check_and_deliver_interrupts
> d3)  6.53% -  5.25% -  6.63% -  5.10% -  4.27%  can_deliver in kvmppc_check_and_deliver_interrupts
> d4) 13.66% - 15.37% - 14.12% - 14.92% - 13.96%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
> d5)  3.65% -  4.57% -  2.68% -  4.41% -  3.77%  updating kvm_stat statistics
> e)   6.55% -  6.30% -  6.30% -  5.89% -  6.74%  returning from kvmppc_handle_exit to booke_interrupt.S
> f1) 30.90% - 28.78% - 30.16% - 29.16% - 31.19%  restoring guest tlb
> f2)  4.81% -  4.77% -  5.06% -  4.66% -  5.17%  restoring guest state ([s]regs)
>
> We all see the measurement inaccuracy, but the last column looks good
> in the improved sections d2, d3 and d4.
> I'll remove this detailed tracing soon and run a larger test, hoping
> that it will not show the same inaccuracy.
> But for now I still wonder about the ~14% for clear&deliver; that
> should just not be "that" much.
> It should be worth looking into that section in more detail once
> again first.
>
> Christian Ehrhardt wrote:
>> Hollis Blanchard wrote:
>>> On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
>>>
>>>> Wondering about that 30.5% for postprocessing and
>>>> kvmppc_check_and_deliver_interrupts I quickly checked that in
>>>> detail - part d is now divided in 4 subparts.
>>>> I also looked at the return to guest path if the expected part
>>>> (restoring tlb) is really the main time eater there. The result
>>>> shows clearly that it is.
>>>>
>>>> more detailed breakdown:
>>>> a) 10.94% - exit, saving guest state (booke_interrupt.S)
>>>> b) 8.12% - reaching kvmppc_handle_exit
>>>> c) 7.59% - syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
>>>> d1) 3.33% - some checks for all exits
>>>> d2) 8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
>>>> d3) 17.20% - can_deliver/clear&deliver exception in kvmppc_check_and_deliver_interrupts
>>>> d4) 4.47% - updating kvm_stat statistics
>>>> e) 6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
>>>> f1) 29.18% - restoring guest tlb
>>>> f2) 4.69% - restoring guest state ([s]regs)
>>>>
>>>> These fractions are % of our ~12µs syscall exit.
>>>> => restoring tlb on each reenter = 4µs constant overhead
>>>> => looking a bit into irq delivery and other constant things like
>>>> kvm_stat updating
>>>>
>>>>
>>> ...
>>>
>>>> Now I go for the TLB replacement in f1.
>>>>
>>>
>>> Hang on... does d3 make sense to you? It doesn't to me, and if there's a
>>> bug there it will be easier to fix than rewriting the TLB code. :)
>>>
>> I did not give up improving that part too :-)
>>> I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20%
>>> of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
>>>
>> I get the same results. 1% ~ 80 cycles.
>>> Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where
>>> that time is going. We're assuming the find_first_bit() loop usually
>>> executes once, for syscall. Does it actually execute more than that? I
>>> don't expect any of kvmppc_can_deliver_interrupt(),
>>> kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to
>>> take lots of time.
>>>
>> You can see below that I already had a more detailed breakdown in my
>> old mail:
>> [...]
>> d2)  8.84% -  8.56% -  9.28% -  8.31%  finding first bit in kvmppc_check_and_deliver_interrupts
>> d3)  6.53% -  5.25% -  6.63% -  5.10%  can_deliver in kvmppc_check_and_deliver_interrupts
>> d4) 13.66% - 15.37% - 14.12% - 14.92%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
>> [...]
>>> Could it be cache effects? exception_priority[] and priority_exception[]
>>> are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both
>>> fit into one... except they're not aligned.
>>>
>> I would be so happy if I had hardware performance counters for
>> things like cache misses :-)
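
One way to take alignment out of the picture, as a rough sketch only: put
both tables into one struct aligned to the cache line. The struct name, the
helper and the element type (16 one-byte entries each) are assumptions for
illustration, not the existing kernel code; the aligned attribute is a
GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_LINE 32u /* L1 line size mentioned in the thread */

/*
 * Hypothetical layout: both lookup tables in one struct that starts on a
 * cache line boundary. Element type and size are assumptions.
 */
struct prio_tables {
    uint8_t exception_priority[16];
    uint8_t priority_exception[16];
} __attribute__((aligned(32)));

/* Return 1 if the byte range [addr, addr + len) lies within one cache line. */
static int fits_in_one_line(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 1;
    return addr / L1_CACHE_LINE == (addr + len - 1) / L1_CACHE_LINE;
}
```

The same helper can be pointed at the addresses of the current, separate
arrays to see whether they straddle a line boundary today.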
>>> Also, it looks like we use the generic find_first_bit(). That may be
>>> more expensive than we'd like. However, since
>>> vcpu->arch.pending_exceptions is a single long (not an arbitrary sized
>>> bitfield), we should be able to use ffs() instead, which has an
>>> optimized PowerPC implementation. That might help a lot.
>>>
>> good idea.
>> I'll check this and some other small improvements I have in mind.
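
For reference, a user-space sketch of the single-word idea. The loop is only
a naive stand-in for a generic bitmap search, not the kernel's
find_first_bit(), and __builtin_ffsl() is the GCC/Clang builtin standing in
for an optimized ffs(). Both function names are made up for this example.

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Naive bit-by-bit scan, standing in for a generic bitmap search. */
static unsigned int first_bit_generic(unsigned long word)
{
    unsigned int bit;

    for (bit = 0; bit < BITS_PER_LONG; bit++)
        if (word & (1UL << bit))
            return bit;
    return BITS_PER_LONG; /* no bit set */
}

/* Single-word variant built on the compiler's ffs builtin. */
static unsigned int first_bit_ffs(unsigned long word)
{
    /* __builtin_ffsl() returns 1 + index of the lowest set bit, or 0. */
    int pos = __builtin_ffsl((long)word);

    return pos ? (unsigned int)(pos - 1) : BITS_PER_LONG;
}
```

Both return the same zero-based index, and BITS_PER_LONG when the word is
empty, so the second can replace the first for a single long.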
>>
>>> We might even be able to replace find_next_bit() too, by shifting a mask
>>> over each loop, but I don't think we'll have to, since I expect the
>>> common case to be we can deliver the first pending exception. (Worth
>>> checking? :)
>>>
>> I'm not sure. It's surely worth checking how often that second
>> find_next_bit is called.
>> If that number is too small, it's not worth it.
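
A sketch of how both points could be checked in one go, under stated
assumptions: it is a variation on the mask idea that clears the lowest set
bit of a local copy instead of shifting, and it counts how often a second
lookup is needed. The callback type, the two example predicates and the
function name are hypothetical; __builtin_ctzl() is a GCC/Clang builtin.

```c
/* Hypothetical predicate: may the exception with this priority be delivered now? */
typedef int (*can_deliver_fn)(unsigned int priority);

/* Example predicates, for illustration only. */
static int always_deliverable(unsigned int priority) { (void)priority; return 1; }
static int only_above_two(unsigned int priority) { return priority > 2; }

/*
 * Walk the pending bits from lowest to highest without find_next_bit().
 * Clears and returns the first deliverable priority, or -1 if none.
 * *extra_lookups is incremented for every lookup after the first one.
 */
static int take_first_deliverable(unsigned long *pending,
                                  can_deliver_fn can_deliver,
                                  unsigned int *extra_lookups)
{
    unsigned long mask = *pending;
    int first = 1;

    while (mask) {
        /* mask is non-zero here, so the result of ctz is defined. */
        unsigned int prio = (unsigned int)__builtin_ctzl(mask);

        if (!first)
            (*extra_lookups)++;
        first = 0;

        if (can_deliver(prio)) {
            *pending &= ~(1UL << prio);
            return (int)prio;
        }
        mask &= mask - 1; /* drop the lowest set bit and try the next one */
    }
    return -1;
}
```

If the extra_lookups counter stays near zero in a real run, the first
pending exception is almost always deliverable and the second lookup is not
worth optimizing.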
>>
>
>
--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization