From: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
To: kvm-ppc@vger.kernel.org
Subject: Re: exit timing analysis v1 - comments&discussions welcome
Date: Thu, 09 Oct 2008 09:35:05 +0000
Message-ID: <48EDD049.1000709@linux.vnet.ibm.com>
In-Reply-To: <48DA0747.3020004@linux.vnet.ibm.com>

I modified the code according to your comments and my own ideas; the new
values are shown in the column impISF (improved Irq delivery, Stat
updating, and FindFirstBit).

I changed some of the statistics-updating and interrupt-delivery code
and got this:
     base   - impirq(d3) - impstat(d5) - impboth - impISF
a)  12.57% -  11.13%    -  12.05%     -  11.03% - 12.28%  exit, saving guest state (booke_interrupt.S)
b)   7.37% -   9.38%    -   8.69%     -   8.07% - 10.13%  reaching kvmppc_handle_exit
c)   7.38% -   7.20%    -   7.49%     -   9.78% -  7.85%  syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)  2.49% -   3.39%    -   2.56%     -   3.30% -  3.70%  some checks common to all exits
d2)  8.84% -   8.56%    -   9.28%     -   8.31% -  6.07%  finding first bit in kvmppc_check_and_deliver_interrupts
d3)  6.53% -   5.25%    -   6.63%     -   5.10% -  4.27%  can_deliver in kvmppc_check_and_deliver_interrupts
d4) 13.66% -  15.37%    -  14.12%     -  14.92% - 13.96%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
d5)  3.65% -   4.57%    -   2.68%     -   4.41% -  3.77%  updating kvm_stat statistics
e)   6.55% -   6.30%    -   6.30%     -   5.89% -  6.74%  returning from kvmppc_handle_exit to booke_interrupt.S
f1) 30.90% -  28.78%    -  30.16%     -  29.16% - 31.19%  restoring guest tlb
f2)  4.81% -   4.77%    -   5.06%     -   4.66% -  5.17%  restoring guest state ([s]regs)

The measurement noise is obvious, but the last column looks good for the
improved sections d2, d3 and d4.
I'll remove this detailed tracing soon and run a larger test, which
should hopefully be less noisy.
For now, though, I still wonder about the ~14% for clear&deliver - that
simply shouldn't cost that much.
It seems worth looking into that section in more detail first.
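
For reference, the per-section numbers come from simple timebase probes
at the section boundaries. Roughly like this (simplified, illustrative
sketch only; the helper and the accounting arrays are made up here, not
the actual tracing patch):

#include <linux/types.h>
#include <asm/time.h>		/* get_tb() reads the PowerPC timebase */

/* Hypothetical per-stage accounting, one slot per section a..f2. */
enum exit_stage { STAGE_A, STAGE_B, /* ... c, d1-d5, e, f1 ... */ STAGE_F2, STAGE_MAX };

static u64 stage_sum[STAGE_MAX];
static u64 last_tb;

/* Called at every section boundary; charges the timebase ticks
 * elapsed since the previous boundary to the section just finished. */
static inline void account_exit_stage(enum exit_stage stage)
{
	u64 now = get_tb();

	stage_sum[stage] += now - last_tb;
	last_tb = now;
}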

Christian Ehrhardt wrote:
> Hollis Blanchard wrote:
>> On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
>>  
>>> Wondering about the 30.5% for postprocessing and
>>> kvmppc_check_and_deliver_interrupts, I quickly checked that in
>>> detail - part d is now divided into 4 subparts.
>>> I also checked, on the return-to-guest path, whether the expected
>>> part (restoring the tlb) really is the main time eater there. The
>>> result clearly shows that it is.
>>>
>>> more detailed breakdown:
>>> a)  10.94% - exit, saving guest state (booke_interrupt.S)
>>> b)   8.12% - reaching kvmppc_handle_exit
>>> c)   7.59% - syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
>>> d1)  3.33% - some checks for all exits
>>> d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
>>> d3) 17.20% - can_deliver/clear&deliver exception in kvmppc_check_and_deliver_interrupts
>>> d4)  4.47% - updating kvm_stat statistics
>>> e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
>>> f1) 29.18% - restoring guest tlb
>>> f2)  4.69% - restoring guest state ([s]regs)
>>>
>>> These fractions are % of our ~12µs syscall exit.
>>> => restoring tlb on each reenter = 4µs constant overhead
>>> => looking a bit into irq delivery and other constant things like 
>>> kvm_stat updating
>>>
>>>     
>> ...
>>  
>>> Now I go for the TLB replacement in f1.
>>>     
>>
>> Hang on... does d3 make sense to you? It doesn't to me, and if there's a
>> bug there it will be easier to fix than rewriting the TLB code. :)
>>   
> I haven't given up on improving that part either :-)
>> I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20%
>> of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
>>   
> I get the same results: our ~12µs exit at 667MHz is ~8000 cycles, so 
> 1% ~ 80 cycles (and 17.2% ~ 1370 cycles).
>> Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where
>> that time is going. We're assuming the find_first_bit() loop usually
>> executes once, for syscall. Does it actually execute more than that? I
>> don't expect any of kvmppc_can_deliver_interrupt(),
>> kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to
>> take lots of time.
>>   
> You can see below that I already had a more detailed breakdown in my 
> old mail:
> [...]
> d2)  8.84% -   8.56% -   9.28% -   8.31%  finding first bit in kvmppc_check_and_deliver_interrupts
> d3)  6.53% -   5.25% -   6.63% -   5.10%  can_deliver in kvmppc_check_and_deliver_interrupts
> d4) 13.66% -  15.37% -  14.12% -  14.92%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
> [...]
>> Could it be cache effects? exception_priority[] and priority_exception[]
>> are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both
>> fit into one... except they're not aligned.
>>   
> I would be so happy if I had hardware performance counters for things 
> like cache misses :-)
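
Even without counters it should be cheap to rule alignment in or out.
An untested sketch of what I mean - packing both tables into one
cacheline-aligned struct so they are guaranteed to share a single
32-byte L1 line (the element type and the placeholder initializers are
assumptions here, not copied from the source):

#include <linux/cache.h>
#include <linux/types.h>

/* Untested: both 16-byte lookup tables in one aligned struct so a
 * single L1 line covers them; element type assumed for illustration. */
static struct {
	u8 exception_priority[16];
	u8 priority_exception[16];
} booke_irq_tables ____cacheline_aligned = {
	.exception_priority = { /* real priority table here */ },
	.priority_exception = { /* real reverse mapping here */ },
};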
>> Also, it looks like we use the generic find_first_bit(). That may be
>> more expensive than we'd like. However, since
>> vcpu->arch.pending_exceptions is a single long (not an arbitrary sized
>> bitfield), we should be able to use ffs() instead, which has an
>> optimized PowerPC implementation. That might help a lot.
>>   
> Good idea.
> I'll check this and some other small improvements I have in mind.
>
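
Roughly what I have in mind for the first lookup (untested sketch; the
function name is made up, and it relies on pending_exceptions being a
single unsigned long - __ffs() is undefined for zero, hence the early
check):

#include <linux/bitops.h>
#include <linux/kvm_host.h>

/* Illustrative only: return the highest-priority pending exception,
 * using __ffs() instead of the generic find_first_bit() loop.
 * On PowerPC __ffs() boils down to a cntlzw-based sequence. */
static unsigned int first_pending_priority(struct kvm_vcpu *vcpu)
{
	unsigned long pending = vcpu->arch.pending_exceptions;

	if (!pending)
		return BITS_PER_LONG;	/* nothing pending */

	return __ffs(pending);		/* bit 0 = highest priority */
}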
>> We might even be able to replace find_next_bit() too, by shifting a mask
>> over each loop, but I don't think we'll have to, since I expect the
>> common case to be we can deliver the first pending exception. (Worth
>> checking? :)
>>   
> I'm not sure. It's certainly worth checking how often that second 
> find_next_bit() is actually called.
> If that number turns out to be very small, it's not worth it.
>
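
In case it does turn out to matter, the shifted-mask variant would look
roughly like this (sketch only; I'm reusing the helper names from your
mail, assuming they take the vcpu and the priority, and stopping after
the first delivered exception):

/* Illustrative sketch: walk the pending word by shifting instead of
 * calling find_first_bit()/find_next_bit(); bit 0 = highest priority. */
static void deliver_pending_sketch(struct kvm_vcpu *vcpu)
{
	unsigned long pending = vcpu->arch.pending_exceptions;
	unsigned int priority = 0;

	while (pending) {
		if ((pending & 1) &&
		    kvmppc_can_deliver_interrupt(vcpu, priority)) {
			kvmppc_booke_clear_exception(vcpu, priority);
			kvmppc_booke_deliver_interrupt(vcpu, priority);
			break;
		}
		pending >>= 1;
		priority++;
	}
}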


-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

