From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Ehrhardt
Date: Thu, 09 Oct 2008 08:02:46 +0000
Subject: Re: exit timing analysis v1 - comments&discussions welcome
Message-Id: <48EDBAA6.9050308@linux.vnet.ibm.com>
List-Id:
References: <48DA0747.3020004@linux.vnet.ibm.com>
In-Reply-To: <48DA0747.3020004@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: kvm-ppc@vger.kernel.org

Hollis Blanchard wrote:
> On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
>
>> Wondering about that 30.5% for postprocessing and
>> kvmppc_check_and_deliver_interrupts, I quickly checked that in
>> detail - part d is now divided into 4 subparts.
>> I also looked at the return-to-guest path to see whether the
>> expected part (restoring the TLB) is really the main time eater
>> there. The result shows clearly that it is.
>>
>> More detailed breakdown:
>> a)  10.94% - exit, saving guest state (booke_interrupt.S)
>> b)   8.12% - reaching kvmppc_handle_exit
>> c)   7.59% - syscall exit is checked and an interrupt is queued
>>              using kvmppc_queue_exception
>> d1)  3.33% - some checks for all exits
>> d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
>> d3) 17.20% - can_deliver/clear&deliver exception in
>>              kvmppc_check_and_deliver_interrupts
>> d4)  4.47% - updating kvm_stat statistics
>> e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
>> f1) 29.18% - restoring guest TLB
>> f2)  4.69% - restoring guest state ([s]regs)
>>
>> These fractions are % of our ~12µs syscall exit.
>> => restoring the TLB on each reentry = 4µs constant overhead
>> => looking a bit into irq delivery and other constant things like
>>    kvm_stat updating
>>
> ...
>
>> Now I go for the TLB replacement in f1.
>>
> Hang on... does d3 make sense to you? It doesn't to me, and if there's a
> bug there it will be easier to fix than rewriting the TLB code.
:)
I did not give up on improving that part too :-)

> I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20%
> of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
>
I get the same results: 1% ~ 80 cycles.

> Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where
> that time is going. We're assuming the find_first_bit() loop usually
> executes once, for syscall. Does it actually execute more than that? I
> don't expect any of kvmppc_can_deliver_interrupt(),
> kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to
> take lots of time.
>
You can see below that I already had a more detailed breakdown in my
old mail:
[...]
d2)  8.84% -  8.56% -  9.28% -  8.31% finding first bit in
     kvmppc_check_and_deliver_interrupts
d3)  6.53% -  5.25% -  6.63% -  5.10% can_deliver in
     kvmppc_check_and_deliver_interrupts
d4) 13.66% - 15.37% - 14.12% - 14.92% clear&deliver exception in
     kvmppc_check_and_deliver_interrupts
[...]

> Could it be cache effects? exception_priority[] and priority_exception[]
> are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both
> fit into one... except they're not aligned.
>
I would be so happy if I had hardware performance counters for things
like cache misses :-)

> Also, it looks like we use the generic find_first_bit(). That may be
> more expensive than we'd like. However, since
> vcpu->arch.pending_exceptions is a single long (not an arbitrary sized
> bitfield), we should be able to use ffs() instead, which has an
> optimized PowerPC implementation. That might help a lot.
>
Good idea. I'll check this and some other small improvements I have in
mind.

> We might even be able to replace find_next_bit() too, by shifting a mask
> over each loop, but I don't think we'll have to, since I expect the
> common case to be that we can deliver the first pending exception.
> (Worth checking? :)
>
I'm not sure. It's surely worth checking how often that second
find_next_bit is actually called.
If that number is far too small, it's not worth it.

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization