Re: vtime accounting - Radim Krčmář

From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Christoffer Dall <cdall@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Marc Zyngier <marc.zyngier@arm.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: vtime accounting
Date: Tue, 14 Mar 2017 21:27:22 +0100
Message-ID: <20170314202721.GD5432@potion>
In-Reply-To: <20170314183913.GF1277@cbox>

2017-03-14 19:39+0100, Christoffer Dall:
> On Tue, Mar 14, 2017 at 05:58:59PM +0100, Radim Krčmář wrote:
>> 2017-03-14 09:26+0100, Christoffer Dall:
>> > On Mon, Mar 13, 2017 at 06:28:16PM +0100, Radim Krčmář wrote:
>> >> 2017-03-08 02:57-0800, Christoffer Dall:
>> >> > Hi Paolo,
>> >> > 
>> >> > I'm looking at improving KVM/ARM a bit by calling guest_exit_irqoff
>> >> > before enabling interrupts when coming back from the guest.
>> >> > 
>> >> > Unfortunately, this appears to mess up my view of CPU usage using
>> >> > something like htop on the host, because it appears all time is spent
>> >> > inside the kernel.
>> >> > 
>> >> > From my analysis, I think this is because we never handle any interrupts
>> >> > before enabling interrupts, where the x86 code does its
>> >> > handle_external_intr, and the result on ARM is that we never increment
>> >> > jiffies before doing the vtime accounting.
>> >> 
>> >> (Hm, the counting might be broken on nohz_full then.)
>> >> 
>> > 
>> > Don't you still have a scheduler tick even with nohz_full and something
>> > that will eventually update jiffies then?
>> 
>> Probably, I don't understand jiffies accounting too well and didn't see
>> anything that would bump the jiffies in or before guest_exit_irqoff().
> 
> As far as I understand, from my very very short time of looking at the
> timer code, jiffies are updated on every tick, which can be caused by a
> number of events, including *any* interrupt handler (coming from idle
> state), soft timers, timer interrupts, and possibly other things.

Yes, I was thinking that entering/exiting user mode should trigger it as
well, in order to correctly account for time spent there, but couldn't
find it ...

The case I was wondering about is if the kernel spent e.g. 10 jiffies in
guest mode and then exited on mmio -- no interrupt in the host, and
guest_exit_irqoff() would flip accounting over to system time.
Can those 10 jiffies get accounted to system (not guest) time?

Accounting 1 jiffy wrong is normal as we expect that the distribution of
guest/kernel time in the jiffy is going to be approximated over a longer
sampling period, but if we could account multiple jiffies wrong, this
expectation is very hard to defend.
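
To make the concern concrete, here is a toy model of the scenario, not
kernel code.  It assumes that jiffies only advance when pending ticks
are handled, and that each accounting point charges the jiffies delta
since the last snapshot to whatever mode is current.  All names (Cpu,
pending, handle_ticks, simulate) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    # Toy state; none of these fields mirror real kernel structures.
    jiffies: int = 0        # tick counter, advanced only by handle_ticks()
    pending: int = 0        # ticks that elapsed but were not handled yet
    snapshot: int = 0       # jiffies value at the last accounting point
    in_guest: bool = False  # stands in for the guest-mode flag
    charged: dict = field(default_factory=lambda: {"guest": 0, "system": 0})

    def run(self, ticks):
        # Time passes, but jiffies stay stale until the ticks are handled.
        self.pending += ticks

    def handle_ticks(self):
        # Assumption: this is the only place where jiffies move forward.
        self.jiffies += self.pending
        self.pending = 0

    def account(self):
        # Charge the jiffies delta since the last snapshot to the current mode.
        mode = "guest" if self.in_guest else "system"
        self.charged[mode] += self.jiffies - self.snapshot
        self.snapshot = self.jiffies

    def guest_enter(self):
        self.account()
        self.in_guest = True

    def guest_exit(self):
        self.account()
        self.in_guest = False


def simulate(ticks_in_guest, handle_before_exit):
    cpu = Cpu()
    cpu.guest_enter()
    cpu.run(ticks_in_guest)
    if handle_before_exit:
        cpu.handle_ticks()  # jiffies are current when the mode flips
    cpu.guest_exit()
    cpu.handle_ticks()      # any remaining ticks are handled after the flip
    cpu.account()           # next accounting point, now in system mode
    return cpu.charged
```

In this model, simulate(10, True) charges all 10 jiffies to guest time,
while simulate(10, False) charges all 10 to system time, because the
delta seen at guest_exit() is zero and the jiffies only catch up after
the mode has flipped.  Whether the real code can end up in the second
case is exactly the open question above.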

>> >> > So my current idea is to increment jiffies according to the clocksource
>> >> > before calling guest_exit_irqoff, but this would require some main
>> >> > clocksource infrastructure changes.
>> >> 
>> >> This seems similar to calling the function from the timer interrupt.
>> >> The timer interrupt would be delivered after that and only wasted time,
>> >> so it might actually be slower than just delivering it before ...
>> > 
>> > That's assuming that the timer interrupt hits at every exit.  I don't
>> > think that's the case, but I should measure it.
>> 
>> There cannot be fewer vm exits and I think there are far more vm exits,
>> but if there was no interrupt, then jiffies shouldn't increase and we would
>> get the same result as with plain guest_exit_irqoff().
>> 
> 
> That's true if you're guaranteed to take the timer interrupts that
> happen while running the guest before hitting guest_exit_irqoff(), so
> that you eventually count *some* time for the guest.  In the arm64 case,
> if we just do guest_exit_irqoff(), we *never* account any time to the
> guest.

Yes.  I assumed that if jiffies should be incremented, then you also get
a timer tick, so checking whether jiffies should be incremented inside
guest_exit_irqoff() was only slowing down the common case, where jiffies
remained the same.

(Checking if jiffies should be incremented is quite slow by itself.)

>> > I assume there's a good reason why we call guest_enter() and
>> > guest_exit() in the hot path on every KVM architecture?
>> 
>> I consider myself biased when it comes to jiffies, so no judgement. :)
>> 
>> From what I see, the mode switch is used only for statistics.
>> The original series is
>> 
>> 5e84cfde51cf303d368fcb48f22059f37b3872de~1..d172fcd3ae1ca7ac27ec8904242fd61e0e11d332
>> 
>> It didn't introduce the overhead with interrupt window and it didn't
>> count host kernel irq time as user time, so it was better at that time.
> 
> Yes, but it was based on cputime_to... functions, which I understand use
> ktime, which on systems running KVM will most often read the clocksource
> directly from the hardware, and that was then optimized later to just
> use jiffies to avoid the clocksource read because jiffies is already in
> memory and adjusted to the granularity we need, so in some sense an
> improvement, only it doesn't work if you don't update jiffies when
> needed.

True.  The kvm_guest_enter/exit still didn't trigger accounting, but the
granularity was better if there were other sources of accounting than
just timer ticks.

And I noticed another funny feature: the original intention was that if
the guest uses hardware acceleration, then the whole timeslice gets
accounted to guest/user time, because kvm_guest_exit() was not supposed
to clear PF_VCPU: https://lkml.org/lkml/2007/10/15/105
This somehow got mangled when merging and later there was a fix that
introduced the current behavior, which might be slightly more accurate:
0552f73b9a81 ("KVM: Move kvm_guest_exit() after local_irq_enable()")
