Re: [RFC] cputime: Introduce option to force full dynticks accounting on NOHZ & NOHZ_IDLE CPUs

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Nicolas Saenz Julienne <nsaenz@amazon.com>
Cc: frederic@kernel.org, paulmck@kernel.org, jalliste@amazon.co.uk,
	 mhiramat@kernel.org, akpm@linux-foundation.org,
	pmladek@suse.com,  rdunlap@infradead.org, tsi@tuyoix.net,
	nphamcs@gmail.com,  gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	 pbonzini@redhat.com
Subject: Re: [RFC] cputime: Introduce option to force full dynticks accounting on NOHZ & NOHZ_IDLE CPUs
Date: Wed, 21 Feb 2024 08:24:28 -0800	[thread overview]
Message-ID: <ZdYjvBItrl20oHXC@google.com> (raw)
In-Reply-To: <CZA43Y64EK8R.1M8J5Q6L39LFB@amazon.com>

On Tue, Feb 20, 2024, Nicolas Saenz Julienne wrote:
> Hi Sean,
> 
> On Tue Feb 20, 2024 at 4:18 PM UTC, Sean Christopherson wrote:
> > On Mon, Feb 19, 2024, Nicolas Saenz Julienne wrote:
> > > Under certain extreme conditions, the tick-based cputime accounting may
> > > produce inaccurate data. For instance, guest CPU usage is sensitive to
> > > interrupts firing right before the tick's expiration.

Ah, this confused me.  The "right before" is a bit misleading.  It's more like
"shortly before", because if the interrupt that occurs due to the guest's tick
arrives _right_ before the host tick expires, then commit 160457140187 should
avoid horrific accounting.

> > > This forces the guest into kernel context, and has that time slice
> > > wrongly accounted as system time. This issue is exacerbated if the
> > > interrupt source is in sync with the tick,

It's worth calling out why this can happen, to make it clear that getting into
such syncopation can happen quite naturally.  E.g. something like:

      interrupt source is in sync with the tick, e.g. if the guest's tick
      is configured to run at the same frequency as the host tick, and the
      guest tick is every so slightly ahead of the host tick.

> > > significantly skewing usage metrics towards system time.
> >
> > ...
> >
> > > NOTE: This wasn't tested in depth, and it's mostly intended to highlight
> > > the issue we're trying to solve. Also ccing KVM folks, since it's
> > > relevant to guest CPU usage accounting.
> >
> > How bad is the synchronization issue on upstream kernels?  We tried to address
> > that in commit 160457140187 ("KVM: x86: Defer vtime accounting 'til after IRQ handling").
> >
> > I don't expect it to be foolproof, but it'd be good to know if there's a blatant
> > flaw and/or easily closed hole.
> 
> The issue is not really about the interrupts themselves, but their side
> effects.
> 
> For instance, let's say the guest sets up an Hyper-V stimer that
> consistently fires 1 us before the preemption tick. The preemption tick
> will expire while the vCPU thread is running with !PF_VCPU (maybe inside
> kvm_hv_process_stimers() for ex.). As long as they both keep in sync,
> you'll get a 100% system usage. I was able to reproduce this one through
> kvm-unit-tests, but the race window is too small to keep the interrupts
> in sync for long periods of time, yet still capable of producing random
> system usage bursts (which unacceptable for some use-cases).
> 
> Other use-cases have bigger race windows and managed to maintain high
> system CPU usage over long periods of time. For example, with user-space
> HPET emulation, or KVM+Xen (don't know the fine details on these, but
> VIRT_CPU_ACCOUNTING_GEN fixes the mis-accounting). It all comes down to
> the same situation. Something triggers an exit, and the vCPU thread goes
> past 'vtime_account_guest_exit()' just in time for the tick interrupt to
> show up.

I suspect the common "problem" with those flows is that emulating the guest timer
interrupt is (a) slow, relatively speaking and (b) done with interrupts enabled.

E.g. on VMX, the TSC deadline timer is emulated via VMX preemption timer, and both
the programming of the guest's TSC deadline timer and the handling of the expiration
interrupt is done in the VM-Exit fastpath with IRQs disabled.  As a result, even
if the host tick interrupt is a hair behind the guest tick, it doesn't affect
accounting because the host tick interrupt will never be delivered while KVM is
emulating the guest's periodic tick.

I'm guessing that if you tested on SVM (or a guest that doesn't use the APIC timer
in deadline mode), which doesn't utilize the fastpath since KVM needs to bounce
through hrtimers, then you'd see similar accounting problems even without using
any of the problematic "slow" timer sources.

next prev parent reply	other threads:[~2024-02-21 16:24 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19 17:57 [RFC] cputime: Introduce option to force full dynticks accounting on NOHZ & NOHZ_IDLE CPUs Nicolas Saenz Julienne
2024-02-20 16:18 ` Sean Christopherson
2024-02-20 18:19   ` Nicolas Saenz Julienne
2024-02-21 16:24     ` Sean Christopherson [this message]
2024-02-21 17:11       ` Nicolas Saenz Julienne
2024-03-11 17:15 ` Nicolas Saenz Julienne
2024-03-12 23:13   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdYjvBItrl20oHXC@google.com \
    --to=seanjc@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=frederic@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jalliste@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=nsaenz@amazon.com \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=pmladek@suse.com \
    --cc=rdunlap@infradead.org \
    --cc=tsi@tuyoix.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox