From: Sean Christopherson <seanjc@google.com>
To: Jon Kohler <jon@nutanix.com>
Cc: "kvm @ vger . kernel . org" <kvm@vger.kernel.org>,
	 "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	 "kyung.min.park@intel.com" <kyung.min.park@intel.com>,
	Tony Luck <tony.luck@intel.com>
Subject: Re: KVM: x86: __wait_lapic_expire silently using TPAUSE C0.2
Date: Mon, 9 Sep 2024 12:11:52 -0700
Message-ID: <Zt9IeD_15ZsFElIa@google.com>
In-Reply-To: <DA40912C-CACC-4273-95B8-60AC67DFE317@nutanix.com>

On Fri, Sep 06, 2024, Jon Kohler wrote:
> delay_halt_fn uses __tpause() with TPAUSE_C02_STATE, which is the power
> optimized version of TPAUSE; according to documentation [3], it has
> slower wakeup latency and higher power savings, with the added benefit
> of being more SMT-yield friendly.
> 
> For latency-sensitive datacenter workloads, this is problematic because
> the call to kvm_wait_lapic_expire happens directly before reentry
> through vmx_vcpu_enter_exit, which is exactly the wrong place for slow
> wakeup latency.

...

> So, with all of that said, there are a few things that could be done,
> and I'm definitely open to ideas:
> 1. Update delay_halt_tpause to use TPAUSE_C01_STATE unilaterally, which
> anecdotally seems in line with the spirit of how AMD implemented
> MWAITX, which uses the same delay_halt loop and calls mwaitx with
> MWAITX_DISABLE_CSTATES.
> 2. Provide system-level configurability in delay.c to optionally use
> C01, e.g. as a config knob, maybe a compile-level setting? That way
> distros aiming at low-energy deployments could use that, but otherwise
> the default is low latency instead?
> 3. Provide some different delay API that KVM could call, indicating it
> wants low wakeup latency delays, if hardware supports it?
> 4. Pull this code into kvm code directly (boooooo?) and manage it
> directly instead of using delay.c (boooooo?)
> 5. Something else?

The option that would likely give the best of both worlds would be to prioritize
lower wakeup latency for "small" delays.  That could be done in __delay() and/or
in KVM.  E.g. delay_halt_tpause() quite clearly assumes a relatively long delay,
which is a flawed assumption in this case.

	/*
	 * Hard code the deeper (C0.2) sleep state because exit latency is
	 * small compared to the "microseconds" that usleep() will delay.
	 */
	__tpause(TPAUSE_C02_STATE, edx, eax);
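
A minimal sketch of the "prioritize lower wakeup latency for small delays" idea, written as a helper that delay_halt_tpause() could consult instead of hard-coding C0.2. pick_tpause_state() and TPAUSE_SHORT_DELAY_CYCLES are hypothetical names, and the threshold is illustrative, not measured. The two state constants mirror the kernel's encoding, where bit 0 of the TPAUSE control operand selects C0.1 (1) or C0.2 (0).

```c
#include <stdint.h>

/* Mirrors the kernel's encoding: bit 0 of the TPAUSE control operand
 * selects the lighter C0.1 state (1) or the deeper C0.2 state (0). */
#define TPAUSE_C01_STATE	1
#define TPAUSE_C02_STATE	0

/* Hypothetical cutoff, in TSC cycles; illustrative only, not tuned. */
#define TPAUSE_SHORT_DELAY_CYCLES	10000ULL

/*
 * Hypothetical helper: short delays favor wakeup latency (C0.1), while
 * long delays, where the exit latency is small relative to the wait,
 * keep the power-saving C0.2 behavior.
 */
static uint32_t pick_tpause_state(uint64_t cycles)
{
	return cycles < TPAUSE_SHORT_DELAY_CYCLES ? TPAUSE_C01_STATE
						  : TPAUSE_C02_STATE;
}
```

With this, the hard-coded call would become something like __tpause(pick_tpause_state(cycles), edx, eax), leaving existing long-delay callers on C0.2.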

The reason I say "and/or KVM" is that even without TPAUSE in the picture, it might
make sense for KVM to avoid __delay() for anything but long delays, for two
reasons: the overhead of e.g. delay_tsc() could be higher than the delay itself,
and the intent of KVM's delay is somewhat unique.

By definition, KVM _knows_ there is an IRQ that is being delivered to the vCPU, i.e.
entering the guest and running the vCPU asap is a priority.  The _only_ reason KVM
is waiting is to not violate the architecture.  Reducing power consumption and
even letting an SMT sibling run are arguably non-goals, i.e. it might be best for
KVM to avoid even regular ol' PAUSE in this specific scenario, unless the wait
time is so high that delaying VM-Enter more than the absolute bare minimum
becomes a worthwhile tradeoff.
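
A userspace analogue of that policy, as a sketch only: if the remaining wait is below a threshold, spin on the clock with no PAUSE at all; only for long waits give the CPU back. Assumptions: wait_until(), now_ns() and spin_threshold_ns are hypothetical names, CLOCK_MONOTONIC stands in for the TSC, and clock_nanosleep() stands in for a "polite" delay such as __delay(). None of this is existing KVM code.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdint.h>
#include <time.h>

enum wait_strategy { WAIT_TIGHT_SPIN, WAIT_SLEEP };

/* Current CLOCK_MONOTONIC time in nanoseconds (stand-in for the TSC). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: wait until deadline_ns.  Short waits spin tightly,
 * with no PAUSE/cpu_relax(), so the caller resumes as early as possible;
 * only waits of at least spin_threshold_ns sleep, trading latency for
 * power and SMT friendliness.
 */
static enum wait_strategy wait_until(uint64_t deadline_ns,
				     uint64_t spin_threshold_ns)
{
	uint64_t now = now_ns();

	if (now >= deadline_ns || deadline_ns - now < spin_threshold_ns) {
		while (now_ns() < deadline_ns)
			; /* deliberately no PAUSE */
		return WAIT_TIGHT_SPIN;
	}

	struct timespec ts = {
		.tv_sec = (time_t)(deadline_ns / 1000000000ULL),
		.tv_nsec = (long)(deadline_ns % 1000000000ULL),
	};

	/* Absolute-time sleep; clock_nanosleep() returns the error number. */
	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR)
		;
	return WAIT_SLEEP;
}
```

The design choice is that the threshold, not the caller, decides when delaying entry beyond the bare minimum becomes acceptable.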


Thread overview: 4+ messages
2024-09-06 17:57 KVM: x86: __wait_lapic_expire silently using TPAUSE C0.2 Jon Kohler
2024-09-09 19:11 ` Sean Christopherson [this message]
2024-09-10 18:47   ` Jon Kohler
2024-09-11 21:32     ` Sean Christopherson
