From: Sean Christopherson <seanjc@google.com>
To: Binbin Wu <binbin.wu@linux.intel.com>
Cc: Chao Gao <chao.gao@intel.com>,
Vishal Annapurve <vannapurve@google.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
rick.p.edgecombe@intel.com, kai.huang@intel.com,
adrian.hunter@intel.com, reinette.chatre@intel.com,
xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
isaku.yamahata@intel.com, yan.y.zhao@intel.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 12/16] KVM: TDX: Inhibit APICv for TDX guest
Date: Tue, 7 Jan 2025 13:15:56 -0800 [thread overview]
Message-ID: <Z32ZjGH72WPKBMam@google.com> (raw)
In-Reply-To: <ef29512a-2ca0-47c4-8b6e-6553bce1e273@linux.intel.com>
On Tue, Jan 07, 2025, Binbin Wu wrote:
> On 1/7/2025 11:24 AM, Chao Gao wrote:
> > > Note, kvm_use_posted_timer_interrupt() uses a heuristic of HLT/MWAIT interception
> > > being disabled to detect that it's worth trying to post a timer interrupt, but off
> > > the top of my head I'm pretty sure that's unnecessary and pointless. The
> > Commit 1714a4eb6fb0 gives an explanation:
> >
> > if only some guests isolated and others not, they would not
> > have any benefit from posted timer interrupts, and at the same time lose
> > VMX preemption timer fast paths because kvm_can_post_timer_interrupt()
> > returns true and therefore forces kvm_can_use_hv_timer() to false.
But that's only relevant for setup. Upon expiration, if the target vCPU is in
guest mode and APICv is active, then from_timer_fn must be true, which in turn
means that hrtimers are in use.
> Userspace uses KVM_CAP_X86_DISABLE_EXITS to enable mwait_in_guest or/and
> hlt_in_guest for non TDX guest. The code doesn't work for TDX guests.
> - Whether mwait in guest is allowed for TDX depends on the cpuid
> configuration in TD_PARAMS, not the capability of physical cpu checked by
> kvm_can_mwait_in_guest().
> - hlt for TDX is via TDVMCALL, i.e. hlt_in_guest should always be false
> for TDX guests.
>
> For TDX guests, not sure if it worths to fix kvm_{mwait,hlt}_in_guest()
> or add special handling (i.e., skip checking mwait/hlt in guest) because
> VMX preemption timer can't be used anyway, in order to allow housekeeping
> CPU to post timer interrupt for TDX vCPUs when nohz_full is configured
> after changing APICv active to true.
Yes, but I don't think we need any TDX-specific logic beyond noting that the
VMX preemption can't be used. As above, consulting kvm_can_post_timer_interrupt()
in the expiration path is silly.
And after staring at all of this for a few hours, I think we should ditch the
entire lapic_timer.pending approach, because at this point it's doing more harm
than good. Tracking pending timer IRQs was primarily done to support delivery
of *all* timer interrupts when multiple interrupts were coalesced in the host,
e.g. because a vCPU was scheduled out. But that got removed by commit
f1ed0450a5fa ("KVM: x86: Remove support for reporting coalesced APIC IRQs"),
partly because posted interrupts threw a wrench in things, but also because
delivering multiple periodic interrupts in quick succession is problematic in
its own right.
With the interrupt coalescing logic gone, I can't think of any reason KVM needs
to kick the vCPU out to the main vcpu_run() loop just to deliver the interrupt,
i.e. pending interrupts before delivering them is unnecessary. E.g. conditioning
direct deliverly on apicv_active in the !from_timer_fn case makes no sense, because
KVM is going to manually move the interrupt from the PIR to the IRR anyways.
I also don't like that the behavior for the posting path is subtly different than
!APICv. E.g. if an hrtimer fires while KVM is handling a write to TMICT, KVM will
deliver the interrupt if configured to post timer, but not if APICv is disabled,
because the latter will increment "pending", and "pending" will be cleared before
handling the new TMICT. Ditto for switch APIC timer modes.
Unfortunately, there is a lot to disentangle before KVM can directly deliver all
APIC timer interupts, as KVM has built up a fair bit of ugliness on top of "pending".
E.g. when switching back to the VMX preemption timer (after a blocking vCPU is
scheduled back in), KVM leaves the hrtimer active. start_hv_timer() accounts for
that by checking lapic_timer.pending, but leaving the hrtimer running in this case
is absurd and unnecessarily relies on "pending" being incremented.
if (kvm_x86_call(set_hv_timer)(vcpu, ktimer->tscdeadline, &expired))
return false;
ktimer->hv_timer_in_use = true;
hrtimer_cancel(&ktimer->timer);
/*
* To simplify handling the periodic timer, leave the hv timer running
* even if the deadline timer has expired, i.e. rely on the resulting
* VM-Exit to recompute the periodic timer's target expiration.
*/
if (!apic_lvtt_period(apic)) {
/*
* Cancel the hv timer if the sw timer fired while the hv timer
* was being programmed, or if the hv timer itself expired.
*/
if (atomic_read(&ktimer->pending)) {
cancel_hv_timer(apic);
} else if (expired) {
apic_timer_expired(apic, false);
cancel_hv_timer(apic);
}
}
I'm also not convinced that letting the hrtimer run on a different CPU in the
"normal" case is a good idea. KVM explicitly migrates the timer, *except* for
the posting case, and so not pinning the hrtimer is likely counter-productive,
not to mention confusing.
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
{
struct hrtimer *timer;
if (!lapic_in_kernel(vcpu) ||
kvm_can_post_timer_interrupt(vcpu))
return;
timer = &vcpu->arch.apic->lapic_timer.timer;
if (hrtimer_cancel(timer))
hrtimer_start_expires(timer, HRTIMER_MODE_ABS_HARD);
}
And if the hrtimer is pinned to the pCPU, then in practice it should be impossible
for KVM to post a timer IRQ into a vCPU that's in guest mode.
Seapking of which, posting a timer IRQ into a vCPU that's in guest mode is a bit
dodgy. I _think_ it's safe, because all of the flows that can be coincident are
actually mutually exclusive. E.g. apic_timer_expired()'s call to
__kvm_wait_lapic_expire() can't collide with the call from the entry path, as the
entry path's invocation happens with IRQs disabled.
But all of this makes me wary, so I'd much prefer to clean up, harden, and document
the existing code before we try to optimize timer IRQ delivery for more cases.
That's especially true for TDX as we're already adding significant compexlity
and multiple novel paths for TDX; we don't need more at this time :-)
next prev parent reply other threads:[~2025-01-07 21:15 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-09 1:07 [PATCH 00/16] KVM: TDX: TDX interrupts Binbin Wu
2024-12-09 1:07 ` [PATCH 01/16] KVM: TDX: Add support for find pending IRQ in a protected local APIC Binbin Wu
2025-01-09 15:38 ` Nikolay Borisov
2025-01-10 5:36 ` Binbin Wu
2024-12-09 1:07 ` [PATCH 02/16] KVM: VMX: Remove use of struct vcpu_vmx from posted_intr.c Binbin Wu
2024-12-09 1:07 ` [PATCH 03/16] KVM: TDX: Disable PI wakeup for IPIv Binbin Wu
2024-12-09 1:07 ` [PATCH 04/16] KVM: VMX: Move posted interrupt delivery code to common header Binbin Wu
2024-12-09 1:07 ` [PATCH 05/16] KVM: TDX: Implement non-NMI interrupt injection Binbin Wu
2024-12-09 1:07 ` [PATCH 06/16] KVM: x86: Assume timer IRQ was injected if APIC state is protected Binbin Wu
2024-12-09 1:07 ` [PATCH 07/16] KVM: TDX: Wait lapic expire when timer IRQ was injected Binbin Wu
2024-12-09 1:07 ` [PATCH 08/16] KVM: TDX: Implement methods to inject NMI Binbin Wu
2024-12-09 1:07 ` [PATCH 09/16] KVM: TDX: Complete interrupts after TD exit Binbin Wu
2024-12-09 1:07 ` [PATCH 10/16] KVM: TDX: Handle SMI request as !CONFIG_KVM_SMM Binbin Wu
2024-12-09 1:07 ` [PATCH 11/16] KVM: TDX: Always block INIT/SIPI Binbin Wu
2025-01-08 7:21 ` Xiaoyao Li
2025-01-08 7:53 ` Binbin Wu
2025-01-08 14:40 ` Sean Christopherson
2025-01-09 2:09 ` Xiaoyao Li
2025-01-09 2:26 ` Binbin Wu
2025-01-09 2:46 ` Huang, Kai
2025-01-09 3:20 ` Binbin Wu
2025-01-09 4:01 ` Huang, Kai
2025-01-09 2:51 ` Huang, Kai
2024-12-09 1:07 ` [PATCH 12/16] KVM: TDX: Inhibit APICv for TDX guest Binbin Wu
2025-01-03 21:59 ` Vishal Annapurve
2025-01-06 1:46 ` Binbin Wu
2025-01-06 22:49 ` Vishal Annapurve
2025-01-06 23:40 ` Sean Christopherson
2025-01-07 3:24 ` Chao Gao
2025-01-07 8:09 ` Binbin Wu
2025-01-07 21:15 ` Sean Christopherson [this message]
2025-01-13 2:03 ` Binbin Wu
2025-01-13 2:09 ` Binbin Wu
2025-01-13 17:16 ` Sean Christopherson
2025-01-14 8:20 ` Binbin Wu
2025-01-14 16:59 ` Sean Christopherson
2025-01-16 11:55 ` Huang, Kai
2025-01-16 14:50 ` Sean Christopherson
2025-01-16 20:16 ` Huang, Kai
2025-01-16 22:37 ` Sean Christopherson
2025-01-17 9:53 ` Huang, Kai
2025-01-17 10:46 ` Huang, Kai
2025-01-17 15:08 ` Sean Christopherson
2025-01-17 0:49 ` Binbin Wu
2024-12-09 1:07 ` [PATCH 13/16] KVM: TDX: Add methods to ignore virtual apic related operation Binbin Wu
2025-01-03 22:04 ` Vishal Annapurve
2025-01-06 2:18 ` Binbin Wu
2025-01-22 11:34 ` Paolo Bonzini
2025-01-22 13:59 ` Binbin Wu
2024-12-09 1:07 ` [PATCH 14/16] KVM: VMX: Move NMI/exception handler to common helper Binbin Wu
2024-12-09 1:07 ` [PATCH 15/16] KVM: TDX: Handle EXCEPTION_NMI and EXTERNAL_INTERRUPT Binbin Wu
2024-12-09 1:07 ` [PATCH 16/16] KVM: TDX: Handle EXIT_REASON_OTHER_SMI Binbin Wu
2024-12-10 18:24 ` [PATCH 00/16] KVM: TDX: TDX interrupts Paolo Bonzini
2025-01-06 10:51 ` Xiaoyao Li
2025-01-06 20:08 ` Sean Christopherson
2025-01-09 2:44 ` Binbin Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z32ZjGH72WPKBMam@google.com \
--to=seanjc@google.com \
--cc=adrian.hunter@intel.com \
--cc=binbin.wu@linux.intel.com \
--cc=chao.gao@intel.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=tony.lindgren@linux.intel.com \
--cc=vannapurve@google.com \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.