Re: [PATCH 3/7] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,  rick.p.edgecombe@intel.com,
	kai.huang@intel.com, adrian.hunter@intel.com,
	 reinette.chatre@intel.com, xiaoyao.li@intel.com,
	tony.lindgren@intel.com,  binbin.wu@linux.intel.com,
	dmatlack@google.com, isaku.yamahata@intel.com,
	 isaku.yamahata@gmail.com
Subject: Re: [PATCH 3/7] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY
Date: Fri, 24 Jan 2025 17:23:36 -0800	[thread overview]
Message-ID: <Z5Q9GNdCpSmuWSeZ@google.com> (raw)
In-Reply-To: <Z44DsmpFVZs3kxfE@yzhao56-desk.sh.intel.com>

On Mon, Jan 20, 2025, Yan Zhao wrote:
> On Fri, Jan 17, 2025 at 01:14:02PM -0800, Sean Christopherson wrote:
> > On Mon, Jan 13, 2025, Yan Zhao wrote:
> > I don't see any point in adding this comment, if the reader can't follow the
> > logic of this code, these comments aren't going to help them.  And the comment
> > about vcpu_run() in particular is misleading, as posted interrupts aren't truly
> > handled by vcpu_run(), rather they're handled by hardware (although KVM does send
> > a self-IPI).
> What about below version?
> 
> "
> Bail out the local retry
> - for pending signal, so that vcpu_run() --> xfer_to_guest_mode_handle_work()
>   --> kvm_handle_signal_exit() can exit to userspace for signal handling.

Eh, pending signals should be self-explanatory.

> - for pending interrupts, so that tdx_vcpu_enter_exit() --> tdh_vp_enter() will
>   be re-executed for interrupt injection through posted interrupt.
> - for pending nmi or KVM_REQ_NMI, so that vcpu_enter_guest() will be
>   re-executed to process and pend NMI to the TDX module. KVM always regards NMI
>   as allowed and the TDX module will inject it when NMI is allowed in the TD.
> "
> 
> > > +		 */
> > > +		if (signal_pending(current) || pi_has_pending_interrupt(vcpu) ||
> > > +		    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending)
> > 
> > This needs to check that the IRQ/NMI is actually allowed.  I guess it doesn't
> > matter for IRQs, but it does matter for NMIs.  Why not use kvm_vcpu_has_events()?
> Yes. However, vt_nmi_allowed() is always true for TDs.
> For interrupt, tdx_interrupt_allowed() is always true unless the exit reason is
> EXIT_REASON_HLT. For the EPT violation handler, the exit reason should not be
> EXIT_REASON_HLT.
> 
> > Ah, it's a local function.  At a glance, I don't see any harm in exposing that
> > to TDX.
> Besides that kvm_vcpu_has_events() is a local function, the consideration to
> check "pi_has_pending_interrupt() || kvm_test_request(KVM_REQ_NMI, vcpu) ||
> vcpu->arch.nmi_pending" instead that

*sigh*

  PEND_NMI TDVPS field is a 1-bit filed, i.e. KVM can only pend one NMI in
  the TDX module. Also, TDX doesn't allow KVM to request NMI-window exit
  directly. When there is already one NMI pending in the TDX module, i.e. it
  has not been delivered to TDX guest yet, if there is NMI pending in KVM,
  collapse the pending NMI in KVM into the one pending in the TDX module.
  Such collapse is OK considering on X86 bare metal, multiple NMIs could
  collapse into one NMI, e.g. when NMI is blocked by SMI.  It's OS's
  responsibility to poll all NMI sources in the NMI handler to avoid missing
  handling of some NMI events. More details can be found in the changelog of
  the patch "KVM: TDX: Implement methods to inject NMI".

That's probably fine?  But it's still unfortunate that TDX manages to be different
at almost every opportunity :-(

> (1) the two are effectively equivalent for TDs (as nested is not supported yet)

If they're all equivalent, then *not* open coding is desriable, IMO.  Ah, but
they aren't equivalent.  tdx_protected_apic_has_interrupt() also checks whatever
TD_VCPU_STATE_DETAILS_NON_ARCH is.

	vcpu_state_details =
		td_state_non_arch_read64(to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH);

	return tdx_vcpu_state_details_intr_pending(vcpu_state_details);

That code needs a comment, because depending on the behavior of that field, it
might not even be correct.

> (2) kvm_vcpu_has_events() may lead to unnecessary breaks due to exception
>     pending. However, vt_inject_exception() is NULL for TDs.

Wouldn't a pending exception be a KVM bug?

The bigger oddity, which I think is worth calling out, is that because KVM can't
determine if IRQs (or NMIs) are blocked at the time of the EPT violation, false
positives are inevitable.  I.e. KVM may re-enter the guest even if the IRQ/NMI
can't be delivered.  Call *that* out, and explain why it's fine.

> > > +			break;
> > > +
> > > +		cond_resched();
> > > +	}
> > 
> > Nit, IMO this reads better as:
> > 
> > 	do {
> > 		ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
> > 	} while (ret == RET_PF_RETY && local_retry &&
> > 		 !kvm_vcpu_has_events(vcpu) && !signal_pending(current));
> >
> Hmm, the previous way can save one "cond_resched()" for the common cases, i.e.,
> when ret != RET_PF_RETRY or when gpa is shared .

Hrm, right.  Maybe this?  Dunno if that's any better.

	ret = 0;
	do {
		if (ret)
			cond_resched();

		ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
	} while (...)

next prev parent reply	other threads:[~2025-01-25  1:23 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-13  2:09 [PATCH 0/7] KVM: TDX SEPT SEAMCALL retry Yan Zhao
2025-01-13  2:10 ` [PATCH 1/7] KVM: TDX: Return -EBUSY when tdh_mem_page_add() encounters TDX_OPERAND_BUSY Yan Zhao
2025-01-14 22:24   ` Edgecombe, Rick P
2025-01-15  4:59     ` Yan Zhao
2025-01-13  2:11 ` [PATCH 2/7] KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() Yan Zhao
2025-01-14 22:24   ` Edgecombe, Rick P
2025-01-15  4:58     ` Yan Zhao
2025-01-13  2:12 ` [PATCH 3/7] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY Yan Zhao
2025-01-17 21:14   ` Sean Christopherson
2025-01-20  8:05     ` Yan Zhao
2025-01-25  1:23       ` Sean Christopherson [this message]
2025-01-27  9:24         ` Yan Zhao
2025-01-27 17:04           ` Sean Christopherson
2025-02-05  7:34             ` Yan Zhao
2025-01-13  2:12 ` [PATCH 4/7] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Yan Zhao
2025-01-16  6:23   ` Binbin Wu
2025-01-16  6:28     ` Binbin Wu
2025-01-16  8:18     ` Yan Zhao
2025-01-13  2:13 ` [PATCH 5/7] fixup! KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Yan Zhao
2025-01-16  6:30   ` Binbin Wu
2025-01-13  2:13 ` [PATCH 6/7] " Yan Zhao
2025-01-13  2:13 ` [PATCH 7/7] fixup! KVM: TDX: Implement TDX vcpu enter/exit path Yan Zhao
2025-01-14 22:27 ` [PATCH 0/7] KVM: TDX SEPT SEAMCALL retry Edgecombe, Rick P
2025-01-15 16:43 ` Paolo Bonzini
2025-01-16  0:52   ` Yan Zhao
2025-01-16 11:07     ` Paolo Bonzini
2025-01-17  9:52       ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5Q9GNdCpSmuWSeZ@google.com \
    --to=seanjc@google.com \
    --cc=adrian.hunter@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=tony.lindgren@intel.com \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox