kvm.vger.kernel.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Sagi Shahar <sagis@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Binbin Wu <binbin.wu@linux.intel.com>,
	 Ira Weiny <ira.weiny@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,  kvm@vger.kernel.org,
	x86@kernel.org
Subject: Re: [PATCH] KVM: TDX: Force split irqchip for TDX at irqchip creation time
Date: Tue, 26 Aug 2025 14:48:08 -0700
Message-ID: <aK4rmD7QpotYXume@google.com>
In-Reply-To: <20250826213455.2338722-1-sagis@google.com>

On Tue, Aug 26, 2025, Sagi Shahar wrote:
> The TDX module protects the EOI-bitmap, which prevents the use of an
> in-kernel I/O APIC. See more details in the original patch [1].
> 
> The current implementation already enforces the use of split irqchip for
> TDX, but it does so at vCPU creation time, which is generally too late
> to fall back to split irqchip.
> 
> This patch follows Sean's recommendation from [2] and moves the check of
> whether an I/O APIC is supported for the VM to irqchip creation time.
> 
> [1] https://lore.kernel.org/lkml/20250222014757.897978-11-binbin.wu@linux.intel.com/
> [2] https://lore.kernel.org/lkml/aK3vZ5HuKKeFuuM4@google.com/
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sagi Shahar <sagis@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  3 +++
>  arch/x86/kvm/vmx/tdx.c          | 15 ++++++++-------
>  arch/x86/kvm/x86.c              | 10 ++++++++++
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f19a76d3ca0e..cb22fc48cdec 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1357,6 +1357,7 @@ struct kvm_arch {
>  	u8 vm_type;
>  	bool has_private_mem;
>  	bool has_protected_state;
> +	bool has_protected_eoi;
>  	bool pre_fault_allowed;
>  	struct hlist_head *mmu_page_hash;
>  	struct list_head active_mmu_pages;
> @@ -2284,6 +2285,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>  
>  #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
>  
> +#define kvm_arch_has_protected_eoi(kvm) (!(kvm)->arch.has_protected_eoi)
> +
>  static inline u16 kvm_read_ldt(void)
>  {
>  	u16 ldt;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 66744f5768c8..8c270a159692 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -658,6 +658,12 @@ int tdx_vm_init(struct kvm *kvm)
>  	 */
>  	kvm->max_vcpus = min_t(int, kvm->max_vcpus, num_present_cpus());
>  
> +	/*
> +	 * TDX Module doesn't allow the hypervisor to modify the EOI-bitmap,
> +	 * i.e. all EOIs are accelerated and never trigger exits.
> +	 */
> +	kvm->arch.has_protected_eoi = true;
> +
>  	kvm_tdx->state = TD_STATE_UNINITIALIZED;
>  
>  	return 0;
> @@ -671,13 +677,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
>  	if (kvm_tdx->state != TD_STATE_INITIALIZED)
>  		return -EIO;
>  
> -	/*
> -	 * TDX module mandates APICv, which requires an in-kernel local APIC.
> -	 * Disallow an in-kernel I/O APIC, because level-triggered interrupts
> -	 * and thus the I/O APIC as a whole can't be faithfully emulated in KVM.
> -	 */
> -	if (!irqchip_split(vcpu->kvm))
> -		return -EINVAL;
> +	/* Split irqchip should be enforced at irqchip creation time. */
> +	KVM_BUG_ON(irqchip_split(vcpu->kvm), vcpu->kvm);

Sadly, the existing check needs to stay, because userspace could simply not create
any irqchip.  My complaint about KVM_CREATE_IRQCHIP is that KVM is allowing an
explicit action that is unsupported/invalid.  For the lack of an in-kernel local
APIC, there's no better alternative to enforcing the check at vCPU creation.
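
To illustrate why the vCPU-creation check still matters, here is a minimal
userspace sketch, not kernel code.  The enum, struct vm_model, and the
model_*() helpers are hypothetical stand-ins for KVM's irqchip mode,
irqchip_split(), and tdx_vcpu_create().  It only models the point that a VM
with no irqchip at all never passes through KVM_CREATE_IRQCHIP, so it can
only be rejected at vCPU creation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for KVM's irqchip modes; not the kernel's types. */
enum irqchip_mode { IRQCHIP_NONE, IRQCHIP_KERNEL, IRQCHIP_SPLIT };

struct vm_model {
	enum irqchip_mode irqchip;
};

/* Models irqchip_split(): true only for a split irqchip. */
static bool model_irqchip_split(const struct vm_model *vm)
{
	return vm->irqchip == IRQCHIP_SPLIT;
}

/*
 * Models the check retained in tdx_vcpu_create(): anything other than a
 * split irqchip, including "no irqchip at all", is rejected.
 */
static int model_tdx_vcpu_create(const struct vm_model *vm)
{
	if (!model_irqchip_split(vm))
		return -EINVAL;
	return 0;
}
```

In this model, a VM left at IRQCHIP_NONE fails with -EINVAL even though no
irqchip ioctl was ever issued, which is the case a check at irqchip creation
alone cannot cover.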

>  	fpstate_set_confidential(&vcpu->arch.guest_fpu);
>  	vcpu->arch.apic->guest_apic_protected = true;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a1c49bc681c4..a846dd3dcb23 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6966,6 +6966,16 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
>  		if (irqchip_in_kernel(kvm))
>  			goto create_irqchip_unlock;
>  
> +		/*
> +		 * Disallow an in-kernel I/O APIC for platforms that has protected
> +		 * EOI (such as TDX). The hypervisor can't modify the EOI-bitmap
> +		 * on these platforms which prevents the proper emulation of
> +		 * level-triggered interrupts.
> +		 */

Slight tweak to shorten this and to avoid mentioning the EOI-bitmap.  The use of
a software-controlled EOI-bitmap is a vendor-specific detail, and it's not so much
the inability to modify the bitmap that's problematic; it's that TDX doesn't
allow intercepting EOIs.  E.g. TDX also requires x2APIC and APICv to be enabled,
without which EOIs would effectively be intercepted by other means.

		/*
		 * Disallow an in-kernel I/O APIC if the VM has protected EOIs,
		 * i.e. if KVM can't intercept EOIs and thus can't properly
		 * emulate level-triggered interrupts.
		 */
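
As a rough illustration of why EOI interception matters for level-triggered
interrupts, here is a toy model of a single I/O APIC pin.  It is not KVM's
implementation, and struct toy_pin and the toy_*() helpers are made-up names.
The only assumption is the standard I/O APIC behavior: delivering a
level-triggered interrupt sets Remote IRR, and only an observed EOI clears it
and allows a still-asserted line to be delivered again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-triggered I/O APIC pin; hypothetical names. */
struct toy_pin {
	bool line_asserted;      /* current level of the interrupt line */
	bool remote_irr;         /* set on delivery, cleared only by an EOI */
	unsigned int delivered;  /* interrupts sent to the local APIC */
};

/* Deliver if the line is high and no earlier interrupt awaits its EOI. */
static void toy_service(struct toy_pin *pin)
{
	if (pin->line_asserted && !pin->remote_irr) {
		pin->remote_irr = true;
		pin->delivered++;
	}
}

/* The device raises or lowers the line. */
static void toy_set_line(struct toy_pin *pin, bool level)
{
	pin->line_asserted = level;
	toy_service(pin);
}

/* What the emulated I/O APIC does when it gets to observe the guest's EOI. */
static void toy_eoi(struct toy_pin *pin)
{
	pin->remote_irr = false;
	toy_service(pin);  /* re-deliver if the device still asserts the line */
}
```

If toy_eoi() is never called, i.e. the EOI is never intercepted, Remote IRR
stays set and later assertions of the line are never delivered.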

> +		r = -ENOTTY;
> +		if (kvm_arch_has_protected_eoi(kvm))

No need for a macro wrapper, just do

		if (kvm->arch.has_protected_eoi)

kvm_arch_has_readonly_mem() and similar accessors exist so that arch-neutral
code, e.g. check_memory_region_flags() in kvm_main.c, can query arch-specific
state.  Nothing outside of KVM x86 should care about protected EOI, because that's
very much an x86-specific detail.
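
Putting the pieces together, a userspace sketch of the resulting check order
might look like the following.  struct vm_sketch and sketch_create_irqchip()
are hypothetical; the -EEXIST return for an already-created irqchip is an
assumption about the surrounding x86.c code that isn't shown in the quoted
hunk.  The field is tested directly, with no accessor macro.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
enum sketch_irqchip { SKETCH_IRQCHIP_NONE, SKETCH_IRQCHIP_KERNEL };

struct vm_sketch {
	enum sketch_irqchip irqchip;
	bool has_protected_eoi;      /* set at VM init for TDX-like VMs */
	unsigned int created_vcpus;
};

/* Models the order of checks in the KVM_CREATE_IRQCHIP path. */
static int sketch_create_irqchip(struct vm_sketch *vm)
{
	/* An irqchip already exists (assumed to return -EEXIST). */
	if (vm->irqchip != SKETCH_IRQCHIP_NONE)
		return -EEXIST;

	/* KVM can't intercept EOIs, so no in-kernel I/O APIC. */
	if (vm->has_protected_eoi)
		return -ENOTTY;

	/* Too late once vCPUs have been created. */
	if (vm->created_vcpus)
		return -EINVAL;

	vm->irqchip = SKETCH_IRQCHIP_KERNEL;
	return 0;
}
```

In this sketch, a VM with has_protected_eoi set gets -ENOTTY and is left
without an in-kernel irqchip, so userspace can still go on to set up a split
irqchip.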

> +			goto create_irqchip_unlock;
> +
>  		r = -EINVAL;
>  		if (kvm->created_vcpus)
>  			goto create_irqchip_unlock;
> -- 
> 2.51.0.261.g7ce5a0a67e-goog
> 

Thread overview: 4+ messages
2025-08-26 21:34 [PATCH] KVM: TDX: Force split irqchip for TDX at irqchip creation time Sagi Shahar
2025-08-26 21:48 ` Sean Christopherson [this message]
2025-08-27  1:19   ` Sagi Shahar
2025-08-27  1:30 ` Binbin Wu
