kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	linux-arm-kernel@lists.infradead.org,  kvmarm@lists.linux.dev,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Vishal Annapurve <vannapurve@google.com>,
	Xiaoyao Li <xiaoyao.li@intel.com>,
	 Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Nikolay Borisov <nik.borisov@suse.com>,
	 Yan Y Zhao <yan.y.zhao@intel.com>,
	Kai Huang <kai.huang@intel.com>,
	 "binbin.wu@linux.intel.com" <binbin.wu@linux.intel.com>
Subject: Re: [PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
Date: Fri, 1 Aug 2025 09:44:18 -0700	[thread overview]
Message-ID: <aIzu4q_7yBmCIOWK@google.com> (raw)
In-Reply-To: <b27f807e-b04f-487d-be13-74a8b0a61b42@intel.com>

+Chao

On Fri, Aug 01, 2025, Adrian Hunter wrote:
> On 29/07/2025 22:33, Sean Christopherson wrote:
> > +static int tdx_terminate_vm(struct kvm *kvm)
> > +{
> > +	if (kvm_trylock_all_vcpus(kvm))
> > +		return -EBUSY;
> > +
> > +	kvm_vm_dead(kvm);
> > +	to_kvm_tdx(kvm)->vm_terminated = true;
> > +
> > +	kvm_unlock_all_vcpus(kvm);
> > +
> > +	tdx_mmu_release_hkid(kvm);
> > +
> > +	return 0;
> > +}
> 
> As I think I mentioned when removing vm_dead first came up,
> I think we need more checks.  I spent some time going through
> the code and came up with what is below:
> 
> First, we need to avoid TDX VCPU sub-IOCTLs from racing with
> tdx_mmu_release_hkid().  But having any TDX sub-IOCTL run after
> KVM_TDX_TERMINATE_VM raises questions of what might happen, so
> it is much simpler to understand, if that is not possible.
> There are 3 options:
> 
> 1. Require that KVM_TDX_TERMINATE_VM is valid only if
> kvm_tdx->state == TD_STATE_RUNNABLE.  Since currently all
> the TDX sub-IOCTLs are for initialization, that would block
> the opportunity for any to run after KVM_TDX_TERMINATE_VM.
> 
> 2. Check vm_terminated in tdx_vm_ioctl() and tdx_vcpu_ioctl()
> 
> 3. Test KVM_REQ_VM_DEAD in tdx_vm_ioctl() and tdx_vcpu_ioctl()
> 
> [ Note cannot check is_hkid_assigned() because that is racy ]
> 
> Secondly, I suggest we avoid SEAMCALLs that will fail and
> result in KVM_BUG_ON() if HKID has been released.
> 
> There are 2 groups of those: MMU-related and TDVPS_ACCESSORS.
> 
> For the MMU-related, the following 2 functions should return
> an error immediately if vm_terminated:
> 
> 	tdx_sept_link_private_spt()
> 	tdx_sept_set_private_spte()
> 
> For that not be racy, extra synchronization is needed so that
> vm_terminated can be reliably checked when holding mmu lock
> i.e.
> 
> static int tdx_terminate_vm(struct kvm *kvm)
> {
> 	if (kvm_trylock_all_vcpus(kvm))
> 		return -EBUSY;
> 
> 	kvm_vm_dead(kvm);
> +
> +       write_lock(&kvm->mmu_lock);
> 	to_kvm_tdx(kvm)->vm_terminated = true;
> +       write_unlock(&kvm->mmu_lock);
> 
> 	kvm_unlock_all_vcpus(kvm);
> 
> 	tdx_mmu_release_hkid(kvm);
> 
> 	return 0;
> }
> 
> Finally, there are 2 TDVPS_ACCESSORS that need avoiding:
> 
> 	tdx_load_mmu_pgd()
> 		skip td_vmcs_write64() if vm_terminated
> 
> 	tdx_protected_apic_has_interrupt()
> 		skip td_state_non_arch_read64() if vm_terminated

Oof.  And as Chao pointed out[*], removing the vm_dead check would allow creating
and running vCPUs in a dead VM, which is most definitely not desirable.  Squashing
the vCPU creation case is easy enough if we keep vm_dead but still generally allow
ioctls, and it's probably worth doing that no matter what (to plug the hole where
pending vCPU creations could succeed):

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d477a7fda0ae..941d2c32b7dc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4207,6 +4207,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 
        mutex_lock(&kvm->lock);
 
+       if (kvm->vm_dead) {
+               r = -EIO;
+               goto unlock_vcpu_destroy;
+       }
+
        if (kvm_get_vcpu_by_id(kvm, id)) {
                r = -EEXIST;
                goto unlock_vcpu_destroy;

And then to ensure vCPUs can't do anything, check KVM_REQ_VM_DEAD after acquiring
vcpu->mutex.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c07dd423458..883077eee4ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4433,6 +4433,12 @@ static long kvm_vcpu_ioctl(struct file *filp,
 
        if (mutex_lock_killable(&vcpu->mutex))
                return -EINTR;
+
+       if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
+               r = -EIO;
+               goto out;
+       }
+
        switch (ioctl) {
        case KVM_RUN: {
                struct pid *oldpid;


That should address all TDVPS paths (I hope), and I _think_ would address all
MMU-related paths as well?  E.g. prefault requires a vCPU.

Disallowing (most) vCPU ioctls but not all VM ioctls on vm_dead isn't great ABI
(understatement), but I think we need/want the above changes even if we keep the
general vm_dead restriction.  And given the extremely ad hoc behavior of taking
kvm->lock for VM ioctls, trying to enforce vm_dead for "all" VM ioctls seems like
a fool's errand.

So I'm leaning toward keeping "KVM: Reject ioctls only if the VM is bugged, not
simply marked dead" (with a different shortlog+changelog), but keeping vm_dead
(and not introducing kvm_tdx.vm_terminated).

Thoughts?

[*] https://lore.kernel.org/all/aIlzeT+yFG2Tvb3%2F@intel.com

  reply	other threads:[~2025-08-01 16:44 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29 19:33 [PATCH 0/5] KVM: Drop vm_dead, pivot on vm_bugged for -EIO Sean Christopherson
2025-07-29 19:33 ` [PATCH 1/5] KVM: Never clear KVM_REQ_VM_DEAD from a vCPU's requests Sean Christopherson
2025-07-29 19:33 ` [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation Sean Christopherson
2025-07-29 22:27   ` Edgecombe, Rick P
2025-07-29 22:54     ` Sean Christopherson
2025-07-29 22:58       ` Edgecombe, Rick P
2025-07-29 23:08         ` Sean Christopherson
2025-07-29 23:13           ` Edgecombe, Rick P
2025-07-30  5:45         ` Yan Zhao
2025-07-30  5:55     ` Yan Zhao
2025-07-30 12:59       ` Edgecombe, Rick P
2025-07-30  2:07   ` Xiaoyao Li
2025-07-30  6:04     ` Yan Zhao
2025-07-29 19:33 ` [PATCH 3/5] KVM: Reject ioctls only if the VM is bugged, not simply marked dead Sean Christopherson
2025-07-30  1:20   ` Chao Gao
2025-07-29 19:33 ` [PATCH 4/5] KVM: selftests: Use for-loop to handle all successful SEV migrations Sean Christopherson
2025-07-29 19:33 ` [PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Sean Christopherson
2025-08-01 13:56   ` Adrian Hunter
2025-08-01 16:44     ` Sean Christopherson [this message]
2025-08-03 17:41       ` Adrian Hunter
2025-08-06  6:06       ` Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIzu4q_7yBmCIOWK@google.com \
    --to=seanjc@google.com \
    --cc=adrian.hunter@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=vannapurve@google.com \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).