From: Sean Christopherson <seanjc@google.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
pbonzini@redhat.com, kvm@vger.kernel.org,
rick.p.edgecombe@intel.com, kirill.shutemov@linux.intel.com,
kai.huang@intel.com, reinette.chatre@intel.com,
xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
binbin.wu@linux.intel.com, isaku.yamahata@intel.com,
linux-kernel@vger.kernel.org, yan.y.zhao@intel.com,
chao.gao@intel.com
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
Date: Wed, 18 Jun 2025 17:33:48 -0700 [thread overview]
Message-ID: <aFNa7L74tjztduT-@google.com> (raw)
In-Reply-To: <0df27aaf-51be-4003-b8a7-8e623075709e@intel.com>
On Wed, Jun 18, 2025, Adrian Hunter wrote:
> On 18/06/2025 09:00, Vishal Annapurve wrote:
> > On Tue, Jun 17, 2025 at 10:50 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>> Ability to clean up memslots from userspace without closing
> >>> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> >>> for the next boot iteration of the VM in case of reboot.
> >>
> >> TD lifecycle does not include reboot. In other words, reboot is
> >> done by shutting down the TD and then starting again with a new TD.
> >>
> >> AFAIK it is not currently possible to shut down without closing
> >> guest_memfds since the guest_memfd holds a reference (users_count)
> >> to struct kvm, and destruction begins when users_count hits zero.
> >>
> >
> > gmem link support[1] allows associating existing guest_memfds with new
> > VM instances.
> >
> > Breakdown of the userspace VMM flow:
> > 1) Create a new VM instance before closing guest_memfd files.
> > 2) Link existing guest_memfd files with the new VM instance. -> This
> > creates new set of files backed by the same inode but associated with
> > the new VM instance.
>
> So what about:
>
> 2.5) Call KVM_TDX_TERMINATE_VM IOCTL
>
> Memory reclaimed after KVM_TDX_TERMINATE_VM will be done efficiently,
> so avoid causing it to be reclaimed earlier.
The problem is that setting kvm->vm_dead will prevent (3) from succeeding. If
kvm->vm_dead is set, KVM will reject all vCPU, VM, and device (not /dev/kvm the
device, but rather devices bound to the VM) ioctls.
I intended that behavior, e.g. to guard against userspace blowing up KVM because
the hkid was released, I just didn't consider the memslots angle.
The other thing I didn't consider at the time, is that vm_dead doesn't fully
protect against ioctls that are already in flight. E.g. see commit
17c80d19df6f ("KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation
is in-flight"). Though without a failure/splat of some kind, it's impossible to
to know if this is actually a problem. I.e. I don't think we should *entirely*
scrap blocking ioctls just because it *might* not be perfect (we can always make
KVM better).
I can think of a few options:
1. Skip the vm_dead check if userspace is deleting a memslot.
2. Provide a way for userspace to delete all memslots, and have that bypass
vm_dead.
3. Delete all memslots on KVM_TDX_TERMINATE_VM.
4. Remove vm_dead and instead reject ioctls based on vm_bugged, and simply rely
on KVM_REQ_VM_DEAD to prevent running the guest. I.e. tweak kvm_vm_dead()
to be that it only prevents running the VM, and have kvm_vm_bugged() be the
"something is badly broken, try to limit the damage".
I'm heavily leaning toward #4. #1 is doable, but painful. #2 is basically #1,
but with new uAPI. #3 is all kinds of gross, e.g. userspace might want to simply
kill the VM and move on. KVM would still block ioctls, but only if a bug was
detected. And the few use cases where KVM just wants to prevent entering the
guest won't prevent gracefully tearing down the VM.
Hah! And there's actually a TDX bug fix here, because "checking" for KVM_REQ_VM_DEAD
in kvm_tdp_map_page() and tdx_handle_ept_violation() will *clear* the request,
which isn't what we want, e.g. a vCPU could actually re-enter the guest at that
point.
This is what I'm thinking. If I don't hate it come Friday (or Monday), I'll turn
this patch into a mini-series and post v5.
Adrian, does that work for you?
---
arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/vgic/vgic-init.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 2 --
virt/kvm/kvm_main.c | 10 +++++-----
5 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index de2b4e9c9f9f..18bd80388b59 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1017,7 +1017,7 @@ static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
static int check_vcpu_requests(struct kvm_vcpu *vcpu)
{
if (kvm_request_pending(vcpu)) {
- if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
+ if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
return -EIO;
if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index eb1205654ac8..c2033bae73b2 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -612,7 +612,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
mutex_unlock(&kvm->arch.config_lock);
out_slots:
if (ret)
- kvm_vm_dead(kvm);
+ kvm_vm_bugged(kvm);
mutex_unlock(&kvm->slots_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b58a74c1722d..37f835d77b65 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10783,7 +10783,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
bool req_immediate_exit = false;
if (kvm_request_pending(vcpu)) {
- if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
+ if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
r = -EIO;
goto out;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3bde4fb5c6aa..56898e4ab524 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -853,7 +853,6 @@ struct kvm {
u32 dirty_ring_size;
bool dirty_ring_with_bitmap;
bool vm_bugged;
- bool vm_dead;
#ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
struct notifier_block pm_notifier;
@@ -893,7 +892,6 @@ struct kvm {
static inline void kvm_vm_dead(struct kvm *kvm)
{
- kvm->vm_dead = true;
kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index eec82775c5bf..4220579a9a74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4403,7 +4403,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
struct kvm_fpu *fpu = NULL;
struct kvm_sregs *kvm_sregs = NULL;
- if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
+ if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
return -EIO;
if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -4646,7 +4646,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
void __user *argp = compat_ptr(arg);
int r;
- if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
+ if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
return -EIO;
switch (ioctl) {
@@ -4712,7 +4712,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
{
struct kvm_device *dev = filp->private_data;
- if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
+ if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
return -EIO;
switch (ioctl) {
@@ -5131,7 +5131,7 @@ static long kvm_vm_ioctl(struct file *filp,
void __user *argp = (void __user *)arg;
int r;
- if (kvm->mm != current->mm || kvm->vm_dead)
+ if (kvm->mm != current->mm || kvm->vm_bugged)
return -EIO;
switch (ioctl) {
case KVM_CREATE_VCPU:
@@ -5395,7 +5395,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
struct kvm *kvm = filp->private_data;
int r;
- if (kvm->mm != current->mm || kvm->vm_dead)
+ if (kvm->mm != current->mm || kvm->vm_bugged)
return -EIO;
r = kvm_arch_vm_compat_ioctl(filp, ioctl, arg);
base-commit: 8046d29dde17002523f94d3e6e0ebe486ce52166
--
next prev parent reply other threads:[~2025-06-19 0:33 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-11 9:51 [PATCH V4 0/1] KVM: TDX: Decrease TDX VM shutdown time Adrian Hunter
2025-06-11 9:51 ` [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Adrian Hunter
2025-06-16 3:40 ` Vishal Annapurve
2025-06-18 5:50 ` Adrian Hunter
2025-06-18 6:00 ` Vishal Annapurve
2025-06-18 8:33 ` Adrian Hunter
2025-06-19 0:33 ` Sean Christopherson [this message]
2025-06-19 11:12 ` Adrian Hunter
2025-06-20 14:24 ` Sean Christopherson
2025-06-20 16:14 ` Vishal Annapurve
2025-06-20 16:26 ` Sean Christopherson
2025-06-23 20:36 ` Vishal Annapurve
2025-06-23 21:39 ` Sean Christopherson
2025-06-23 23:35 ` Vishal Annapurve
2025-06-20 18:59 ` Edgecombe, Rick P
2025-06-20 21:21 ` Vishal Annapurve
2025-06-20 23:34 ` Edgecombe, Rick P
2025-06-21 3:00 ` Vishal Annapurve
2025-06-23 16:23 ` Edgecombe, Rick P
2025-06-23 20:22 ` Vishal Annapurve
2025-06-23 22:51 ` Edgecombe, Rick P
2025-06-18 22:07 ` Edgecombe, Rick P
2025-06-23 20:40 ` Vishal Annapurve
2025-06-25 22:25 ` [PATCH V4 0/1] KVM: TDX: Decrease TDX VM shutdown time Sean Christopherson
2025-06-26 15:58 ` Sean Christopherson
2025-06-26 19:52 ` Adrian Hunter
2025-07-11 8:55 ` Xiaoyao Li
2025-07-11 13:05 ` Sean Christopherson
2025-07-11 13:40 ` Xiaoyao Li
2025-07-11 14:19 ` Sean Christopherson
2025-07-11 22:31 ` Edgecombe, Rick P
2025-07-11 22:54 ` Sean Christopherson
2025-07-11 23:04 ` Edgecombe, Rick P
2025-07-11 23:00 ` Edgecombe, Rick P
2025-07-11 23:05 ` Sean Christopherson
2025-07-11 23:17 ` Edgecombe, Rick P
2025-07-14 3:20 ` Xiaoyao Li
2025-07-14 13:56 ` Sean Christopherson
2025-07-14 15:06 ` Xiaoyao Li
2025-07-16 9:22 ` Xiaoyao Li
2025-07-18 15:35 ` Sean Christopherson
2025-07-17 9:14 ` Nikolay Borisov
2025-07-18 14:36 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aFNa7L74tjztduT-@google.com \
--to=seanjc@google.com \
--cc=adrian.hunter@intel.com \
--cc=binbin.wu@linux.intel.com \
--cc=chao.gao@intel.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=tony.lindgren@linux.intel.com \
--cc=vannapurve@google.com \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.