* [PATCH V3 0/1] KVM: TDX: Decrease TDX VM shutdown time
@ 2025-04-25 7:57 Adrian Hunter
2025-04-25 7:57 ` [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Adrian Hunter
0 siblings, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2025-04-25 7:57 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: mlevitsk, kvm, rick.p.edgecombe, kirill.shutemov, kai.huang,
reinette.chatre, xiaoyao.li, tony.lindgren, binbin.wu,
isaku.yamahata, linux-kernel, yan.y.zhao, chao.gao
Hi
Changes in V3:
Refer:
https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com
  - Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
    trigger on the error path from __tdx_td_init()
  - Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
  - Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
    tdx_vm_ioctl() deal with kvm->lock
The version 1 RFC:
https://lore.kernel.org/all/20250313181629.17764-1-adrian.hunter@intel.com/
listed 3 options and implemented option 2. Sean replied with code for
option 1, which tested out OK, so here it is plus a commit log.
It depends upon kvm_trylock_all_vcpus(kvm) which is the assumed result
of Maxim's work-in-progress here:
https://lore.kernel.org/all/20250409014136.2816971-1-mlevitsk@redhat.com/
Note it is assumed that kvm_trylock_all_vcpus(kvm) follows the return value
semantics of mutex_trylock() i.e. 1 means locks have been successfully
acquired, 0 means not successful.
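For illustration only, the assumed return-value convention can be sketched in userspace. This is a toy model built on C11 atomic_flag, not the kernel implementation; toy_trylock(), toy_trylock_all() and toy_unlock_all() are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock: returns 1 if acquired, 0 if already held. */
static int toy_trylock(atomic_flag *lock)
{
	/* atomic_flag_test_and_set() returns the previous value. */
	return !atomic_flag_test_and_set(lock);
}

static void toy_unlock_all(atomic_flag *locks, size_t n)
{
	while (n-- > 0)
		atomic_flag_clear(&locks[n]);
}

/*
 * Try to take every lock in the array, following the mutex_trylock()
 * convention assumed above: 1 means all locks were acquired, 0 means
 * not successful. On failure the locks taken so far are dropped again,
 * so the caller holds nothing.
 */
static int toy_trylock_all(atomic_flag *locks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!toy_trylock(&locks[i])) {
			toy_unlock_all(locks, i);
			return 0;
		}
	}
	return 1;
}
```

With this convention a caller bails out on `if (!toy_trylock_all(locks, n))`, which is the shape used in tdx_terminate_vm() in this version of the patch.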
Sean Christopherson (1):
KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
Documentation/virt/kvm/x86/intel-tdx.rst | 16 ++++++++
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/vmx/tdx.c | 63 ++++++++++++++++++++++----------
3 files changed, 61 insertions(+), 19 deletions(-)
Regards
Adrian
* [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
2025-04-25 7:57 [PATCH V3 0/1] KVM: TDX: Decrease TDX VM shutdown time Adrian Hunter
@ 2025-04-25 7:57 ` Adrian Hunter
2025-05-11 8:57 ` Adrian Hunter
2025-06-06 19:17 ` Sean Christopherson
0 siblings, 2 replies; 5+ messages in thread
From: Adrian Hunter @ 2025-04-25 7:57 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: mlevitsk, kvm, rick.p.edgecombe, kirill.shutemov, kai.huang,
reinette.chatre, xiaoyao.li, tony.lindgren, binbin.wu,
isaku.yamahata, linux-kernel, yan.y.zhao, chao.gao
From: Sean Christopherson <seanjc@google.com>
Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
which enables more efficient reclaim of private memory.
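As a rough sketch of the userspace side, the command block for the new sub-ioctl could be prepared as below. Everything prefixed toy_ or TOY_ is hypothetical: the struct is a local mirror of the assumed struct kvm_tdx_cmd layout, the numeric command value is only what the enum position in this patch implies, and the validity check reflects the documented "must be 0" contract rather than any kernel code. Actually issuing the command (via the TDX sub-ioctl path on the VM fd) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the assumed struct kvm_tdx_cmd layout (not the uapi header). */
struct toy_kvm_tdx_cmd {
	uint32_t id;
	uint32_t flags;
	uint64_t data;
	uint64_t hw_error;
};

/* Assumption: value implied by the enum position in this patch. */
#define TOY_KVM_TDX_TERMINATE_VM 6

/* Fill in a terminate command: everything except the id is zero. */
static void toy_init_terminate_cmd(struct toy_kvm_tdx_cmd *cmd)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->id = TOY_KVM_TDX_TERMINATE_VM;
}

/* Returns 1 if the command matches the documented contract, 0 otherwise. */
static int toy_terminate_cmd_is_valid(const struct toy_kvm_tdx_cmd *cmd)
{
	return cmd->id == TOY_KVM_TDX_TERMINATE_VM &&
	       cmd->flags == 0 && cmd->data == 0 && cmd->hw_error == 0;
}
```

Zeroing the whole structure first keeps any padding and the hw_error field clean before the command is handed to the kernel.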
Private memory is removed from MMU/TDP when guest_memfds are closed. If
the HKID has not been released, the TDX VM is still in RUNNABLE state,
so pages must be removed using the "Dynamic Page Removal" procedure (refer
to the TDX Module Base spec), which involves a number of steps:
  - Block further address translation
  - Exit each VCPU
  - Clear Secure EPT entry
  - Flush/write-back/invalidate relevant caches
However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
where all TDX VM pages are effectively unmapped, so pages can be reclaimed
directly.
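The difference between the two paths can be pictured with a small bookkeeping model. This is only a sketch of the description above, not TDX module or KVM code; all toy_ names are hypothetical, and it records which steps are taken rather than what they cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two TD states discussed above. */
enum toy_td_state { TOY_TD_RUNNABLE, TOY_TD_TEARDOWN };

/* Which operations were performed to remove one private page. */
struct toy_removal {
	bool blocked_translation;
	bool exited_vcpus;
	bool cleared_sept_entry;
	bool flushed_caches;
	bool reclaimed_directly;
};

static struct toy_removal toy_remove_private_page(enum toy_td_state state)
{
	struct toy_removal r = { 0 };

	if (state == TOY_TD_RUNNABLE) {
		/* HKID still assigned: the full removal procedure applies. */
		r.blocked_translation = true;
		r.exited_vcpus = true;
		r.cleared_sept_entry = true;
		r.flushed_caches = true;
	} else {
		/* HKID released: the page is effectively unmapped already. */
		r.reclaimed_directly = true;
	}
	return r;
}

/* Number of operations recorded for one page. */
static int toy_removal_steps(const struct toy_removal *r)
{
	return r->blocked_translation + r->exited_vcpus +
	       r->cleared_sept_entry + r->flushed_caches +
	       r->reclaimed_directly;
}
```

In this model a page removed from a runnable TD records four steps and a page reclaimed in teardown records one, which is the asymmetry the new sub-ioctl exploits.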
Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
reclaim time. For example:
  VCPUs    Size (GB)    Before (secs)    After (secs)
      4           18               72              24
     32          107              517             134
     64          400             5539             467
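For reference, the ratios implied by the table work out to roughly 3.0x, 3.9x and 11.9x. A small sketch that derives them from the figures above (the struct and function names are made up for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the table above. */
struct toy_reclaim_sample {
	int vcpus;
	int size_gb;
	double before_secs;
	double after_secs;
};

static const struct toy_reclaim_sample toy_samples[] = {
	{  4,  18,   72.0,  24.0 },
	{ 32, 107,  517.0, 134.0 },
	{ 64, 400, 5539.0, 467.0 },
};

/* Ratio of reclaim time before the change to reclaim time after it. */
static double toy_speedup(const struct toy_reclaim_sample *s)
{
	return s->before_secs / s->after_secs;
}

/* Print one line per table row with the derived ratio. */
static void toy_print_speedups(void)
{
	size_t i;

	for (i = 0; i < sizeof(toy_samples) / sizeof(toy_samples[0]); i++)
		printf("%2d VCPUs, %3d GB: %.1fx\n", toy_samples[i].vcpus,
		       toy_samples[i].size_gb, toy_speedup(&toy_samples[i]));
}
```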
Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@google.com
Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
Changes in V3:
  - Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
    trigger on the error path from __tdx_td_init()
  - Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
  - Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
    tdx_vm_ioctl() deal with kvm->lock
Documentation/virt/kvm/x86/intel-tdx.rst | 16 +++++++++
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/vmx/tdx.c | 41 +++++++++++++++---------
3 files changed, 43 insertions(+), 15 deletions(-)
diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
index de41d4c01e5c..e5d4d9cf4cf2 100644
--- a/Documentation/virt/kvm/x86/intel-tdx.rst
+++ b/Documentation/virt/kvm/x86/intel-tdx.rst
@@ -38,6 +38,7 @@ ioctl with TDX specific sub-ioctl() commands.
KVM_TDX_INIT_MEM_REGION,
KVM_TDX_FINALIZE_VM,
KVM_TDX_GET_CPUID,
+ KVM_TDX_TERMINATE_VM,
KVM_TDX_CMD_NR_MAX,
};
@@ -214,6 +215,21 @@ struct kvm_cpuid2.
__u32 padding[3];
};
+KVM_TDX_TERMINATE_VM
+-------------------
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Release Host Key ID (HKID) to allow more efficient reclaim of private memory.
+After this, the TD is no longer in a runnable state.
+
+Using KVM_TDX_TERMINATE_VM is optional.
+
+- id: KVM_TDX_TERMINATE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
KVM TDX creation flow
=====================
In addition to the standard KVM flow, new TDX ioctls need to be called. The
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 225a12e0d5d6..a2f973e1d75d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -939,6 +939,7 @@ enum kvm_tdx_cmd_id {
KVM_TDX_INIT_MEM_REGION,
KVM_TDX_FINALIZE_VM,
KVM_TDX_GET_CPUID,
+ KVM_TDX_TERMINATE_VM,
KVM_TDX_CMD_NR_MAX,
};
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b952bc673271..5161f6f891d7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -500,14 +500,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
*/
mutex_lock(&tdx_lock);
- /*
- * Releasing HKID is in vm_destroy().
- * After the above flushing vps, there should be no more vCPU
- * associations, as all vCPU fds have been released at this stage.
- */
err = tdh_mng_vpflushdone(&kvm_tdx->td);
- if (err == TDX_FLUSHVP_NOT_DONE)
- goto out;
if (KVM_BUG_ON(err, kvm)) {
pr_tdx_error(TDH_MNG_VPFLUSHDONE, err);
pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
@@ -515,6 +508,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
goto out;
}
+ write_lock(&kvm->mmu_lock);
for_each_online_cpu(i) {
if (packages_allocated &&
cpumask_test_and_set_cpu(topology_physical_package_id(i),
@@ -539,7 +533,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
} else {
tdx_hkid_free(kvm_tdx);
}
-
+ write_unlock(&kvm->mmu_lock);
out:
mutex_unlock(&tdx_lock);
cpus_read_unlock();
@@ -1789,13 +1783,13 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
struct page *page = pfn_to_page(pfn);
int ret;
- /*
- * HKID is released after all private pages have been removed, and set
- * before any might be populated. Warn if zapping is attempted when
- * there can't be anything populated in the private EPT.
- */
- if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
- return -EINVAL;
+ if (!is_hkid_assigned(to_kvm_tdx(kvm))) {
+ WARN_ON_ONCE(!kvm->vm_dead);
+ ret = tdx_reclaim_page(page);
+ if (!ret)
+ tdx_unpin(kvm, page);
+ return ret;
+ }
ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
if (ret <= 0)
@@ -2790,6 +2784,20 @@ static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
return 0;
}
+static int tdx_terminate_vm(struct kvm *kvm)
+{
+ if (!kvm_trylock_all_vcpus(kvm))
+ return -EBUSY;
+
+ kvm_vm_dead(kvm);
+
+ kvm_unlock_all_vcpus(kvm);
+
+ tdx_mmu_release_hkid(kvm);
+
+ return 0;
+}
+
int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_tdx_cmd tdx_cmd;
@@ -2817,6 +2825,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
case KVM_TDX_FINALIZE_VM:
r = tdx_td_finalize(kvm, &tdx_cmd);
break;
+ case KVM_TDX_TERMINATE_VM:
+ r = tdx_terminate_vm(kvm);
+ break;
default:
r = -EINVAL;
goto out;
--
2.43.0
* Re: [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
2025-04-25 7:57 ` [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Adrian Hunter
@ 2025-05-11 8:57 ` Adrian Hunter
2025-06-06 14:53 ` Adrian Hunter
2025-06-06 19:17 ` Sean Christopherson
1 sibling, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2025-05-11 8:57 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: mlevitsk, kvm, rick.p.edgecombe, kirill.shutemov, kai.huang,
reinette.chatre, xiaoyao.li, tony.lindgren, binbin.wu,
isaku.yamahata, linux-kernel, yan.y.zhao, chao.gao
On 25/04/2025 10:57, Adrian Hunter wrote:
> +static int tdx_terminate_vm(struct kvm *kvm)
> +{
> + if (!kvm_trylock_all_vcpus(kvm))
Introduction of kvm_trylock_all_vcpus() is still in progress:
https://lore.kernel.org/r/20250430203013.366479-3-mlevitsk@redhat.com/
but it has kvm_trylock_all_vcpus(kvm) return value the other way around, so
this will instead need to be:
if (kvm_trylock_all_vcpus(kvm))
> + return -EBUSY;
> +
> + kvm_vm_dead(kvm);
> +
> + kvm_unlock_all_vcpus(kvm);
> +
> + tdx_mmu_release_hkid(kvm);
> +
> + return 0;
> +}
> +
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -2817,6 +2825,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> case KVM_TDX_FINALIZE_VM:
> r = tdx_td_finalize(kvm, &tdx_cmd);
> break;
> + case KVM_TDX_TERMINATE_VM:
> + r = tdx_terminate_vm(kvm);
> + break;
> default:
> r = -EINVAL;
> goto out;
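To make the effect of the flipped return value concrete, here is a toy userspace model of the terminate sequence with a trylock helper that returns 0 on success. It is a sketch only: the toy_ names are hypothetical, the locks are plain booleans, the failure code of the helper is an assumption, and clearing hkid_assigned merely stands in for tdx_mmu_release_hkid().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VCPUS 4

struct toy_vm {
	size_t nr_vcpus;
	bool vcpu_locked[TOY_MAX_VCPUS];	/* true while a vCPU lock is held */
	bool vm_dead;
	bool hkid_assigned;
};

/* Returns 0 if every vCPU lock was taken, nonzero (nothing held) otherwise. */
static int toy_trylock_all_vcpus(struct toy_vm *vm)
{
	size_t i;

	for (i = 0; i < vm->nr_vcpus; i++) {
		if (vm->vcpu_locked[i]) {
			/* Drop the locks taken so far. */
			while (i-- > 0)
				vm->vcpu_locked[i] = false;
			return -EBUSY;
		}
		vm->vcpu_locked[i] = true;
	}
	return 0;
}

static void toy_unlock_all_vcpus(struct toy_vm *vm)
{
	size_t i;

	for (i = 0; i < vm->nr_vcpus; i++)
		vm->vcpu_locked[i] = false;
}

/* Same ordering as tdx_terminate_vm() in the patch, with the check flipped. */
static int toy_terminate_vm(struct toy_vm *vm)
{
	if (toy_trylock_all_vcpus(vm))
		return -EBUSY;

	vm->vm_dead = true;
	toy_unlock_all_vcpus(vm);
	vm->hkid_assigned = false;	/* stands in for tdx_mmu_release_hkid() */
	return 0;
}
```

If any vCPU lock is busy the VM is left untouched and the caller sees -EBUSY, so userspace can simply retry.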
* Re: [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
2025-05-11 8:57 ` Adrian Hunter
@ 2025-06-06 14:53 ` Adrian Hunter
0 siblings, 0 replies; 5+ messages in thread
From: Adrian Hunter @ 2025-06-06 14:53 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: mlevitsk, kvm, rick.p.edgecombe, kirill.shutemov, kai.huang,
reinette.chatre, xiaoyao.li, tony.lindgren, binbin.wu,
isaku.yamahata, linux-kernel, yan.y.zhao, chao.gao
On 11/05/2025 11:57, Adrian Hunter wrote:
> On 25/04/2025 10:57, Adrian Hunter wrote:
>> +static int tdx_terminate_vm(struct kvm *kvm)
>> +{
>> + if (!kvm_trylock_all_vcpus(kvm))
>
> Introduction of kvm_trylock_all_vcpus() is still in progress:
>
> https://lore.kernel.org/r/20250430203013.366479-3-mlevitsk@redhat.com/
>
> but it has kvm_trylock_all_vcpus(kvm) return value the other way around, so
> this will instead need to be:
>
> if (kvm_trylock_all_vcpus(kvm))
>
Sean, do you have any comments on this patch? Should I send out a new version
with the change above?
Note kvm_trylock_all_vcpus() is now in Linus' tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4a454ced74c0ac97c8bd32f086ee3ad74528780
* Re: [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
2025-04-25 7:57 ` [PATCH V3 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM Adrian Hunter
2025-05-11 8:57 ` Adrian Hunter
@ 2025-06-06 19:17 ` Sean Christopherson
1 sibling, 0 replies; 5+ messages in thread
From: Sean Christopherson @ 2025-06-06 19:17 UTC (permalink / raw)
To: Adrian Hunter
Cc: pbonzini, mlevitsk, kvm, rick.p.edgecombe, kirill.shutemov,
kai.huang, reinette.chatre, xiaoyao.li, tony.lindgren, binbin.wu,
isaku.yamahata, linux-kernel, yan.y.zhao, chao.gao
On Fri, Apr 25, 2025, Adrian Hunter wrote:
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b952bc673271..5161f6f891d7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -500,14 +500,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
> */
> mutex_lock(&tdx_lock);
>
> - /*
> - * Releasing HKID is in vm_destroy().
> - * After the above flushing vps, there should be no more vCPU
> - * associations, as all vCPU fds have been released at this stage.
> - */
> err = tdh_mng_vpflushdone(&kvm_tdx->td);
> - if (err == TDX_FLUSHVP_NOT_DONE)
> - goto out;
This belongs in a separate patch, with a changelog explaining what's up. Because
my original "suggestion"[1] was simply a question :-)
+ /* Uh, what's going on here? */
if (err == TDX_FLUSHVP_NOT_DONE)
You did all the hard work of tracking down the history[2], and as above, this
definitely warrants its own changelog.
[1] https://lkml.kernel.org/r/Z-V0qyTn2bXdrPF7%40google.com
[2] https://lore.kernel.org/all/d7e220ab-3000-408b-9dd6-0e7ee06d79ec@intel.com
> if (KVM_BUG_ON(err, kvm)) {
> pr_tdx_error(TDH_MNG_VPFLUSHDONE, err);
> pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
> @@ -515,6 +508,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
> goto out;
> }
>
> + write_lock(&kvm->mmu_lock);
> for_each_online_cpu(i) {
> if (packages_allocated &&
> cpumask_test_and_set_cpu(topology_physical_package_id(i),
> @@ -539,7 +533,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
> } else {
> tdx_hkid_free(kvm_tdx);
> }
> -
> + write_unlock(&kvm->mmu_lock);
> out:
> mutex_unlock(&tdx_lock);
> cpus_read_unlock();
> @@ -1789,13 +1783,13 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> struct page *page = pfn_to_page(pfn);
> int ret;
>
> - /*
> - * HKID is released after all private pages have been removed, and set
> - * before any might be populated. Warn if zapping is attempted when
> - * there can't be anything populated in the private EPT.
> - */
> - if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
> - return -EINVAL;
> + if (!is_hkid_assigned(to_kvm_tdx(kvm))) {
> + WARN_ON_ONCE(!kvm->vm_dead);
Should this be a KVM_BUG_ON? I.e. to kill the VM? That'd set vm_dead, which is
kinda neat, i.e. that it'd achieve what the warning is warning about :-)
> + ret = tdx_reclaim_page(page);
> + if (!ret)
> + tdx_unpin(kvm, page);
> + return ret;
> + }
>
> ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
> if (ret <= 0)
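The WARN_ON_ONCE() versus KVM_BUG_ON() question above can be illustrated with a toy model. This is not the kernel's implementation of either macro; the toy_ helpers are hypothetical and only capture the one difference under discussion, namely that the bug-on variant also marks the VM dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_kvm {
	bool vm_dead;
};

/* WARN_ON_ONCE()-like: report the first time only, change nothing else. */
static bool toy_warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "toy warning\n");
	}
	return cond;
}

/* KVM_BUG_ON()-like: report and additionally mark the VM dead. */
static bool toy_kvm_bug_on(bool cond, struct toy_kvm *kvm)
{
	if (cond) {
		fprintf(stderr, "toy bug: marking VM dead\n");
		kvm->vm_dead = true;
	}
	return cond;
}
```

After toy_kvm_bug_on(!kvm->vm_dead, kvm) has fired, the condition it checked no longer holds, which matches the observation above that killing the VM achieves what the warning is warning about.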