* [PATCH 0/5] Introduce a quirk to control memslot zap behavior
@ 2024-06-13 6:06 Yan Zhao
2024-06-13 6:09 ` [PATCH 1/5] KVM: x86/mmu: " Yan Zhao
` (6 more replies)
0 siblings, 7 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:06 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Today "zapping only memslot leaf SPTEs" on moving/deleting a memslot is not
done. Instead, KVM opts to invalidate all page tables and generate fresh
new ones based on the new memslot layout (referred to as "zap all" for
short). This "zap all" behavior is of low overhead for most use cases, and
is adopted primarily due to a bug which caused VM instability when a VM is
with Nvidia Geforce GPU assigned (see link in patch 1).
However, the "zap all" behavior is not desired for certain specific
scenarios. e.g.
- It's not viable for TDX,
a) TDX requires root page of private page table remains unaltered
throughout the TD life cycle.
b) TDX mandates that leaf entries in private page table must be zapped
prior to non-leaf entries.
c) TDX requires re-accepting of private pages after page dropping.
- It's not performant for scenarios involving frequent deletion and
re-adding of numerous small memslots.
This series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk,
enabling users to control the behavior of memslot zapping when a memslot is
moved/deleted.
The quirk is turned on by default, leading to invalidation/zapping to all
SPTEs when a memslot is moved/deleted.
Users have the option to turn off the quirk. Doing so will limit the
zapping to only leaf SPTEs within the range of memslot being moved/deleted.
This series has been tested with
- Normal VMs
w/ and w/o device assignment, and kvm selftests
- TDX guests.
Memslot deletion typically does not occur without device assignment for a
TD. Therefore, it is tested with shared device assignment.
Note: For TDX integration, the quirk is currently disabled via TDX code in
QEMU rather than being automatically disabled based on VM type in
KVM, which is not safe. A malfunctioning QEMU that fails to disable
the quirk could result in the shared EPT being invalidated while the
private EPT remains unaffected, as kvm_mmu_zap_all_fast() only
targets the shared EPT.
However, current kvm->arch.disabled_quirks is entirely
user-controlled, and there is no mechanism for users to verify if a
quirk has been disabled by the kernel.
We are therefore wondering which below options are better for TDX:
a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
besides the testing of kvm_check_has_quirk(). It is similar to
"all new VM types have the quirk disabled". e.g.
static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
{
return kvm->arch.vm_type != KVM_X86_TDX_VM &&
kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
}
b) Init the disabled_quirks based on VM type in kernel, extend
disabled_quirk querying/setting interface to enforce the quirk to
be disabled for TDX.
Patch 1: KVM changes.
Patch 2-5: Selftests updates. Verify memslot move/deletion functionality
with the quirk enabled/disabled.
Yan Zhao (5):
KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
KVM: selftests: Test slot move/delete with slot zap quirk
enabled/disabled
KVM: selftests: Allow slot modification stress test with quirk
disabled
KVM: selftests: Test memslot move in memslot_perf_test with quirk
disabled
KVM: selftests: Test private access to deleted memslot with quirk
disabled
Documentation/virt/kvm/api.rst | 6 ++++
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++++-
.../kvm/memslot_modification_stress_test.c | 19 ++++++++--
.../testing/selftests/kvm/memslot_perf_test.c | 12 ++++++-
.../selftests/kvm/set_memory_region_test.c | 29 ++++++++++-----
.../kvm/x86_64/private_mem_kvm_exits_test.c | 11 ++++--
8 files changed, 102 insertions(+), 15 deletions(-)
base-commit: dd5a440a31fae6e459c0d6271dddd62825505361
--
2.43.2
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/5] KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
@ 2024-06-13 6:09 ` Yan Zhao
2024-06-13 6:10 ` [PATCH 2/5] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
` (5 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:09 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Introduce a quirk KVM_X86_QUIRK_SLOT_ZAP_ALL to allow users to choose
between enabling the quirk to invalidate/zap all SPTEs when a memslot is
moved/deleted and disabling the quirk to zap precisely/slowly of only leaf
SPTEs that are within the range of moving/deleting memslot.
The quirk is turned on by default. This is to work around a bug [1] where
the changing the zapping behavior of memslot move/deletion would cause VM
instability for VMs with an Nvidia GPU assigned.
Users have the option to deactivate the quirk for specific VMs that are
unaffected by this bug. Turning off the quirk enables a more precise
zapping of SPTEs within the memory slot range, enhancing performance for
certain scenarios [2] and meeting the functional requirements for TDX.
In TDX, it is crucial that the root page of the private page table remains
unchanged, with leaf entries being zapped before non-leaf entries.
Additionally, any pages dropped in TDX would necessitate their
re-acceptance by the guest.
Previously, an attempt was made to introduce a per-VM capability [3] as a
workaround for the bug. However, this approach was not preferred because
the root cause of the bug remained unidentified. An alternative solution
involving a per-memslot flag [4] was also considered but ultimately
rejected. Sean and Paolo thereafter recommended the implementation of this
quirk and explained that it's the least bad option [5].
For the quirk disabled case, add a new function kvm_mmu_zap_memslot_leafs()
to zap leaf SPTEs within a memslot when moving/deleting a memslot. Rather
than further calling kvm_unmap_gfn_range() for the actual zapping, this
function bypasses the special handling to APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.
This is based on the considerations that
1) The APIC_ACCESS_PAGE_PRIVATE_MEMSLOT cannot be created by users, nor can
it be moved. It is only deleted by KVM when APICv is permanently
inhibited.
2) kvm_vcpu_reload_apic_access_page() effectively does nothing when
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is deleted.
3) Avoid making all cpus request of KVM_REQ_APIC_PAGE_RELOAD can save on
costly IPIs.
Suggested-by: Kai Huang <kai.huang@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1]
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com/#25054908 [2]
Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#mabc0119583dacf621025e9d873c85f4fbaa66d5c [3]
Link: https://lore.kernel.org/all/20240515005952.3410568-3-rick.p.edgecombe@intel.com [4]
Link: https://lore.kernel.org/all/7df9032d-83e4-46a1-ab29-6c7973a2ab0b@redhat.com [5]
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
Documentation/virt/kvm/api.rst | 6 ++++++
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++++++++++++++++++-
4 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ebdf88078515..37b5ecb25778 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8146,6 +8146,12 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
+
+KVM_X86_QUIRK_SLOT_ZAP_ALL By default, KVM invalidates all SPTEs in
+ fast way when a memslot is deleted.
+ When this quirk is disabled, KVM zaps only
+ leaf SPTEs that are within the range of the
+ memslot being deleted.
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c9499e3b5915..8152b5259435 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2383,7 +2383,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
+ KVM_X86_QUIRK_SLOT_ZAP_ALL)
/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index f64421c55266..c5d189f9ca34 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -438,6 +438,7 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
+#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1fd2f8ea6fab..6269fa315888 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6987,10 +6987,44 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
kvm_mmu_zap_all(kvm);
}
+/*
+ * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
+ *
+ * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
+ * case scenario we'll have unused shadow pages lying around until they
+ * are recycled due to age or when the VM is destroyed.
+ */
+static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ struct kvm_gfn_range range = {
+ .slot = slot,
+ .start = slot->base_gfn,
+ .end = slot->base_gfn + slot->npages,
+ .may_block = true,
+ };
+ bool flush = false;
+
+ write_lock(&kvm->mmu_lock);
+
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, &range, kvm_zap_rmap);
+
+ if (tdp_mmu_enabled)
+ flush = kvm_tdp_mmu_unmap_gfn_range(kvm, &range, flush);
+
+ if (flush)
+ kvm_flush_remote_tlbs_memslot(kvm, slot);
+
+ write_unlock(&kvm->mmu_lock);
+}
+
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- kvm_mmu_zap_all_fast(kvm);
+ if (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL))
+ kvm_mmu_zap_all_fast(kvm);
+ else
+ kvm_mmu_zap_memslot_leafs(kvm, slot);
}
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
--
2.43.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/5] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-06-13 6:09 ` [PATCH 1/5] KVM: x86/mmu: " Yan Zhao
@ 2024-06-13 6:10 ` Yan Zhao
2024-06-13 6:10 ` [PATCH 3/5] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
` (4 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:10 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Update set_memory_region_test to make sure memslot move and deletion
function correctly both when slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL is
enabled and disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
.../selftests/kvm/set_memory_region_test.c | 29 ++++++++++++++-----
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index bb8002084f52..a8267628e9ed 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -175,7 +175,7 @@ static void guest_code_move_memory_region(void)
GUEST_DONE();
}
-static void test_move_memory_region(void)
+static void test_move_memory_region(bool disable_slot_zap_quirk)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -184,6 +184,9 @@ static void test_move_memory_region(void)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_move_memory_region);
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
/*
@@ -266,7 +269,7 @@ static void guest_code_delete_memory_region(void)
GUEST_ASSERT(0);
}
-static void test_delete_memory_region(void)
+static void test_delete_memory_region(bool disable_slot_zap_quirk)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -276,6 +279,9 @@ static void test_delete_memory_region(void)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_delete_memory_region);
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
/* Delete the memory region, the guest should not die. */
vm_mem_region_delete(vm, MEM_REGION_SLOT);
wait_for_vcpu();
@@ -553,7 +559,10 @@ int main(int argc, char *argv[])
{
#ifdef __x86_64__
int i, loops;
+ int j, disable_slot_zap_quirk = 0;
+ if (kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) & KVM_X86_QUIRK_SLOT_ZAP_ALL)
+ disable_slot_zap_quirk = 1;
/*
* FIXME: the zero-memslot test fails on aarch64 and s390x because
* KVM_RUN fails with ENOEXEC or EFAULT.
@@ -579,13 +588,17 @@ int main(int argc, char *argv[])
else
loops = 10;
- pr_info("Testing MOVE of in-use region, %d loops\n", loops);
- for (i = 0; i < loops; i++)
- test_move_memory_region();
+ for (j = 0; j <= disable_slot_zap_quirk; j++) {
+ pr_info("Testing MOVE of in-use region, %d loops, slot zap quirk %s\n",
+ loops, j ? "disabled" : "enabled");
+ for (i = 0; i < loops; i++)
+ test_move_memory_region(!!j);
- pr_info("Testing DELETE of in-use region, %d loops\n", loops);
- for (i = 0; i < loops; i++)
- test_delete_memory_region();
+ pr_info("Testing DELETE of in-use region, %d loops, slot zap quirk %s\n",
+ loops, j ? "disabled" : "enabled");
+ for (i = 0; i < loops; i++)
+ test_delete_memory_region(!!j);
+ }
#endif
return 0;
--
2.43.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/5] KVM: selftests: Allow slot modification stress test with quirk disabled
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-06-13 6:09 ` [PATCH 1/5] KVM: x86/mmu: " Yan Zhao
2024-06-13 6:10 ` [PATCH 2/5] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
@ 2024-06-13 6:10 ` Yan Zhao
2024-06-13 6:10 ` [PATCH 4/5] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
` (3 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:10 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Add a new user option to memslot_modification_stress_test to allow testing
with slot zap quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
.../kvm/memslot_modification_stress_test.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index 05fcf902e067..c6f22ded4c96 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -85,6 +85,7 @@ struct test_params {
useconds_t delay;
uint64_t nr_iterations;
bool partition_vcpu_memory_access;
+ bool disable_slot_zap_quirk;
};
static void run_test(enum vm_guest_mode mode, void *arg)
@@ -95,6 +96,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
VM_MEM_SRC_ANONYMOUS,
p->partition_vcpu_memory_access);
+#ifdef __x86_64__
+ if (p->disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
+ pr_info("Memslot zap quirk %s\n", p->disable_slot_zap_quirk ?
+ "disabled" : "enabled");
+#endif
pr_info("Finished creating vCPUs\n");
@@ -113,11 +121,12 @@ static void run_test(enum vm_guest_mode mode, void *arg)
static void help(char *name)
{
puts("");
- printf("usage: %s [-h] [-m mode] [-d delay_usec]\n"
+ printf("usage: %s [-h] [-m mode] [-d delay_usec] [-q]\n"
" [-b memory] [-v vcpus] [-o] [-i iterations]\n", name);
guest_modes_help();
printf(" -d: add a delay between each iteration of adding and\n"
" deleting a memslot in usec.\n");
+ printf(" -q: Disable memslot zap quirk.\n");
printf(" -b: specify the size of the memory region which should be\n"
" accessed by each vCPU. e.g. 10M or 3G.\n"
" Default: 1G\n");
@@ -143,7 +152,7 @@ int main(int argc, char *argv[])
guest_modes_append_default();
- while ((opt = getopt(argc, argv, "hm:d:b:v:oi:")) != -1) {
+ while ((opt = getopt(argc, argv, "hm:d:qb:v:oi:")) != -1) {
switch (opt) {
case 'm':
guest_modes_cmdline(optarg);
@@ -166,6 +175,12 @@ int main(int argc, char *argv[])
case 'i':
p.nr_iterations = atoi_positive("Number of iterations", optarg);
break;
+ case 'q':
+ p.disable_slot_zap_quirk = true;
+
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
+ KVM_X86_QUIRK_SLOT_ZAP_ALL);
+ break;
case 'h':
default:
help(argv[0]);
--
2.43.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/5] KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
` (2 preceding siblings ...)
2024-06-13 6:10 ` [PATCH 3/5] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
@ 2024-06-13 6:10 ` Yan Zhao
2024-06-13 6:11 ` [PATCH 5/5] KVM: selftests: Test private access to deleted memslot " Yan Zhao
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:10 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Add a new user option to memslot_perf_test to allow testing memslot move
with quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
tools/testing/selftests/kvm/memslot_perf_test.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 579a64f97333..893366982f77 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -113,6 +113,7 @@ static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "atomic bool is not lockless");
static sem_t vcpu_ready;
static bool map_unmap_verify;
+static bool disable_slot_zap_quirk;
static bool verbose;
#define pr_info_v(...) \
@@ -578,6 +579,9 @@ static bool test_memslot_move_prepare(struct vm_data *data,
uint32_t guest_page_size = data->vm->page_size;
uint64_t movesrcgpa, movetestgpa;
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(data->vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
movesrcgpa = vm_slot2gpa(data, data->nslots - 1);
if (isactive) {
@@ -896,6 +900,7 @@ static void help(char *name, struct test_args *targs)
pr_info(" -h: print this help screen.\n");
pr_info(" -v: enable verbose mode (not for benchmarking).\n");
pr_info(" -d: enable extra debug checks.\n");
+ pr_info(" -q: Disable memslot zap quirk during memslot move.\n");
pr_info(" -s: specify memslot count cap (-1 means no cap; currently: %i)\n",
targs->nslots);
pr_info(" -f: specify the first test to run (currently: %i; max %zu)\n",
@@ -954,7 +959,7 @@ static bool parse_args(int argc, char *argv[],
uint32_t max_mem_slots;
int opt;
- while ((opt = getopt(argc, argv, "hvds:f:e:l:r:")) != -1) {
+ while ((opt = getopt(argc, argv, "hvdqs:f:e:l:r:")) != -1) {
switch (opt) {
case 'h':
default:
@@ -966,6 +971,11 @@ static bool parse_args(int argc, char *argv[],
case 'd':
map_unmap_verify = true;
break;
+ case 'q':
+ disable_slot_zap_quirk = true;
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
+ KVM_X86_QUIRK_SLOT_ZAP_ALL);
+ break;
case 's':
targs->nslots = atoi_paranoid(optarg);
if (targs->nslots <= 1 && targs->nslots != -1) {
--
2.43.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/5] KVM: selftests: Test private access to deleted memslot with quirk disabled
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
` (3 preceding siblings ...)
2024-06-13 6:10 ` [PATCH 4/5] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
@ 2024-06-13 6:11 ` Yan Zhao
2024-06-13 20:01 ` [PATCH 0/5] Introduce a quirk to control memslot zap behavior Edgecombe, Rick P
2024-06-20 19:37 ` [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
6 siblings, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-13 6:11 UTC (permalink / raw)
To: pbonzini, seanjc
Cc: rick.p.edgecombe, kai.huang, isaku.yamahata, dmatlack, sagis,
erdemaktas, linux-kernel, kvm, Yan Zhao
Update private_mem_kvm_exits_test to make sure private access to deleted
memslot functional both when quirk KVM_X86_QUIRK_SLOT_ZAP_ALL is enabled
and disabled.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
.../selftests/kvm/x86_64/private_mem_kvm_exits_test.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c
index 13e72fcec8dd..a4d304fce294 100644
--- a/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c
+++ b/tools/testing/selftests/kvm/x86_64/private_mem_kvm_exits_test.c
@@ -44,7 +44,7 @@ const struct vm_shape protected_vm_shape = {
.type = KVM_X86_SW_PROTECTED_VM,
};
-static void test_private_access_memslot_deleted(void)
+static void test_private_access_memslot_deleted(bool disable_slot_zap_quirk)
{
struct kvm_vm *vm;
struct kvm_vcpu *vcpu;
@@ -55,6 +55,9 @@ static void test_private_access_memslot_deleted(void)
vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
guest_repeatedly_read);
+ if (disable_slot_zap_quirk)
+ vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
EXITS_TEST_GPA, EXITS_TEST_SLOT,
EXITS_TEST_NPAGES,
@@ -115,6 +118,10 @@ int main(int argc, char *argv[])
{
TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
- test_private_access_memslot_deleted();
+ test_private_access_memslot_deleted(false);
+
+ if (kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) & KVM_X86_QUIRK_SLOT_ZAP_ALL)
+ test_private_access_memslot_deleted(true);
+
test_private_access_memslot_not_private();
}
--
2.43.2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
` (4 preceding siblings ...)
2024-06-13 6:11 ` [PATCH 5/5] KVM: selftests: Test private access to deleted memslot " Yan Zhao
@ 2024-06-13 20:01 ` Edgecombe, Rick P
2024-06-17 10:15 ` Yan Zhao
2024-06-20 19:37 ` [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
6 siblings, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-06-13 20:01 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, Zhao, Yan Y
Cc: kvm@vger.kernel.org, sagis@google.com,
linux-kernel@vger.kernel.org, dmatlack@google.com, Huang, Kai,
Yamahata, Isaku, Aktas, Erdem
On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> besides the testing of kvm_check_has_quirk(). It is similar to
> "all new VM types have the quirk disabled". e.g.
>
> static inline bool kvm_memslot_flush_zap_all(struct kvm
> *kvm)
>
> {
>
> return kvm->arch.vm_type != KVM_X86_TDX_VM
> &&
> kvm_check_has_quirk(kvm,
> KVM_X86_QUIRK_SLOT_ZAP_ALL);
> }
>
> b) Init the disabled_quirks based on VM type in kernel, extend
> disabled_quirk querying/setting interface to enforce the quirk to
> be disabled for TDX.
I'd prefer to go with option (a) here. Because we don't have any behavior
defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
of the existing vm_types. It would be a few lines of documentation to save
implementing and maintaining a whole interface with special logic for TDX. So to
me it doesn't seem worth it, unless there is some other user for a new more
complex quirk interface.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-13 20:01 ` [PATCH 0/5] Introduce a quirk to control memslot zap behavior Edgecombe, Rick P
@ 2024-06-17 10:15 ` Yan Zhao
2024-06-18 14:34 ` Sean Christopherson
0 siblings, 1 reply; 18+ messages in thread
From: Yan Zhao @ 2024-06-17 10:15 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org,
sagis@google.com, linux-kernel@vger.kernel.org,
dmatlack@google.com, Huang, Kai, Yamahata, Isaku, Aktas, Erdem
On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> > a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> > besides the testing of kvm_check_has_quirk(). It is similar to
> > "all new VM types have the quirk disabled". e.g.
> >
> > static inline bool kvm_memslot_flush_zap_all(struct kvm
> > *kvm)
> >
> > {
> >
> > return kvm->arch.vm_type != KVM_X86_TDX_VM
> > &&
> > kvm_check_has_quirk(kvm,
> > KVM_X86_QUIRK_SLOT_ZAP_ALL);
> > }
> >
> > b) Init the disabled_quirks based on VM type in kernel, extend
> > disabled_quirk querying/setting interface to enforce the quirk to
> > be disabled for TDX.
>
> I'd prefer to go with option (a) here. Because we don't have any behavior
> defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
>
> Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> of the existing vm_types. It would be a few lines of documentation to save
> implementing and maintaining a whole interface with special logic for TDX. So to
> me it doesn't seem worth it, unless there is some other user for a new more
> complex quirk interface.
What about introducing a forced disabled_quirk field?
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8152b5259435..32859952fa75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1425,6 +1425,7 @@ struct kvm_arch {
u32 bsp_vcpu_id;
u64 disabled_quirks;
+ u64 force_disabled_quirks;
enum kvm_irqchip_mode irqchip_mode;
u8 nr_reserved_ioapic_pins;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e4bb1db45476..629a95cbe568 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -565,6 +565,7 @@ int tdx_vm_init(struct kvm *kvm)
{
kvm->arch.has_private_mem = true;
+ kvm->arch.force_disabled_quirks |= KVM_X86_QUIRK_SLOT_ZAP_ALL;
/*
* Because guest TD is protected, VMM can't parse the instruction in TD.
* Instead, guest uses MMIO hypercall. For unmodified device driver,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 229497fd3266..53ef06a06517 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -291,7 +291,8 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
- return !(kvm->arch.disabled_quirks & quirk);
+ return !(kvm->arch.disabled_quirks & quirk) &&
+ !(kvm->arch.force_disabled_quirks & quirk);
}
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-17 10:15 ` Yan Zhao
@ 2024-06-18 14:34 ` Sean Christopherson
2024-06-20 3:59 ` Yan Zhao
2024-06-20 19:38 ` Edgecombe, Rick P
0 siblings, 2 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-06-18 14:34 UTC (permalink / raw)
To: Yan Zhao
Cc: Rick P Edgecombe, pbonzini@redhat.com, kvm@vger.kernel.org,
sagis@google.com, linux-kernel@vger.kernel.org,
dmatlack@google.com, Kai Huang, Isaku Yamahata, Erdem Aktas
On Mon, Jun 17, 2024, Yan Zhao wrote:
> On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> > On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> > > a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> > > besides the testing of kvm_check_has_quirk(). It is similar to
> > > "all new VM types have the quirk disabled". e.g.
> > >
> > > static inline bool kvm_memslot_flush_zap_all(struct kvm
> > > *kvm)
> > >
> > > {
> > >
> > > return kvm->arch.vm_type != KVM_X86_TDX_VM
> > > &&
> > > kvm_check_has_quirk(kvm,
> > > KVM_X86_QUIRK_SLOT_ZAP_ALL);
> > > }
> > >
> > > b) Init the disabled_quirks based on VM type in kernel, extend
> > > disabled_quirk querying/setting interface to enforce the quirk to
> > > be disabled for TDX.
There's also option:
c) Init disabled_quirks based on VM type.
I.e. let userspace enable the quirk. If the VMM wants to shoot its TDX VM guests,
then so be it. That said, I don't like this option because it would create a very
bizarre ABI.
> >
> > I'd prefer to go with option (a) here. Because we don't have any behavior
> > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
I vote for (a) as well.
> > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> > of the existing vm_types. It would be a few lines of documentation to save
> > implementing and maintaining a whole interface with special logic for TDX. So to
> > me it doesn't seem worth it, unless there is some other user for a new more
> > complex quirk interface.
> What about introducing a forced disabled_quirk field?
Nah, it'd require manual opt-in for every VM type for almost no benefit. In fact,
IMO the code itself would be a net negative versus:
return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
documentation (which would state that the quirk only applies to DEFAULT_VM).
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-18 14:34 ` Sean Christopherson
@ 2024-06-20 3:59 ` Yan Zhao
2024-06-20 19:38 ` Edgecombe, Rick P
1 sibling, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-20 3:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Rick P Edgecombe, pbonzini@redhat.com, kvm@vger.kernel.org,
sagis@google.com, linux-kernel@vger.kernel.org,
dmatlack@google.com, Kai Huang, Isaku Yamahata, Erdem Aktas
On Tue, Jun 18, 2024 at 07:34:15AM -0700, Sean Christopherson wrote:
> On Mon, Jun 17, 2024, Yan Zhao wrote:
> > On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> > > On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> > > > a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> > > > besides the testing of kvm_check_has_quirk(). It is similar to
> > > > "all new VM types have the quirk disabled". e.g.
> > > >
> > > > static inline bool kvm_memslot_flush_zap_all(struct kvm
> > > > *kvm)
> > > >
> > > > {
> > > >
> > > > return kvm->arch.vm_type != KVM_X86_TDX_VM
> > > > &&
> > > > kvm_check_has_quirk(kvm,
> > > > KVM_X86_QUIRK_SLOT_ZAP_ALL);
> > > > }
> > > >
> > > > b) Init the disabled_quirks based on VM type in kernel, extend
> > > > disabled_quirk querying/setting interface to enforce the quirk to
> > > > be disabled for TDX.
>
> There's also option:
>
> c) Init disabled_quirks based on VM type.
>
> I.e. let userspace enable the quirk. If the VMM wants to shoot its TDX VM guests,
> then so be it. That said, I don't like this option because it would create a very
> bizarre ABI.
>
> > >
> > > I'd prefer to go with option (a) here. Because we don't have any behavior
> > > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
>
> I vote for (a) as well.
>
> > > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> > > of the existing vm_types. It would be a few lines of documentation to save
> > > implementing and maintaining a whole interface with special logic for TDX. So to
> > > me it doesn't seem worth it, unless there is some other user for a new more
> > > complex quirk interface.
> > What about introducing a forced disabled_quirk field?
>
> Nah, it'd require manual opt-in for every VM type for almost no benefit. In fact,
> IMO the code itself would be a net negative versus:
>
> return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
>
> because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
> documentation (which would state that the quirk only applies to DEFAULT_VM).
Makes sense.
Then, (a) looks good to me :)
Do you prefer to include the document update (i.e. the quirk only applies to
DEFAULT_VM) in this series or in TDX MMU series?
And it means the quirk will not be enabled for all other VM types, e.g.
KVM_X86_SW_PROTECTED_VM, and SEV, SNP..., right?
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
` (5 preceding siblings ...)
2024-06-13 20:01 ` [PATCH 0/5] Introduce a quirk to control memslot zap behavior Edgecombe, Rick P
@ 2024-06-20 19:37 ` Rick Edgecombe
2024-06-20 19:40 ` Edgecombe, Rick P
2024-06-21 0:08 ` Sean Christopherson
6 siblings, 2 replies; 18+ messages in thread
From: Rick Edgecombe @ 2024-06-20 19:37 UTC (permalink / raw)
To: yan.y.zhao
Cc: dmatlack, erdemaktas, isaku.yamahata, kai.huang, kvm,
linux-kernel, pbonzini, rick.p.edgecombe, sagis, seanjc
Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
TDs cannot use the fast zapping operation to implement memslot deletion for
a couple reasons:
1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
with the guest. This is due to the TDs needing to "accept" memory. So an
operation to delete a memslot needs to limit the private zapping to the
range of the memslot.
2. For reason (1), kvm_mmu_zap_all_fast() is limited to direct (shared)
roots. This means it will not zap the mirror (private) PTEs. If a
memslot is deleted with private memory mapped, the private memory would
remain mapped in the TD. Then if later the gmem fd was whole punched,
the pages could be freed on the host while still mapped in the TD. This
is because that operation would no longer have the memslot to map the
pgoff to the gfn.
To handle the first case, userspace could simply set the
KVM_X86_QUIRK_SLOT_ZAP_ALL quirk for TDs. This would prevent the issue in
(1), but it is not sufficient to resolve (2) because the problems there
extend beyond the userspace's TD, to affecting the rest of the host. So the
zap-leafs-only behavior is required for both
A couple options were considered, including forcing
KVM_X86_QUIRK_SLOT_ZAP_ALL to always be on for TDs, however due to the
currently limited quirks interface (no way to query quirks, or force them
to be disabled), this would require developing additional interfaces. So
instead just do the simple thing and make TDs always do the zap-leafs
behavior like when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled.
While at it, have the new behavior apply to all non-KVM_X86_DEFAULT_VM VMs,
as the previous behavior was not ideal (see [0]). It is assumed until
proven otherwise that the other VM types will not be exposed to the bug[1]
that derailed that effort.
Memslot deletion needs to zap both the private and shared mappings of a
GFN, so update the attr_filter field in kvm_mmu_zap_memslot_leafs() to
include both.
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/kvm/20190205205443.1059-1-sean.j.christopherson@intel.com/ [0]
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1]
---
Here is the patch for TDX integration. It is not needed until we can
actually create KVM_X86_TDX_VMs.
Admittedly, this kind of combines two changes, but the amount of code is
very small so I left it as one patch.
arch/x86/kvm/mmu.h | 6 ++++++
arch/x86/kvm/mmu/mmu.c | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7b12ba761c51..72ed6c07719a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -335,4 +335,10 @@ static inline bool kvm_is_addr_direct(struct kvm *kvm, gpa_t gpa)
return !gpa_direct_bits || (gpa & gpa_direct_bits);
}
+
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+ return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+ kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 42faad76806a..8212bf77af70 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6993,6 +6993,7 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
.start = slot->base_gfn,
.end = slot->base_gfn + slot->npages,
.may_block = true,
+ .attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
};
bool flush = false;
@@ -7013,7 +7014,7 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- if (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL))
+ if (kvm_memslot_flush_zap_all(kvm))
kvm_mmu_zap_all_fast(kvm);
else
kvm_mmu_zap_memslot_leafs(kvm, slot);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-18 14:34 ` Sean Christopherson
2024-06-20 3:59 ` Yan Zhao
@ 2024-06-20 19:38 ` Edgecombe, Rick P
2024-06-21 0:00 ` Sean Christopherson
1 sibling, 1 reply; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-06-20 19:38 UTC (permalink / raw)
To: seanjc@google.com, Zhao, Yan Y
Cc: Huang, Kai, sagis@google.com, linux-kernel@vger.kernel.org,
Aktas, Erdem, kvm@vger.kernel.org, pbonzini@redhat.com,
Yamahata, Isaku, dmatlack@google.com
On Tue, 2024-06-18 at 07:34 -0700, Sean Christopherson wrote:
> There's also option:
>
> c) Init disabled_quirks based on VM type.
>
> I.e. let userspace enable the quirk. If the VMM wants to shoot its TDX VM
> guests,
> then so be it. That said, I don't like this option because it would create a
> very
> bizarre ABI.
I think we actually need to force it on for TDX because kvm_mmu_zap_all_fast()
only zaps the direct (shared) root. If userspace decides to not enable the
quirk, mirror/private memory will not be zapped on memslot deletion. Then later
if there is a hole punch it will skip zapping that range because there is no
memslot. Then won't it let the pages get freed while they are still mapped in
the TD?
If I got that right (not 100% sure on the gmem hole punch page freeing), I think
KVM needs to force the behavior for TDs.
>
> > >
> > > I'd prefer to go with option (a) here. Because we don't have any behavior
> > > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk"
> > > of it.
>
> I vote for (a) as well.
>
> > > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the
> > > behavior
> > > of the existing vm_types. It would be a few lines of documentation to save
> > > implementing and maintaining a whole interface with special logic for TDX.
> > > So to
> > > me it doesn't seem worth it, unless there is some other user for a new
> > > more
> > > complex quirk interface.
> > What about introducing a forced disabled_quirk field?
>
> Nah, it'd require manual opt-in for every VM type for almost no benefit. In
> fact,
> IMO the code itself would be a net negative versus:
>
> return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
>
> because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
> documentation (which would state that the quirk only applies to DEFAULT_VM).
Ok, I updated (and posted on this series) the TDX integration patch.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX
2024-06-20 19:37 ` [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
@ 2024-06-20 19:40 ` Edgecombe, Rick P
2024-06-21 0:08 ` Sean Christopherson
1 sibling, 0 replies; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-06-20 19:40 UTC (permalink / raw)
To: Zhao, Yan Y
Cc: seanjc@google.com, Huang, Kai, sagis@google.com,
linux-kernel@vger.kernel.org, Aktas, Erdem, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku, dmatlack@google.com
On Thu, 2024-06-20 at 12:37 -0700, Rick Edgecombe wrote:
> Here is the patch for TDX integration. It is not needed until we can
> actually create KVM_X86_TDX_VMs.
It depends on this series as well:
https://lore.kernel.org/kvm/e693adab-9fa3-47fd-b62f-c3f2589ffe7f@linux.intel.com/
So it should not be included when applying just this series (will get a build
error).
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-20 19:38 ` Edgecombe, Rick P
@ 2024-06-21 0:00 ` Sean Christopherson
2024-06-21 1:06 ` Edgecombe, Rick P
0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-06-21 0:00 UTC (permalink / raw)
To: Rick P Edgecombe
Cc: Yan Y Zhao, Kai Huang, sagis@google.com,
linux-kernel@vger.kernel.org, Erdem Aktas, kvm@vger.kernel.org,
pbonzini@redhat.com, Isaku Yamahata, dmatlack@google.com
On Thu, Jun 20, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-06-18 at 07:34 -0700, Sean Christopherson wrote:
> > There's also option:
> >
> > c) Init disabled_quirks based on VM type.
> >
> > I.e. let userspace enable the quirk. If the VMM wants to shoot its TDX VM
> > guests, then so be it. That said, I don't like this option because it
> > would create a very bizarre ABI.
>
> I think we actually need to force it on for TDX because kvm_mmu_zap_all_fast()
> only zaps the direct (shared) root. If userspace decides to not enable the
> quirk, mirror/private memory will not be zapped on memslot deletion. Then later
> if there is a hole punch it will skip zapping that range because there is no
> memslot. Then won't it let the pages get freed while they are still mapped in
> the TD?
>
> If I got that right (not 100% sure on the gmem hole punch page freeing), I think
> KVM needs to force the behavior for TDs.
What I was suggesting is that we condition the skipping of the mirror/private
EPT pages tables on the quirk, i.e. zap *everything* for TDX VMs if the quirk is
enabled. Hence the very bizarre ABI.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX
2024-06-20 19:37 ` [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-06-20 19:40 ` Edgecombe, Rick P
@ 2024-06-21 0:08 ` Sean Christopherson
2024-06-21 1:17 ` Edgecombe, Rick P
2024-06-21 2:14 ` Yan Zhao
1 sibling, 2 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-06-21 0:08 UTC (permalink / raw)
To: Rick Edgecombe
Cc: yan.y.zhao, dmatlack, erdemaktas, isaku.yamahata, kai.huang, kvm,
linux-kernel, pbonzini, sagis
On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
>
> TDs cannot use the fast zapping operation to implement memslot deletion for
> a couple reasons:
> 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is fuzzy
and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g. BLOCK+UNBLOCK.
It's specifically the full teardown and rebuild of the "fast zap" that doesn't
play nice, as the non-leaf S-EPT entries *must* be preserved due to how the TDX
module does is refcounting.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/5] Introduce a quirk to control memslot zap behavior
2024-06-21 0:00 ` Sean Christopherson
@ 2024-06-21 1:06 ` Edgecombe, Rick P
0 siblings, 0 replies; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-06-21 1:06 UTC (permalink / raw)
To: seanjc@google.com
Cc: Huang, Kai, sagis@google.com, linux-kernel@vger.kernel.org,
Zhao, Yan Y, Aktas, Erdem, kvm@vger.kernel.org,
pbonzini@redhat.com, Yamahata, Isaku, dmatlack@google.com
On Thu, 2024-06-20 at 17:00 -0700, Sean Christopherson wrote:
>
> What I was suggesting is that we condition the skipping of the mirror/private
> EPT pages tables on the quirk, i.e. zap *everything* for TDX VMs if the quirk
> is
> enabled. Hence the very bizarre ABI.
Ah, I see now. Yes, that would be strange.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX
2024-06-21 0:08 ` Sean Christopherson
@ 2024-06-21 1:17 ` Edgecombe, Rick P
2024-06-21 2:14 ` Yan Zhao
1 sibling, 0 replies; 18+ messages in thread
From: Edgecombe, Rick P @ 2024-06-21 1:17 UTC (permalink / raw)
To: seanjc@google.com
Cc: Huang, Kai, sagis@google.com, linux-kernel@vger.kernel.org,
Aktas, Erdem, Zhao, Yan Y, pbonzini@redhat.com,
kvm@vger.kernel.org, Yamahata, Isaku, dmatlack@google.com
On Thu, 2024-06-20 at 17:08 -0700, Sean Christopherson wrote:
> On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> > Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
> >
> > TDs cannot use the fast zapping operation to implement memslot deletion for
> > a couple reasons:
> > 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
>
> Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is
> fuzzy
> and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g.
> BLOCK+UNBLOCK.
> It's specifically the full teardown and rebuild of the "fast zap" that doesn't
> play nice, as the non-leaf S-EPT entries *must* be preserved due to how the
> TDX
> module does is refcounting.
Hmm, yea. That is probably worth an update. I'll change it for when I post
another version of this patch.
I was imaging this series might go up ahead of the rest of the MMU prep stuff.
In which case I can just post a new version of this patch on top of kvm-coco-
queue once this series appears in the base of that branch. Assuming there is no
problems with that, I won't post a v2 right away.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX
2024-06-21 0:08 ` Sean Christopherson
2024-06-21 1:17 ` Edgecombe, Rick P
@ 2024-06-21 2:14 ` Yan Zhao
1 sibling, 0 replies; 18+ messages in thread
From: Yan Zhao @ 2024-06-21 2:14 UTC (permalink / raw)
To: Sean Christopherson
Cc: Rick Edgecombe, dmatlack, erdemaktas, isaku.yamahata, kai.huang,
kvm, linux-kernel, pbonzini, sagis
On Thu, Jun 20, 2024 at 05:08:37PM -0700, Sean Christopherson wrote:
> On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> > Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
> >
> > TDs cannot use the fast zapping operation to implement memslot deletion for
> > a couple reasons:
> > 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
>
> Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is fuzzy
> and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g. BLOCK+UNBLOCK.
> It's specifically the full teardown and rebuild of the "fast zap" that doesn't
> play nice, as the non-leaf S-EPT entries *must* be preserved due to how the TDX
> module does is refcounting.
Actually, slower zap all for TDX was also considered. e.g.
1) fully dropping of the leaf entries within the range of memslot
2) just blocking all other leaf entries
But given current TDX MMU series does not provide block-only implementation,
that proposal was aborted internally :)
BTW, there's another proposal, not sure if you will consider it.
We can still have users to control whether the quirk is enabled or not.
(i.e. KVM does not enforce quirk disabling).
If user does not disable the quirk for TDX, then kvm_mmu_zap_all_fast()
will be called to invalidate all shared EPTs.
Meanwhile, could we trigger kvm_gmem_invalidate*() in kvm_gmem_unbind()
as well? Then, the private EPT will also be ensured to be zapped before
memslot removal.
The concern for this approach is that we need to do some similar invalidation
stuffs when in future memslots besides gmem can also be mapped into private EPT.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-06-21 2:15 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 6:06 [PATCH 0/5] Introduce a quirk to control memslot zap behavior Yan Zhao
2024-06-13 6:09 ` [PATCH 1/5] KVM: x86/mmu: " Yan Zhao
2024-06-13 6:10 ` [PATCH 2/5] KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled Yan Zhao
2024-06-13 6:10 ` [PATCH 3/5] KVM: selftests: Allow slot modification stress test with quirk disabled Yan Zhao
2024-06-13 6:10 ` [PATCH 4/5] KVM: selftests: Test memslot move in memslot_perf_test " Yan Zhao
2024-06-13 6:11 ` [PATCH 5/5] KVM: selftests: Test private access to deleted memslot " Yan Zhao
2024-06-13 20:01 ` [PATCH 0/5] Introduce a quirk to control memslot zap behavior Edgecombe, Rick P
2024-06-17 10:15 ` Yan Zhao
2024-06-18 14:34 ` Sean Christopherson
2024-06-20 3:59 ` Yan Zhao
2024-06-20 19:38 ` Edgecombe, Rick P
2024-06-21 0:00 ` Sean Christopherson
2024-06-21 1:06 ` Edgecombe, Rick P
2024-06-20 19:37 ` [PATCH] KVM: x86/mmu: Implement memslot deletion for TDX Rick Edgecombe
2024-06-20 19:40 ` Edgecombe, Rick P
2024-06-21 0:08 ` Sean Christopherson
2024-06-21 1:17 ` Edgecombe, Rick P
2024-06-21 2:14 ` Yan Zhao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox