* [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk
@ 2024-09-27 0:16 Sean Christopherson
2024-09-27 0:16 ` [PATCH 1/4] Revert "KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled" Sean Christopherson
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 0:16 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
Revert the entire KVM_X86_QUIRK_SLOT_ZAP_ALL series, as the code is buggy
for shadow MMUs, and I'm not convinced a quirk is actually the right way
forward. I'm not totally opposed to it (obviously, given that I suggested
it at one point), but I would prefer to give ourselves ample time to sort
out exactly how we want to move forward, i.e. not rush something in to
unhose v6.12.
Sean Christopherson (4):
Revert "KVM: selftests: Test memslot move in memslot_perf_test with
quirk disabled"
Revert "KVM: selftests: Allow slot modification stress test with quirk
disabled"
Revert "KVM: selftests: Test slot move/delete with slot zap quirk
enabled/disabled"
Revert "KVM: x86/mmu: Introduce a quirk to control memslot zap
behavior"
Documentation/virt/kvm/api.rst | 8 -----
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/uapi/asm/kvm.h | 1 -
arch/x86/kvm/mmu/mmu.c | 34 +------------------
.../kvm/memslot_modification_stress_test.c | 19 ++---------
.../testing/selftests/kvm/memslot_perf_test.c | 12 +------
.../selftests/kvm/set_memory_region_test.c | 29 +++++-----------
7 files changed, 13 insertions(+), 93 deletions(-)
base-commit: 3f8df6285271d9d8f17d733433e5213a63b83a0b
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/4] Revert "KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled"
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
@ 2024-09-27 0:16 ` Sean Christopherson
2024-09-27 0:16 ` [PATCH 2/4] Revert "KVM: selftests: Allow slot modification stress test " Sean Christopherson
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 0:16 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
Revert memslot_perf_test's testcase for KVM_X86_QUIRK_SLOT_ZAP_ALL, as the
quirk is being removed, i.e. the KVM side of things is being reverted.
This reverts commit 61de4c34b51c5b9c7ef8229eb246346254638446.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/memslot_perf_test.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 893366982f77..579a64f97333 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -113,7 +113,6 @@ static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "atomic bool is not lockless");
static sem_t vcpu_ready;
static bool map_unmap_verify;
-static bool disable_slot_zap_quirk;
static bool verbose;
#define pr_info_v(...) \
@@ -579,9 +578,6 @@ static bool test_memslot_move_prepare(struct vm_data *data,
uint32_t guest_page_size = data->vm->page_size;
uint64_t movesrcgpa, movetestgpa;
- if (disable_slot_zap_quirk)
- vm_enable_cap(data->vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-
movesrcgpa = vm_slot2gpa(data, data->nslots - 1);
if (isactive) {
@@ -900,7 +896,6 @@ static void help(char *name, struct test_args *targs)
pr_info(" -h: print this help screen.\n");
pr_info(" -v: enable verbose mode (not for benchmarking).\n");
pr_info(" -d: enable extra debug checks.\n");
- pr_info(" -q: Disable memslot zap quirk during memslot move.\n");
pr_info(" -s: specify memslot count cap (-1 means no cap; currently: %i)\n",
targs->nslots);
pr_info(" -f: specify the first test to run (currently: %i; max %zu)\n",
@@ -959,7 +954,7 @@ static bool parse_args(int argc, char *argv[],
uint32_t max_mem_slots;
int opt;
- while ((opt = getopt(argc, argv, "hvdqs:f:e:l:r:")) != -1) {
+ while ((opt = getopt(argc, argv, "hvds:f:e:l:r:")) != -1) {
switch (opt) {
case 'h':
default:
@@ -971,11 +966,6 @@ static bool parse_args(int argc, char *argv[],
case 'd':
map_unmap_verify = true;
break;
- case 'q':
- disable_slot_zap_quirk = true;
- TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
- KVM_X86_QUIRK_SLOT_ZAP_ALL);
- break;
case 's':
targs->nslots = atoi_paranoid(optarg);
if (targs->nslots <= 1 && targs->nslots != -1) {
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4] Revert "KVM: selftests: Allow slot modification stress test with quirk disabled"
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
2024-09-27 0:16 ` [PATCH 1/4] Revert "KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled" Sean Christopherson
@ 2024-09-27 0:16 ` Sean Christopherson
2024-09-27 0:16 ` [PATCH 3/4] Revert "KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled" Sean Christopherson
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 0:16 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
Revert memslot_modification_stress_test's KVM_X86_QUIRK_SLOT_ZAP_ALL
testcase, as the quirk is being removed, i.e. the KVM side of things is
being reverted.
This reverts commit 218f6415004a881d116e254eeb837358aced55ab.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../kvm/memslot_modification_stress_test.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index e3343f0df9e1..49f162573126 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -79,7 +79,6 @@ struct test_params {
useconds_t delay;
uint64_t nr_iterations;
bool partition_vcpu_memory_access;
- bool disable_slot_zap_quirk;
};
static void run_test(enum vm_guest_mode mode, void *arg)
@@ -90,13 +89,6 @@ static void run_test(enum vm_guest_mode mode, void *arg)
vm = memstress_create_vm(mode, nr_vcpus, guest_percpu_mem_size, 1,
VM_MEM_SRC_ANONYMOUS,
p->partition_vcpu_memory_access);
-#ifdef __x86_64__
- if (p->disable_slot_zap_quirk)
- vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-
- pr_info("Memslot zap quirk %s\n", p->disable_slot_zap_quirk ?
- "disabled" : "enabled");
-#endif
pr_info("Finished creating vCPUs\n");
@@ -115,12 +107,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
static void help(char *name)
{
puts("");
- printf("usage: %s [-h] [-m mode] [-d delay_usec] [-q]\n"
+ printf("usage: %s [-h] [-m mode] [-d delay_usec]\n"
" [-b memory] [-v vcpus] [-o] [-i iterations]\n", name);
guest_modes_help();
printf(" -d: add a delay between each iteration of adding and\n"
" deleting a memslot in usec.\n");
- printf(" -q: Disable memslot zap quirk.\n");
printf(" -b: specify the size of the memory region which should be\n"
" accessed by each vCPU. e.g. 10M or 3G.\n"
" Default: 1G\n");
@@ -146,7 +137,7 @@ int main(int argc, char *argv[])
guest_modes_append_default();
- while ((opt = getopt(argc, argv, "hm:d:qb:v:oi:")) != -1) {
+ while ((opt = getopt(argc, argv, "hm:d:b:v:oi:")) != -1) {
switch (opt) {
case 'm':
guest_modes_cmdline(optarg);
@@ -169,12 +160,6 @@ int main(int argc, char *argv[])
case 'i':
p.nr_iterations = atoi_positive("Number of iterations", optarg);
break;
- case 'q':
- p.disable_slot_zap_quirk = true;
-
- TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
- KVM_X86_QUIRK_SLOT_ZAP_ALL);
- break;
case 'h':
default:
help(argv[0]);
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4] Revert "KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled"
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
2024-09-27 0:16 ` [PATCH 1/4] Revert "KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled" Sean Christopherson
2024-09-27 0:16 ` [PATCH 2/4] Revert "KVM: selftests: Allow slot modification stress test " Sean Christopherson
@ 2024-09-27 0:16 ` Sean Christopherson
2024-09-27 0:16 ` [PATCH 4/4] Revert "KVM: x86/mmu: Introduce a quirk to control memslot zap behavior" Sean Christopherson
2024-09-27 15:43 ` [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Paolo Bonzini
4 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 0:16 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
Revert set_memory_region_test's KVM_X86_QUIRK_SLOT_ZAP_ALL testcase, as
the quirk is being removed, i.e. the KVM side of things is being reverted.
This reverts commit b4ed2c67d275b85b2ab07d54f88bebd5998d61d8.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/set_memory_region_test.c | 29 +++++--------------
1 file changed, 8 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index a8267628e9ed..bb8002084f52 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -175,7 +175,7 @@ static void guest_code_move_memory_region(void)
GUEST_DONE();
}
-static void test_move_memory_region(bool disable_slot_zap_quirk)
+static void test_move_memory_region(void)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -184,9 +184,6 @@ static void test_move_memory_region(bool disable_slot_zap_quirk)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_move_memory_region);
- if (disable_slot_zap_quirk)
- vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
/*
@@ -269,7 +266,7 @@ static void guest_code_delete_memory_region(void)
GUEST_ASSERT(0);
}
-static void test_delete_memory_region(bool disable_slot_zap_quirk)
+static void test_delete_memory_region(void)
{
pthread_t vcpu_thread;
struct kvm_vcpu *vcpu;
@@ -279,9 +276,6 @@ static void test_delete_memory_region(bool disable_slot_zap_quirk)
vm = spawn_vm(&vcpu, &vcpu_thread, guest_code_delete_memory_region);
- if (disable_slot_zap_quirk)
- vm_enable_cap(vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-
/* Delete the memory region, the guest should not die. */
vm_mem_region_delete(vm, MEM_REGION_SLOT);
wait_for_vcpu();
@@ -559,10 +553,7 @@ int main(int argc, char *argv[])
{
#ifdef __x86_64__
int i, loops;
- int j, disable_slot_zap_quirk = 0;
- if (kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) & KVM_X86_QUIRK_SLOT_ZAP_ALL)
- disable_slot_zap_quirk = 1;
/*
* FIXME: the zero-memslot test fails on aarch64 and s390x because
* KVM_RUN fails with ENOEXEC or EFAULT.
@@ -588,17 +579,13 @@ int main(int argc, char *argv[])
else
loops = 10;
- for (j = 0; j <= disable_slot_zap_quirk; j++) {
- pr_info("Testing MOVE of in-use region, %d loops, slot zap quirk %s\n",
- loops, j ? "disabled" : "enabled");
- for (i = 0; i < loops; i++)
- test_move_memory_region(!!j);
+ pr_info("Testing MOVE of in-use region, %d loops\n", loops);
+ for (i = 0; i < loops; i++)
+ test_move_memory_region();
- pr_info("Testing DELETE of in-use region, %d loops, slot zap quirk %s\n",
- loops, j ? "disabled" : "enabled");
- for (i = 0; i < loops; i++)
- test_delete_memory_region(!!j);
- }
+ pr_info("Testing DELETE of in-use region, %d loops\n", loops);
+ for (i = 0; i < loops; i++)
+ test_delete_memory_region();
#endif
return 0;
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4] Revert "KVM: x86/mmu: Introduce a quirk to control memslot zap behavior"
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
` (2 preceding siblings ...)
2024-09-27 0:16 ` [PATCH 3/4] Revert "KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled" Sean Christopherson
@ 2024-09-27 0:16 ` Sean Christopherson
2024-09-27 15:43 ` [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Paolo Bonzini
4 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 0:16 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
Remove KVM_X86_QUIRK_SLOT_ZAP_ALL, as the code is broken for shadow MMUs,
and the underlying premise is dodgy.
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped. All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes. Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.
BUG: kernel NULL pointer dereference, address: 00000000000000b0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 6085f43067 P4D 608c080067 PUD 608c081067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 79 UID: 0 PID: 187063 Comm: set_memory_regi Tainted: G W 6.11.0-smp--24867312d167-cpl #395
Tainted: [W]=WARN
Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
RIP: 0010:__kvm_mmu_prepare_zap_page+0x3a9/0x7b0 [kvm]
Code: <48> 8b 8e b0 00 00 00 48 8b 96 e0 00 00 00 48 c1 e9 09 48 29 c8 8b
RSP: 0018:ff314a25b19f7c28 EFLAGS: 00010212
Call Trace:
<TASK>
kvm_arch_flush_shadow_all+0x7a/0xf0 [kvm]
kvm_mmu_notifier_release+0x6c/0xb0 [kvm]
mmu_notifier_unregister+0x85/0x140
kvm_put_kvm+0x263/0x410 [kvm]
kvm_vm_release+0x21/0x30 [kvm]
__fput+0x8d/0x2c0
__se_sys_close+0x71/0xc0
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Rather than trying to get things functional for shadow MMUs (which
includes nested TDP), scrap the quirk idea, at least for now. In addition
to the function bug, it's not clear that unconditionally doing a targeted
zap for all non-default VM types is actually desirable. E.g. it's entirely
possible that SEV-ES and SNP VMs would exhibit worse performance than KVM's
current "zap all" behavior, or that it's better to do a targeted zap only
in specific situations, etc.
This reverts commit aa8d1f48d353b0469bff357183ee9df137d15ef0.
Cc: Kai Huang <kai.huang@intel.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 8 --------
arch/x86/include/asm/kvm_host.h | 3 +--
arch/x86/include/uapi/asm/kvm.h | 1 -
arch/x86/kvm/mmu/mmu.c | 34 +--------------------------------
4 files changed, 2 insertions(+), 44 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e32471977d0a..a4b7dc4a9dda 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8097,14 +8097,6 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
guest CPUID on writes to MISC_ENABLE if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
-
-KVM_X86_QUIRK_SLOT_ZAP_ALL By default, KVM invalidates all SPTEs in
- fast way for memslot deletion when VM type
- is KVM_X86_DEFAULT_VM.
- When this quirk is disabled or when VM type
- is other than KVM_X86_DEFAULT_VM, KVM zaps
- only leaf SPTEs that are within the range of
- the memslot being deleted.
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d9f763a7bb9..4738f6f5a794 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2358,8 +2358,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
KVM_X86_QUIRK_OUT_7E_INC_RIP | \
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
- KVM_X86_QUIRK_SLOT_ZAP_ALL)
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)
/*
* KVM previously used a u32 field in kvm_run to indicate the hypercall was
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a8debbf2f702..bf57a824f722 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -439,7 +439,6 @@ struct kvm_sync_regs {
#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN (1 << 5)
#define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6)
-#define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7)
#define KVM_STATE_NESTED_FORMAT_VMX 0
#define KVM_STATE_NESTED_FORMAT_SVM 1
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e081f785fb23..0d94354bb2f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7049,42 +7049,10 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
kvm_mmu_zap_all(kvm);
}
-/*
- * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
- *
- * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
- * case scenario we'll have unused shadow pages lying around until they
- * are recycled due to age or when the VM is destroyed.
- */
-static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
- struct kvm_gfn_range range = {
- .slot = slot,
- .start = slot->base_gfn,
- .end = slot->base_gfn + slot->npages,
- .may_block = true,
- };
-
- write_lock(&kvm->mmu_lock);
- if (kvm_unmap_gfn_range(kvm, &range))
- kvm_flush_remote_tlbs_memslot(kvm, slot);
-
- write_unlock(&kvm->mmu_lock);
-}
-
-static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
-{
- return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
- kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
-}
-
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
- if (kvm_memslot_flush_zap_all(kvm))
- kvm_mmu_zap_all_fast(kvm);
- else
- kvm_mmu_zap_memslot_leafs(kvm, slot);
+ kvm_mmu_zap_all_fast(kvm);
}
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
` (3 preceding siblings ...)
2024-09-27 0:16 ` [PATCH 4/4] Revert "KVM: x86/mmu: Introduce a quirk to control memslot zap behavior" Sean Christopherson
@ 2024-09-27 15:43 ` Paolo Bonzini
2024-09-27 16:40 ` Sean Christopherson
4 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2024-09-27 15:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
On Fri, Sep 27, 2024 at 2:18 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Revert the entire KVM_X86_QUIRK_SLOT_ZAP_ALL series, as the code is buggy
> for shadow MMUs, and I'm not convinced a quirk is actually the right way
> forward. I'm not totally opposed to it (obviously, given that I suggested
> it at one point), but I would prefer to give ourselves ample time to sort
> out exactly how we want to move forward, i.e. not rush something in to
> unhose v6.12.
Yeah, the code is buggy but I think it's safe enough to use code like the
one you wrote back in 2019; untested patch follows:
------------------------------- 8< ------------------------
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 27 Sep 2024 06:25:35 -0400
Subject: [PATCH] KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped. All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes. Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.
Reintroduce from that commit the code that walks the whole memslot when
there are active shadow MMU pages.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e081f785fb23..6843535905fb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7049,14 +7049,42 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
kvm_mmu_zap_all(kvm);
}
-/*
- * Zapping leaf SPTEs with memslot range when a memslot is moved/deleted.
- *
- * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
- * case scenario we'll have unused shadow pages lying around until they
- * are recycled due to age or when the VM is destroyed.
- */
-static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot)
+static void kvm_mmu_zap_memslot_pages_and_flush(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ bool flush)
+{
+ LIST_HEAD(invalid_list);
+ unsigned long i;
+
+ if (list_empty(&kvm->arch.active_mmu_pages))
+ goto out_flush;
+
+ /*
+ * Since accounting information is stored in struct kvm_arch_memory_slot,
+ * deleting shadow pages (e.g. in unaccount_shadowed()) requires that all
+ * gfns with a shadow page have a corresponding memslot. Do so before
+ * the memslot goes away.
+ */
+ for (i = 0; i < slot->npages; i++) {
+ struct kvm_mmu_page *sp;
+ gfn_t gfn = slot->base_gfn + i;
+
+ for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn)
+ kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+
+ if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
+ kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
+ flush = false;
+ cond_resched_rwlock_write(&kvm->mmu_lock);
+ }
+ }
+
+out_flush:
+ kvm_mmu_remote_flush_or_zap(kvm, &invalid_list, flush);
+}
+
+static void kvm_mmu_zap_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
{
struct kvm_gfn_range range = {
.slot = slot,
@@ -7064,11 +7097,11 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
.end = slot->base_gfn + slot->npages,
.may_block = true,
};
+ bool flush;
write_lock(&kvm->mmu_lock);
- if (kvm_unmap_gfn_range(kvm, &range))
- kvm_flush_remote_tlbs_memslot(kvm, slot);
-
+ flush = kvm_unmap_gfn_range(kvm, &range);
+ kvm_mmu_zap_memslot_pages_and_flush(kvm, slot, flush);
write_unlock(&kvm->mmu_lock);
}
@@ -7084,7 +7117,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
if (kvm_memslot_flush_zap_all(kvm))
kvm_mmu_zap_all_fast(kvm);
else
- kvm_mmu_zap_memslot_leafs(kvm, slot);
+ kvm_mmu_zap_memslot(kvm, slot);
}
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
--------------------------------------------------
(Not too sure about using the sp_has_gptes() test, which is why I haven't
posted this yet).
With respect to the choice of API, the quirk is at least good for
testing; this was already proven, I guess.
Also I think it's safe to enable it for SEV/SEV-ES VM types: they
pretty much depend on NPT (see sev_hardware_setup), and with the
TDP MMU it should always be better to kill the PTEs for the memslot
(even if invalidating the whole MMU is cheap) to avoid having to
fault all the remainder of the memory back in. So I think the current
version of kvm_memslot_flush_zap_all() is better than using e.g.
kvm_arch_has_private_mem().
The only straggler is software-protected VMs, which I don't care
too much about; but if anything it's better to make them closer to
SNP and TDX VM types.
For now I think I'll send the existing kvm/next to Linus and we
can sort it out next week, as the weekend (and the closure of the
merge window) is impending...
Paolo
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk
2024-09-27 15:43 ` [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Paolo Bonzini
@ 2024-09-27 16:40 ` Sean Christopherson
0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2024-09-27 16:40 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Kai Huang, Rick Edgecombe, Yan Zhao
On Fri, Sep 27, 2024, Paolo Bonzini wrote:
> On Fri, Sep 27, 2024 at 2:18 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Revert the entire KVM_X86_QUIRK_SLOT_ZAP_ALL series, as the code is buggy
> > for shadow MMUs, and I'm not convinced a quirk is actually the right way
> > forward. I'm not totally opposed to it (obviously, given that I suggested
> > it at one point), but I would prefer to give ourselves ample time to sort
> > out exactly how we want to move forward, i.e. not rush something in to
> > unhose v6.12.
>
> Yeah, the code is buggy but I think it's safe enough to use code like the
> one you wrote back in 2019; untested patch follows:
...
> (Not too sure about using the sp_has_gptes() test, which is why I haven't
> posted this yet).
Heh, I was going to ask about that too. Luckily I read ahead :-)
To be 100% safe, I think the zap needs to purge everything, even invalid SPs.
I doubt it would ever cause problems to leave dangling invalid SPs, but I don't
love the idea of avoiding UAF purely by relying on KVM not consuming stale info.
The other thing that makes my head hurt is how SPs are tracked by direct SPs in
the shadow MMU, i.e. by the effect of direct_map() and the guest hugepage case
(it would be weird, but legal for the guest to create a hugepage that straddles
a memslot boundary) rounding the gfn for the level when creating SPs.
Hmm, but I suppose that's an argument against being paranoid for the !sp_has_gptes()
case, as KVM already creates SPs with a target gfn that isn't covered by a memslot.
Blech.
> With respect to the choice of API, the quirk is at least good for
> testing; this was already proven, I guess.
True. I do think the documentation should be updated to be less prescriptive,
i.e. to give KVM wiggle room. Disabling the quirk should only _allow_ KVM to
a targeted/partial zap, it shouldn't _force_ KVM to do so.
> Also I think it's safe to enable it for SEV/SEV-ES VM types: they
> pretty much depend on NPT (see sev_hardware_setup), and with the
> TDP MMU it should always be better to kill the PTEs for the memslot
> (even if invalidating the whole MMU is cheap) to avoid having to
> fault all the remainder of the memory back in. So I think the current
> version of kvm_memslot_flush_zap_all() is better than using e.g.
> kvm_arch_has_private_mem().
In practice, you're probably right. Realistically, the only memslot removal that
would be problematic is the deletion of a large memslot, at which point SEV+ VMs
are in for a world of hurt no matter what.
> The only straggler is software-protected VMs, which I don't care
> too much about; but if anything it's better to make them closer to
> SNP and TDX VM types.
>
> For now I think I'll send the existing kvm/next to Linus and we
> can sort it out next week, as the weekend (and the closure of the
> merge window) is impending...
Works for me. Thanks!
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-09-27 16:41 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-27 0:16 [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Sean Christopherson
2024-09-27 0:16 ` [PATCH 1/4] Revert "KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled" Sean Christopherson
2024-09-27 0:16 ` [PATCH 2/4] Revert "KVM: selftests: Allow slot modification stress test " Sean Christopherson
2024-09-27 0:16 ` [PATCH 3/4] Revert "KVM: selftests: Test slot move/delete with slot zap quirk enabled/disabled" Sean Christopherson
2024-09-27 0:16 ` [PATCH 4/4] Revert "KVM: x86/mmu: Introduce a quirk to control memslot zap behavior" Sean Christopherson
2024-09-27 15:43 ` [PATCH 0/4] KVM: x86: Revert SLOT_ZAP_ALL quirk Paolo Bonzini
2024-09-27 16:40 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).