* [PATCH v3 0/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
From: Yan Zhao
Date: 2025-08-22  8:01 UTC
To: pbonzini, seanjc; +Cc: peterx, rick.p.edgecombe, linux-kernel, kvm, Yan Zhao

Hi,

This series addresses a bug where userspace can request KVM to reset dirty
GFNs in memslots that do not have dirty tracking enabled.

Patch 1 provides the fix.

Patch 2 is an optimization to avoid unnecessarily invoking handlers in
kvm_handle_hva_range() when a GFN range is entirely private. Patch 2 is not
directly related to the issue in this series, but was found while
implementing the selftest in patch 3. It also enhances the reliability of
the selftest results in patch 3 by ruling out zap-related changes to
private mappings of the test slot.

Patch 3 converts the TDX-specific selftest in v2 to test
KVM_X86_SW_PROTECTED_VM VMs. Unlike the TDX cases, which would trigger
KVM_BUG_ON() when GFNs are incorrectly reset in memslots not enabling
dirty tracking, there are no obvious errors for KVM_X86_SW_PROTECTED_VM
VMs. So, patch 3 instead detects the trace event
kvm_tdp_mmu_spte_changed. TDX cases will be provided once the TDX selftest
framework is finalized.

v3:
- Rebased patch 1.
- Added patch 2.
- Converted patch 3 to test KVM_X86_SW_PROTECTED_VM VMs.
- code base: kvm-x86-next-2025.08.21

v2: https://lore.kernel.org/all/20241223070427.29583-1-yan.y.zhao@intel.com
- Added a comment in patch 1, explaining that it's possible to try to
  update a memslot that isn't being dirty-logged if userspace is
  misbehaving. Specifically, userspace can write arbitrary data into the
  ring. (Sean)

v1: https://lore.kernel.org/all/20241220082027.15851-1-yan.y.zhao@intel.com

Yan Zhao (3):
  KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
  KVM: Skip invoking shared memory handler for entirely private GFN ranges
  KVM: selftests: Test resetting dirty ring in gmem slots in protected VMs

 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/x86/reset_dirty_ring_on_gmem_test.c   | 392 ++++++++++++++++++
 virt/kvm/dirty_ring.c                         |   8 +-
 virt/kvm/kvm_main.c                           |  11 +
 4 files changed, 411 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c

--
2.43.2
* [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
From: Yan Zhao
Date: 2025-08-22  8:02 UTC
To: pbonzini, seanjc; +Cc: peterx, rick.p.edgecombe, linux-kernel, kvm, Yan Zhao

Do not allow resetting dirty GFNs in memslots that do not enable dirty
tracking.

vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
dirtied entries in the dirty rings, userspace is responsible for
harvesting/resetting these entries and calling the ioctl
KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the dirty
rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear the
SPTEs' dirty bits or perform write protection of the GFNs.

Although KVM does not set dirty entries for GFNs in a memslot that does not
enable dirty tracking, userspace can write arbitrary data into the dirty
ring. This makes it possible for misbehaving userspace to specify that it
has harvested a GFN from such a memslot. When this happens, KVM will be
asked to clear dirty bits or perform write protection for GFNs in a memslot
that does not enable dirty tracking, which is undesirable.

For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
between the mirror SPTE and the external SPTE in hardware (e.g., the mirror
SPTE has no write bit while the external SPTE is writable). When
kvm_dirty_log_manual_protect_and_init_set() is true and huge pages are
enabled in TDX, this could even lead to kvm_mmu_slot_gfn_write_protect()
being called and trigger KVM_BUG_ON() due to permission reduction changes
in the huge mirror SPTEs.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 virt/kvm/dirty_ring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 02bc6b00d76c..b38b4b7d7667 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 
 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 
-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	/*
+	 * Userspace can write arbitrary data into the dirty ring, making it
+	 * possible for misbehaving userspace to try to reset an out-of-memslot
+	 * GFN or a GFN in a memslot that isn't being dirty-logged.
+	 */
+	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
+	    !kvm_slot_dirty_track_enabled(memslot))
 		return;
 
 	KVM_MMU_LOCK(kvm);
--
2.43.2
* Re: [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
From: Sean Christopherson
Date: 2025-08-25 20:42 UTC
To: Yan Zhao; +Cc: pbonzini, peterx, rick.p.edgecombe, linux-kernel, kvm

On Fri, Aug 22, 2025, Yan Zhao wrote:
> Do not allow resetting dirty GFNs in memslots that do not enable dirty
> tracking.
>
> vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
> dirtied entries in the dirty rings, userspace is responsible for
> harvesting/resetting these entries and calling the ioctl
> KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the
> dirty rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear
> the SPTEs' dirty bits or perform write protection of the GFNs.
>
> Although KVM does not set dirty entries for GFNs in a memslot that does
> not enable dirty tracking, userspace can write arbitrary data into the
> dirty ring. This makes it possible for misbehaving userspace to specify
> that it has harvested a GFN from such a memslot. When this happens, KVM
> will be asked to clear dirty bits or perform write protection for GFNs in
> a memslot that does not enable dirty tracking, which is undesirable.
>
> For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
> between the mirror SPTE and the external SPTE in hardware (e.g., the
> mirror SPTE has no write bit while the external SPTE is writable). When
> kvm_dirty_log_manual_protect_and_init_set() is true and huge pages are
> enabled in TDX, this could even lead to kvm_mmu_slot_gfn_write_protect()
> being called and trigger KVM_BUG_ON() due to permission reduction changes
> in the huge mirror SPTEs.

Sounds like this needs a Fixes and Cc: stable?

> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  virt/kvm/dirty_ring.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 02bc6b00d76c..b38b4b7d7667 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
>  
>  	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
>  
> -	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> +	/*
> +	 * Userspace can write arbitrary data into the dirty ring, making it
> +	 * possible for misbehaving userspace to try to reset an out-of-memslot
> +	 * GFN or a GFN in a memslot that isn't being dirty-logged.
> +	 */
> +	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
> +	    !kvm_slot_dirty_track_enabled(memslot))

Maybe check for dirty tracking being enabled before checking the range?
Purely because checking if _any_ gfn can be recorded seems like something
that should be checked before a specific gfn can be recorded. I.e.

	if (!memslot || !kvm_slot_dirty_track_enabled(memslot) ||
	    (offset + __fls(mask)) >= memslot->npages)

>  		return;
>  
>  	KVM_MMU_LOCK(kvm);
> --
> 2.43.2
* Re: [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
From: Yan Zhao
Date: 2025-08-26  1:22 UTC
To: Sean Christopherson; +Cc: pbonzini, peterx, rick.p.edgecombe, linux-kernel, kvm

On Mon, Aug 25, 2025 at 01:42:43PM -0700, Sean Christopherson wrote:
> On Fri, Aug 22, 2025, Yan Zhao wrote:
> > Do not allow resetting dirty GFNs in memslots that do not enable dirty
> > tracking.
> >
> > vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
> > dirtied entries in the dirty rings, userspace is responsible for
> > harvesting/resetting these entries and calling the ioctl
> > KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the
> > dirty rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to
> > clear the SPTEs' dirty bits or perform write protection of the GFNs.
> >
> > Although KVM does not set dirty entries for GFNs in a memslot that does
> > not enable dirty tracking, userspace can write arbitrary data into the
> > dirty ring. This makes it possible for misbehaving userspace to specify
> > that it has harvested a GFN from such a memslot. When this happens, KVM
> > will be asked to clear dirty bits or perform write protection for GFNs
> > in a memslot that does not enable dirty tracking, which is undesirable.
> >
> > For TDX, this unexpected resetting of dirty GFNs could cause
> > inconsistency between the mirror SPTE and the external SPTE in hardware
> > (e.g., the mirror SPTE has no write bit while the external SPTE is
> > writable). When kvm_dirty_log_manual_protect_and_init_set() is true and
> > huge pages are enabled in TDX, this could even lead to
> > kvm_mmu_slot_gfn_write_protect() being called and trigger KVM_BUG_ON()
> > due to permission reduction changes in the huge mirror SPTEs.
>
> Sounds like this needs a Fixes and Cc: stable?
Ok. Will include them in the next version.

> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  virt/kvm/dirty_ring.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > index 02bc6b00d76c..b38b4b7d7667 100644
> > --- a/virt/kvm/dirty_ring.c
> > +++ b/virt/kvm/dirty_ring.c
> > @@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
> >  
> >  	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> >  
> > -	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> > +	/*
> > +	 * Userspace can write arbitrary data into the dirty ring, making it
> > +	 * possible for misbehaving userspace to try to reset an out-of-memslot
> > +	 * GFN or a GFN in a memslot that isn't being dirty-logged.
> > +	 */
> > +	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
> > +	    !kvm_slot_dirty_track_enabled(memslot))
>
> Maybe check for dirty tracking being enabled before checking the range?
> Purely because checking if _any_ gfn can be recorded seems like something
> that should be checked before a specific gfn can be recorded. I.e.
>
> 	if (!memslot || !kvm_slot_dirty_track_enabled(memslot) ||
> 	    (offset + __fls(mask)) >= memslot->npages)

Makes sense. Thank you!

> >  		return;
> >  
> >  	KVM_MMU_LOCK(kvm);
> > --
> > 2.43.2
* [PATCH v3 2/3] KVM: Skip invoking shared memory handler for entirely private GFN ranges
From: Yan Zhao
Date: 2025-08-22  8:02 UTC
To: pbonzini, seanjc; +Cc: peterx, rick.p.edgecombe, linux-kernel, kvm, Yan Zhao

When a GFN range is entirely private, it's unnecessary for
kvm_handle_hva_range() to invoke handlers for the GFN range, because
1) the gfn_range.attr_filter for the handler is KVM_FILTER_SHARED, which
   is for shared mappings only;
2) KVM has already zapped all shared mappings before setting the memory
   attribute to private.

This can avoid unnecessary zaps on private mappings for VMs of type
KVM_X86_SW_PROTECTED_VM, e.g., during auto numa balancing scans of VMAs.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 virt/kvm/kvm_main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f769d1dccc21..e615ad405ce4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -620,6 +620,17 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.slot = slot;
 			gfn_range.lockless = range->lockless;
 
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+			/*
+			 * If GFN range are all private, no need to invoke the
+			 * handler.
+			 */
+			if (kvm_range_has_memory_attributes(kvm, gfn_range.start,
+							    gfn_range.end, ~0,
+							    KVM_MEMORY_ATTRIBUTE_PRIVATE))
+				continue;
+#endif
+
 			if (!r.found_memslot) {
 				r.found_memslot = true;
 				if (!range->lockless) {
--
2.43.2
* Re: [PATCH v3 2/3] KVM: Skip invoking shared memory handler for entirely private GFN ranges
From: Sean Christopherson
Date: 2025-08-25 21:05 UTC
To: Yan Zhao; +Cc: pbonzini, peterx, rick.p.edgecombe, linux-kernel, kvm

On Fri, Aug 22, 2025, Yan Zhao wrote:
> When a GFN range is entirely private, it's unnecessary for
> kvm_handle_hva_range() to invoke handlers for the GFN range, because
> 1) the gfn_range.attr_filter for the handler is KVM_FILTER_SHARED, which
>    is for shared mappings only;
> 2) KVM has already zapped all shared mappings before setting the memory
>    attribute to private.
>
> This can avoid unnecessary zaps on private mappings for VMs of type
> KVM_X86_SW_PROTECTED_VM, e.g., during auto numa balancing scans of VMAs.

This feels like the wrong place to try and optimize spurious zaps. x86
should be skipping SPTEs that don't match. For KVM_X86_SW_PROTECTED_VM, I
don't think we care about spurious zaps, because that's a testing-only type
that doesn't have line of sight to being a "real" type.

For SNP, we might care? But actually zapping private SPTEs would require
userspace to retain the shared mappings across a transition, _and_ be
running NUMA autobalancing in the first place. If someone actually cares
about optimizing this scenario, KVM x86 could track private SPTEs via a
software-available bit.

We also want to move away from KVM_MEMORY_ATTRIBUTE_PRIVATE and instead
track private vs. shared in the gmem instance.

So I'm inclined to skip this...

> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  virt/kvm/kvm_main.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f769d1dccc21..e615ad405ce4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -620,6 +620,17 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
>  			gfn_range.slot = slot;
>  			gfn_range.lockless = range->lockless;
>  
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +			/*
> +			 * If GFN range are all private, no need to invoke the
> +			 * handler.
> +			 */
> +			if (kvm_range_has_memory_attributes(kvm, gfn_range.start,
> +							    gfn_range.end, ~0,
> +							    KVM_MEMORY_ATTRIBUTE_PRIVATE))
> +				continue;
> +#endif
> +
>  			if (!r.found_memslot) {
>  				r.found_memslot = true;
>  				if (!range->lockless) {
> --
> 2.43.2
* Re: [PATCH v3 2/3] KVM: Skip invoking shared memory handler for entirely private GFN ranges
From: Yan Zhao
Date: 2025-08-26  6:51 UTC
To: Sean Christopherson; +Cc: pbonzini, peterx, rick.p.edgecombe, linux-kernel, kvm

On Mon, Aug 25, 2025 at 02:05:22PM -0700, Sean Christopherson wrote:
> On Fri, Aug 22, 2025, Yan Zhao wrote:
> > When a GFN range is entirely private, it's unnecessary for
> > kvm_handle_hva_range() to invoke handlers for the GFN range, because
> > 1) the gfn_range.attr_filter for the handler is KVM_FILTER_SHARED, which
> >    is for shared mappings only;
> > 2) KVM has already zapped all shared mappings before setting the memory
> >    attribute to private.
> >
> > This can avoid unnecessary zaps on private mappings for VMs of type
> > KVM_X86_SW_PROTECTED_VM, e.g., during auto numa balancing scans of VMAs.
>
> This feels like the wrong place to try and optimize spurious zaps. x86
> should be skipping SPTEs that don't match. For KVM_X86_SW_PROTECTED_VM, I
> don't think we care about spurious zaps, because that's a testing-only
> type that doesn't have line of sight to being a "real" type.
>
> For SNP, we might care? But actually zapping private SPTEs would require
> userspace to retain the shared mappings across a transition, _and_ be
> running NUMA autobalancing in the first place. If someone actually cares
> about optimizing
Hmm, "running NUMA autobalancing" + "madvise(MADV_DONTNEED)" can still
trigger the spurious zaps.

task_numa_work ==> found a VMA
  change_prot_numa
    change_protection
      change_pud_range ==> mmu_notifier_invalidate_range_start() if !pud_none()

Let me use munmap() in patch 3 to guard against spurious zaps then.

> this scenario, KVM x86 could track private SPTEs via a software-available
> bit.
>
> We also want to move away from KVM_MEMORY_ATTRIBUTE_PRIVATE and instead
> track private vs. shared in the gmem instance.
>
> So I'm inclined to skip this...
Fair enough. Thank you for the detailed explanation!
* [PATCH v3 3/3] KVM: selftests: Test resetting dirty ring in gmem slots in protected VMs
From: Yan Zhao
Date: 2025-08-22  8:03 UTC
To: pbonzini, seanjc; +Cc: peterx, rick.p.edgecombe, linux-kernel, kvm, Yan Zhao

Test resetting dirty ring in slots with the KVM_MEM_GUEST_MEMFD flag in
KVM_X86_SW_PROTECTED_VM VMs. Purposely reset dirty ring entries incorrectly
so that they point to a gmem slot.

Unlike in TDX VMs, where resetting the dirty ring in a gmem slot could
trigger KVM_BUG_ON(), there are no obvious errors for
KVM_X86_SW_PROTECTED_VM VMs. Therefore, detect SPTE changes by reading
trace messages with the kvm_tdp_mmu_spte_changed event enabled.
Consequently, the test is conducted only when tdp_mmu is enabled and
tracing is available.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/x86/reset_dirty_ring_on_gmem_test.c   | 392 ++++++++++++++++++
 2 files changed, 393 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f6fe7a07a0a2..ebd1d829c3f9 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -136,6 +136,7 @@ TEST_GEN_PROGS_x86 += x86/max_vcpuid_cap_test
 TEST_GEN_PROGS_x86 += x86/triple_fault_event_test
 TEST_GEN_PROGS_x86 += x86/recalc_apic_map_test
 TEST_GEN_PROGS_x86 += x86/aperfmperf_test
+TEST_GEN_PROGS_x86 += x86/reset_dirty_ring_on_gmem_test
 TEST_GEN_PROGS_x86 += access_tracking_perf_test
 TEST_GEN_PROGS_x86 += coalesced_io_test
 TEST_GEN_PROGS_x86 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c b/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c
new file mode 100644
index 000000000000..cf1746c0149f
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/reset_dirty_ring_on_gmem_test.c
@@ -0,0 +1,392 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test reset dirty ring on gmem slot on x86.
+ * Copyright (C) 2025, Intel, Inc.
+ *
+ * The slot flag KVM_MEM_GUEST_MEMFD is incompatible with the flag
+ * KVM_MEM_LOG_DIRTY_PAGES, which means KVM does not permit dirty page tracking
+ * on gmem slots.
+ *
+ * When dirty ring is enabled, although KVM does not mark GFNs in gmem slots as
+ * dirty, userspace can reset and write arbitrary data into the dirty ring
+ * entries shared between KVM and userspace. This can lead KVM to incorrectly
+ * clear write permission or dirty bits on SPTEs of gmem slots.
+ *
+ * While this might be harmless for non-TDX VMs, it could cause inconsistencies
+ * between the mirror SPTEs and the external SPTEs in hardware, or even trigger
+ * KVM_BUG_ON() for TDX.
+ *
+ * Purposely reset dirty ring incorrectly on gmem slots (gmem slots do not allow
+ * dirty page tracking) to verify misbehaving userspace cannot cause any SPTE
+ * permission reduction change.
+ *
+ * Steps conducted in this test:
+ * 1. echo nop > ${TRACING_ROOT}/current_tracer
+ *    echo 1 > ${TRACING_ROOT}/events/kvmmmu/kvm_tdp_mmu_spte_changed/enable
+ *    echo > ${TRACING_ROOT}/set_event_pid
+ *    echo > ${TRACING_ROOT}/set_event_notrace_pid
+ *
+ * 2. echo "common_pid == $tid && gfn == 0xc0400" > \
+ *    ${TRACING_ROOT}/events/kvmmmu/kvm_tdp_mmu_spte_changed/filter
+ *
+ * 3. echo 0 > ${TRACING_ROOT}/tracing_on
+ *    echo > ${TRACING_ROOT}/trace
+ *    echo 1 > ${TRACING_ROOT}/tracing_on
+ *
+ * 4. purposely reset dirty ring incorrectly
+ *
+ * 5. cat ${TRACING_ROOT}/trace
+ */
+#include <linux/kvm.h>
+#include <asm/barrier.h>
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+#define DEBUGFS		"/sys/kernel/debug/tracing"
+#define TRACEFS		"/sys/kernel/tracing"
+
+#define TEST_DIRTY_RING_GPA		(0xc0400000)
+#define TEST_DIRTY_RING_GVA		(0x90400000)
+#define TEST_DIRTY_RING_REGION_SLOT	11
+#define TEST_DIRTY_RING_REGION_SIZE	0x200000
+#define TEST_DIRTY_RING_COUNT		4096
+#define TEST_DIRTY_RING_GUEST_WRITE_MAX_CNT	3
+
+static const char *PATTEN = "spte_changed";
+static char *tracing_root;
+
+static int open_path(char *subpath, int flags)
+{
+	static char path[100];
+	int count, fd;
+
+	count = snprintf(path, sizeof(path), "%s/%s", tracing_root, subpath);
+	TEST_ASSERT(count > 0, "Incorrect path\n");
+	fd = open(path, flags);
+	TEST_ASSERT(fd >= 0, "Cannot open %s\n", path);
+
+	return fd;
+}
+
+static void setup_tracing(void)
+{
+	int fd;
+
+	/* set current_tracer to nop */
+	fd = open_path("current_tracer", O_WRONLY);
+	test_write(fd, "nop\n", 4);
+	close(fd);
+
+	/* turn on event kvm_tdp_mmu_spte_changed */
+	fd = open_path("events/kvmmmu/kvm_tdp_mmu_spte_changed/enable", O_WRONLY);
+	test_write(fd, "1\n", 2);
+	close(fd);
+
+	/* clear set_event_pid & set_event_notrace_pid */
+	fd = open_path("set_event_pid", O_WRONLY | O_TRUNC);
+	close(fd);
+
+	fd = open_path("set_event_notrace_pid", O_WRONLY | O_TRUNC);
+	close(fd);
+}
+
+static void filter_event(void)
+{
+	int count, fd;
+	char buf[100];
+
+	fd = open_path("events/kvmmmu/kvm_tdp_mmu_spte_changed/filter",
+		       O_WRONLY | O_TRUNC);
+
+	count = snprintf(buf, sizeof(buf), "common_pid == %d && gfn == 0x%x\n",
+			 gettid(), TEST_DIRTY_RING_GPA >> PAGE_SHIFT);
+	TEST_ASSERT(count > 0, "Incorrect number of data written\n");
+	test_write(fd, buf, count);
+	close(fd);
+}
+
+static void enable_tracing(bool enable)
+{
+	char *val = enable ? "1\n" : "0\n";
+	int fd;
+
+	if (enable) {
+		/* clear trace log before enabling */
+		fd = open_path("trace", O_WRONLY | O_TRUNC);
+		close(fd);
+	}
+
+	fd = open_path("tracing_on", O_WRONLY);
+	test_write(fd, val, 2);
+	close(fd);
+}
+
+static void reset_tracing(void)
+{
+	enable_tracing(false);
+	enable_tracing(true);
+}
+
+static void detect_spte_change(void)
+{
+	static char buf[1024];
+	FILE *file;
+	int count;
+
+	count = snprintf(buf, sizeof(buf), "%s/trace", tracing_root);
+	TEST_ASSERT(count > 0, "Incorrect path\n");
+	file = fopen(buf, "r");
+	TEST_ASSERT(file, "Cannot open %s\n", buf);
+
+	while (fgets(buf, sizeof(buf), file))
+		TEST_ASSERT(!strstr(buf, PATTEN), "Unexpected SPTE change %s\n", buf);
+
+	fclose(file);
+}
+
+/*
+ * Write to a gmem slot and exit to host after each write to allow host to check
+ * dirty ring.
+ */
+void guest_code(void)
+{
+	uint64_t count = 0;
+
+	while (count < TEST_DIRTY_RING_GUEST_WRITE_MAX_CNT) {
+		count++;
+		memset((void *)TEST_DIRTY_RING_GVA, 1, 8);
+		GUEST_SYNC(count);
+	}
+	GUEST_DONE();
+}
+
+/*
+ * Verify that KVM_MEM_LOG_DIRTY_PAGES cannot be set on a memslot with flag
+ * KVM_MEM_GUEST_MEMFD.
+ */
+static void verify_turn_on_log_dirty_pages_flag(struct kvm_vcpu *vcpu)
+{
+	struct userspace_mem_region *region;
+	int ret;
+
+	region = memslot2region(vcpu->vm, TEST_DIRTY_RING_REGION_SLOT);
+	region->region.flags |= KVM_MEM_LOG_DIRTY_PAGES;
+
+	ret = __vm_ioctl(vcpu->vm, KVM_SET_USER_MEMORY_REGION2, &region->region);
+
+	TEST_ASSERT(ret, "KVM_SET_USER_MEMORY_REGION2 incorrectly succeeds\n");
+	region->region.flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
+}
+
+static inline bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
+{
+	return smp_load_acquire(&gfn->flags) == KVM_DIRTY_GFN_F_DIRTY;
+}
+
+static inline void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
+{
+	smp_store_release(&gfn->flags, KVM_DIRTY_GFN_F_RESET);
+}
+
+static bool dirty_ring_empty(struct kvm_vcpu *vcpu)
+{
+	struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
+	struct kvm_dirty_gfn *cur;
+	int i;
+
+	for (i = 0; i < TEST_DIRTY_RING_COUNT; i++) {
+		cur = &dirty_gfns[i];
+
+		if (dirty_gfn_is_dirtied(cur))
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Purposely reset the dirty ring incorrectly by resetting a dirty ring entry
+ * even when KVM does not report the entry as dirty.
+ *
+ * In the kvm_dirty_gfn entry, specify the slot to the gmem slot that does not
+ * allow dirty page tracking and has no flag KVM_MEM_LOG_DIRTY_PAGES.
+ */
+static void reset_dirty_ring(struct kvm_vcpu *vcpu, int *reset_index)
+{
+	struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
+	struct kvm_dirty_gfn *cur = &dirty_gfns[*reset_index];
+	uint32_t cleared;
+
+	reset_tracing();
+
+	cur->slot = TEST_DIRTY_RING_REGION_SLOT;
+	cur->offset = 0;
+	dirty_gfn_set_collected(cur);
+	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
+	*reset_index += cleared;
+	TEST_ASSERT(cleared == 1, "Unexpected cleared count %d\n", cleared);
+
+	detect_spte_change();
+}
+
+/*
+ * The vCPU worker to loop vcpu_run(). After each vCPU access to a GFN, check if
+ * the dirty ring is empty and reset the dirty ring.
+ */
+static void reset_dirty_ring_worker(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+	struct ucall uc;
+	uint64_t cmd;
+	int index = 0;
+
+	filter_event();
+	while (1) {
+		vcpu_run(vcpu);
+
+		if (run->exit_reason == KVM_EXIT_IO) {
+			cmd = get_ucall(vcpu, &uc);
+			if (cmd != UCALL_SYNC)
+				break;
+
+			TEST_ASSERT(dirty_ring_empty(vcpu),
+				    "Guest write should not cause GFN dirty\n");
+
+			reset_dirty_ring(vcpu, &index);
+		}
+	}
+}
+
+static struct kvm_vm *create_vm(unsigned long vm_type, struct kvm_vcpu **vcpu,
+				bool private)
+{
+	unsigned int npages = TEST_DIRTY_RING_REGION_SIZE / getpagesize();
+	const struct vm_shape shape = {
+		.mode = VM_MODE_DEFAULT,
+		.type = vm_type,
+	};
+	struct kvm_vm *vm;
+
+	vm = __vm_create(shape, 1, 0);
+	vm_enable_dirty_ring(vm, TEST_DIRTY_RING_COUNT * sizeof(struct kvm_dirty_gfn));
+	*vcpu = vm_vcpu_add(vm, 0, guest_code);
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    TEST_DIRTY_RING_GPA,
+				    TEST_DIRTY_RING_REGION_SLOT,
+				    npages, KVM_MEM_GUEST_MEMFD);
+	vm->memslots[MEM_REGION_TEST_DATA] = TEST_DIRTY_RING_REGION_SLOT;
+	virt_map(vm, TEST_DIRTY_RING_GVA, TEST_DIRTY_RING_GPA, npages);
+	if (private)
+		vm_mem_set_private(vm, TEST_DIRTY_RING_GPA,
+				   TEST_DIRTY_RING_REGION_SIZE);
+	return vm;
+}
+
+struct test_config {
+	unsigned long vm_type;
+	bool manual_protect_and_init_set;
+	bool private_access;
+	char *test_desc;
+};
+
+void test_dirty_ring_on_gmem_slot(struct test_config *config)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	if (config->vm_type &&
+	    !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(config->vm_type))) {
+		ksft_test_result_skip("\n");
+		return;
+	}
+
+	vm = create_vm(config->vm_type, &vcpu, config->private_access);
+
+	/*
+	 * Let KVM detect that kvm_dirty_log_manual_protect_and_init_set() is
+	 * true in kvm_arch_mmu_enable_log_dirty_pt_masked() to check if
+	 * kvm_mmu_slot_gfn_write_protect() will be called on a gmem memslot.
+	 */
+	if (config->manual_protect_and_init_set) {
+		u64 manual_caps;
+
+		manual_caps = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+
+		manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
+				KVM_DIRTY_LOG_INITIALLY_SET);
+
+		if (!manual_caps)
+			return;
+
+		vm_enable_cap(vm, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, manual_caps);
+	}
+
+	verify_turn_on_log_dirty_pages_flag(vcpu);
+
+	reset_dirty_ring_worker(vcpu);
+
+	kvm_vm_free(vm);
+	ksft_test_result_pass("\n");
+}
+
+static bool dirty_ring_supported(void)
+{
+	return (kvm_has_cap(KVM_CAP_DIRTY_LOG_RING) ||
+		kvm_has_cap(KVM_CAP_DIRTY_LOG_RING_ACQ_REL));
+}
+
+static bool has_tracing(void)
+{
+	if (faccessat(AT_FDCWD, DEBUGFS, F_OK, AT_EACCESS) == 0) {
+		tracing_root = DEBUGFS;
+		return true;
+	}
+
+	if (faccessat(AT_FDCWD, TRACEFS, F_OK, AT_EACCESS) == 0) {
+		tracing_root = TRACEFS;
+		return true;
+	}
+
+	return false;
+}
+
+static struct test_config tests[] = {
+	{
+		.vm_type = KVM_X86_SW_PROTECTED_VM,
+		.manual_protect_and_init_set = false,
+		.private_access = true,
+		.test_desc = "SW_PROTECTED_VM, manual_protect_and_init_set=false, private access",
+	},
+	{
+		.vm_type = KVM_X86_SW_PROTECTED_VM,
+		.manual_protect_and_init_set = true,
+		.private_access = true,
+		.test_desc = "SW_PROTECTED_VM, manual_protect_and_init_set=true, private access",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int test_cnt = ARRAY_SIZE(tests);
+
+	ksft_print_header();
+	ksft_set_plan(test_cnt);
+
+	TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
+	TEST_REQUIRE(has_tracing());
+	TEST_REQUIRE(dirty_ring_supported());
+
+	setup_tracing();
+
+	for (int i = 0; i < test_cnt; i++) {
+		pthread_t vm_thread;
+
+		pthread_create(&vm_thread, NULL,
+			       (void *(*)(void *))test_dirty_ring_on_gmem_slot,
+			       &tests[i]);
+		pthread_join(vm_thread, NULL);
+	}
+
+	ksft_finished();
+	return 0;
+}
--
2.43.2