From: Sean Christopherson <seanjc@google.com>
To: "Nikunj A. Dadhania" <nikunj@amd.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
thomas.lendacky@amd.com, bp@alien8.de,
joao.m.martins@oracle.com, kai.huang@intel.com
Subject: Re: [PATCH v6 7/7] KVM: SVM: Add Page modification logging support
Date: Tue, 21 Apr 2026 08:08:28 -0700 [thread overview]
Message-ID: <aeeS7LDPJ6NCy3Uw@google.com> (raw)
In-Reply-To: <34cfe5e8-756a-435a-a73d-54bf69801161@amd.com>
On Mon, Apr 20, 2026, Nikunj A. Dadhania wrote:
> Sashiko reported a couple of issues [1]. Let me address them here:
>
> On 4/7/2026 12:02 PM, Nikunj A Dadhania wrote:
> > @@ -1206,6 +1209,16 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> > if (vcpu->kvm->arch.bus_lock_detection_enabled)
> > svm_set_intercept(svm, INTERCEPT_BUSLOCK);
> >
> > + if (pml) {
> > + /*
> > + * Populate the page address and index here, PML is enabled
> > + * when dirty logging is enabled on the memslot through
> > + * svm_update_cpu_dirty_logging()
> > + */
> > + control->pml_addr = (u64)__sme_set(page_to_phys(vcpu->arch.pml_page));
> > + control->pml_index = PML_HEAD_INDEX;
> > + }
> > +
>
> > If the guest receives an INIT IPI and init_vmcb() is called to reset the
> > vCPU, does unconditionally setting pml_index to PML_HEAD_INDEX discard any
> > un-flushed dirty GPAs logged by the hardware?
>
> There are two scenarios where init_vmcb() is called:
>
> 1) During vCPU creation time, where we need to set pml_index to PML_HEAD_INDEX
> 2) During vCPU reset, when init_event=true
Nit, please just call this INIT, even though KVM calls the function "reset". #1
is KVM's one and only true RESET flow.
> Before vCPU reset:
> vcpu_enter_guest()
> └─> kvm_x86_call(vcpu_run) [VMRUN]
> └─> [guest executes, PML accumulates dirty pages]
> └─> VMEXIT
> └─> svm_handle_exit() --> PML buffer flushed here
> └─> return to vcpu_run()
>
> vCPU Reset:
> vcpu_enter_guest()
> ├─> kvm_check_request(KVM_REQ_EVENT)
> ├─> kvm_apic_accept_events()
> │ └─> kvm_vcpu_reset(..., true)
> │ └─> init_vmcb(..., true)
> │ └─> control->pml_index = PML_HEAD_INDEX -- PML buffer was already flushed
> └─> kvm_x86_call(): Next VMRUN
>
> > Could this result in the hypervisor losing track of dirty memory during live
> > migration, leading to memory corruption on the destination host, since
> > svm_flush_pml_buffer() isn't called before resetting the index?
>
> AFAIU, no. The PML buffer is always flushed opportunistically at every VM exit.
Huh. There's a pre-existing bug here. Commit f7f39c50edb9 ("KVM: x86: Exit to
userspace if fastpath triggers one on instruction skip") added a path that skips
kvm_x86_ops.handle_exit(), and specifically can give userspace control without
going through vmx_flush_pml_buffer():
if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
return 0;
r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);
Given that SVM support for PML is (obviously) on its way, it's mildly tempting
to add a dedicated kvm_x86_ops hook to flush the buffer on a fastpath userspace
exit. But, I dislike one-off kvm_x86_ops hooks, and that only works if there's
no other vendor action required. E.g. very theoretically, a fastpath userspace
exit could also be coincident with bus_lock_detected.
Yikes! And I think userspace could see a stale CR0 and/or CR3 on AMD. Hmm, but
waiting until the full exit path to grab CR0 and CR3 is flawed on its own, e.g.
it's a bug waiting to happen if KVM consumes CR3 in the fastpath.
So over two patches, I think the fix for those issues is the below. I'll test
and send a mini-series. I don't think there's anything you need to do; I can
resolve the resulting conflict easily enough.
diff --git arch/x86/kvm/svm/svm.c arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..df0bd132edf7 100644
--- arch/x86/kvm/svm/svm.c
+++ arch/x86/kvm/svm/svm.c
@@ -3644,13 +3644,8 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_run *kvm_run = vcpu->run;
- /* SEV-ES guests must use the CR write traps to track CR registers. */
- if (!is_sev_es_guest(vcpu)) {
- if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
- vcpu->arch.cr0 = svm->vmcb->save.cr0;
- if (npt_enabled)
- vcpu->arch.cr3 = svm->vmcb->save.cr3;
- }
+ if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+ return 0;
if (is_guest_mode(vcpu)) {
int vmexit;
@@ -4502,11 +4497,17 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
x86_spec_ctrl_restore_host(svm->virt_spec_ctrl);
+ /* SEV-ES guests must use the CR write traps to track CR registers. */
if (!is_sev_es_guest(vcpu)) {
vcpu->arch.cr2 = svm->vmcb->save.cr2;
vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+
+ if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
+ vcpu->arch.cr0 = svm->vmcb->save.cr0;
+ if (npt_enabled)
+ vcpu->arch.cr3 = svm->vmcb->save.cr3;
}
vcpu->arch.regs_dirty = 0;
diff --git arch/x86/kvm/vmx/vmx.c arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..4cb355ecfe46 100644
--- arch/x86/kvm/vmx/vmx.c
+++ arch/x86/kvm/vmx/vmx.c
@@ -6687,6 +6687,9 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
if (enable_pml && !is_guest_mode(vcpu))
vmx_flush_pml_buffer(vcpu);
+ if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+ return 0;
+
/*
* KVM should never reach this point with a pending nested VM-Enter.
* More specifically, short-circuiting VM-Entry to emulate L2 due to
diff --git arch/x86/kvm/x86.c arch/x86/kvm/x86.c
index 0a1b63c63d1a..9ad7ec3bf0f1 100644
--- arch/x86/kvm/x86.c
+++ arch/x86/kvm/x86.c
@@ -11602,9 +11602,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (vcpu->arch.apic_attention)
kvm_lapic_sync_from_vapic(vcpu);
- if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
- return 0;
-
r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);
return r;
Thread overview: 10+ messages
2026-04-07 6:32 [PATCH v6 0/7] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 1/7] KVM: x86: Carve out PML flush routine Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 2/7] KVM: x86: Move PML page to common vcpu arch structure Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 3/7] KVM: VMX: Use cpu_dirty_log_size instead of enable_pml for PML checks Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 4/7] x86/cpufeatures: Add Page modification logging Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 5/7] KVM: SVM: Use BIT_ULL for 64-bit nested_ctl bit definitions Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 6/7] KVM: nSVM: Add helpers to temporarily switch to vmcb01 Nikunj A Dadhania
2026-04-07 6:32 ` [PATCH v6 7/7] KVM: SVM: Add Page modification logging support Nikunj A Dadhania
2026-04-20 6:38 ` Nikunj A. Dadhania
2026-04-21 15:08 ` Sean Christopherson [this message]