From: Sean Christopherson <seanjc@google.com>
To: "Nikunj A. Dadhania" <nikunj@amd.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
	thomas.lendacky@amd.com,  bp@alien8.de,
	joao.m.martins@oracle.com, kai.huang@intel.com
Subject: Re: [PATCH v6 7/7] KVM: SVM: Add Page modification logging support
Date: Tue, 21 Apr 2026 08:08:28 -0700	[thread overview]
Message-ID: <aeeS7LDPJ6NCy3Uw@google.com> (raw)
In-Reply-To: <34cfe5e8-756a-435a-a73d-54bf69801161@amd.com>

On Mon, Apr 20, 2026, Nikunj A. Dadhania wrote:
> Sashiko reported a couple of issues [1]. Let me address them here:
> 
> On 4/7/2026 12:02 PM, Nikunj A Dadhania wrote:
> > @@ -1206,6 +1209,16 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> >  	if (vcpu->kvm->arch.bus_lock_detection_enabled)
> >  		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
> >  
> > +	if (pml) {
> > +		/*
> > +		 * Populate the page address and index here, PML is enabled
> > +		 * when dirty logging is enabled on the memslot through
> > +		 * svm_update_cpu_dirty_logging()
> > +		 */
> > +		control->pml_addr = (u64)__sme_set(page_to_phys(vcpu->arch.pml_page));
> > +		control->pml_index = PML_HEAD_INDEX;
> > +	}
> > +
> 
> > If the guest receives an INIT IPI and init_vmcb() is called to reset the
> > vCPU, does unconditionally setting pml_index to PML_HEAD_INDEX discard any
> > un-flushed dirty GPAs logged by the hardware?
> 
> There are two scenarios where init_vmcb() is called:
> 
> 1) During vCPU creation time, where we need to set pml_index to PML_HEAD_INDEX
> 2) During vCPU reset, when init_event=true

Nit, please just call this INIT, even though KVM calls the function "reset".  #1
is KVM's one and only true RESET flow.

>   Before vCPU reset:
>     vcpu_enter_guest()
>       └─> kvm_x86_call(vcpu_run) [VMRUN]
>       └─> [guest executes, PML accumulates dirty pages]
>       └─> VMEXIT
>       └─> svm_handle_exit() --> PML buffer flushed here
>       └─> return to vcpu_run()
> 
>   vCPU Reset:
>     vcpu_enter_guest()
>       ├─> kvm_check_request(KVM_REQ_EVENT)
>       ├─> kvm_apic_accept_events()
>       │     └─> kvm_vcpu_reset(..., true)
>       │           └─> init_vmcb(..., true)
>       │                 └─> control->pml_index = PML_HEAD_INDEX -- PML buffer was already flushed
>       └─> kvm_x86_call(): Next VMRUN
> 
> > Could this result in the hypervisor losing track of dirty memory during live
> > migration, leading to memory corruption on the destination host, since
> > svm_flush_pml_buffer() isn't called before resetting the index?
> 
> AFAIU, no. The PML buffer is always flushed opportunistically at every VM exit.
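
For readers following along, the invariant at play is easy to model: hardware logs dirty GPAs at a head index that counts down from PML_HEAD_INDEX, and the flush drains everything above the current index and then resets it. A toy userspace sketch (illustrative only; names loosely follow KVM's, this is not the real implementation):

```c
#include <assert.h>
#include <stdint.h>

#define PML_LOG_NR_ENTRIES 512
#define PML_HEAD_INDEX     (PML_LOG_NR_ENTRIES - 1)

struct pml_buf {
	uint64_t gpa[PML_LOG_NR_ENTRIES];
	int index;		/* counts down from PML_HEAD_INDEX */
};

/* "Hardware" side: log a dirty GPA, consuming one slot. */
static int pml_log(struct pml_buf *b, uint64_t gpa)
{
	if (b->index < 0)
		return -1;	/* buffer full => real HW takes a VM-exit */
	b->gpa[b->index--] = gpa;
	return 0;
}

/* "Hypervisor" side: drain the logged entries, then reset the index. */
static int pml_flush(struct pml_buf *b, uint64_t *out)
{
	int n = 0;

	for (int i = b->index + 1; i < PML_LOG_NR_ENTRIES; i++)
		out[n++] = b->gpa[i];
	b->index = PML_HEAD_INDEX;
	return n;
}
```

As long as every path back to userspace flushes first, resetting the index to PML_HEAD_INDEX afterward (e.g. on INIT) can't drop entries, because the buffer is empty at that point.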

Huh.  There's a pre-existing bug here.  Commit f7f39c50edb9 ("KVM: x86: Exit to
userspace if fastpath triggers one on instruction skip") added a path that skips
kvm_x86_ops.handle_exit(), and specifically can give userspace control without
going through vmx_flush_pml_buffer():

	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
		return 0;

	r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);

Given that SVM support for PML is (obviously) on its way, it's mildly tempting
to add a dedicated kvm_x86_ops hook to flush the buffer on a fastpath userspace
exit.  But, I dislike one-off kvm_x86_ops hooks, and that only works if there's
no other vendor action required.  E.g. very theoretically, a fastpath userspace
exit could also be coincident with bus_lock_detected.

Yikes!  And I think userspace could see a stale CR0 and/or CR3 on AMD.  Hmm, but
waiting until the full exit path to grab CR0 and CR3 is flawed on its own, e.g.
it's a bug waiting to happen if KVM consumes CR3 in the fastpath.

So over two patches, I think the fix for those issues is the below.  I'll test
and send a mini-series.  I don't think there's anything you need to do; I can
resolve the resulting conflict easily enough.
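The shape of the fix is to let each vendor's handle_exit() do its own bookkeeping (PML flush, CR sync) before honoring the fastpath early return, instead of common code returning to userspace on the vendors' behalf. A toy userspace model of that ordering (hypothetical names, not the actual KVM code):

```c
#include <assert.h>

#define EXIT_FASTPATH_NONE           0
#define EXIT_FASTPATH_EXIT_USERSPACE 1

static int pml_flushes;

/* Stand-in for vmx_flush_pml_buffer() / a future SVM equivalent. */
static void vendor_flush_pml(void)
{
	pml_flushes++;
}

/*
 * Vendor handle_exit: all vendor-side bookkeeping runs before the
 * fastpath check, so a fastpath userspace exit can no longer skip it.
 */
static int vendor_handle_exit(int exit_fastpath)
{
	vendor_flush_pml();

	if (exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE)
		return 0;	/* return to userspace */

	return 1;		/* normal exit handling continues */
}
```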

diff --git arch/x86/kvm/svm/svm.c arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..df0bd132edf7 100644
--- arch/x86/kvm/svm/svm.c
+++ arch/x86/kvm/svm/svm.c
@@ -3644,13 +3644,8 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
        struct vcpu_svm *svm = to_svm(vcpu);
        struct kvm_run *kvm_run = vcpu->run;
 
-       /* SEV-ES guests must use the CR write traps to track CR registers. */
-       if (!is_sev_es_guest(vcpu)) {
-               if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
-                       vcpu->arch.cr0 = svm->vmcb->save.cr0;
-               if (npt_enabled)
-                       vcpu->arch.cr3 = svm->vmcb->save.cr3;
-       }
+       if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+               return 0;
 
        if (is_guest_mode(vcpu)) {
                int vmexit;
@@ -4502,11 +4497,17 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
        if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
                x86_spec_ctrl_restore_host(svm->virt_spec_ctrl);
 
+       /* SEV-ES guests must use the CR write traps to track CR registers. */
        if (!is_sev_es_guest(vcpu)) {
                vcpu->arch.cr2 = svm->vmcb->save.cr2;
                vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
                vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
                vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+
+               if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
+                       vcpu->arch.cr0 = svm->vmcb->save.cr0;
+               if (npt_enabled)
+                       vcpu->arch.cr3 = svm->vmcb->save.cr3;
        }
        vcpu->arch.regs_dirty = 0;
 
diff --git arch/x86/kvm/vmx/vmx.c arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..4cb355ecfe46 100644
--- arch/x86/kvm/vmx/vmx.c
+++ arch/x86/kvm/vmx/vmx.c
@@ -6687,6 +6687,9 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
        if (enable_pml && !is_guest_mode(vcpu))
                vmx_flush_pml_buffer(vcpu);
 
+       if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+               return 0;
+
        /*
         * KVM should never reach this point with a pending nested VM-Enter.
         * More specifically, short-circuiting VM-Entry to emulate L2 due to
diff --git arch/x86/kvm/x86.c arch/x86/kvm/x86.c
index 0a1b63c63d1a..9ad7ec3bf0f1 100644
--- arch/x86/kvm/x86.c
+++ arch/x86/kvm/x86.c
@@ -11602,9 +11602,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
        if (vcpu->arch.apic_attention)
                kvm_lapic_sync_from_vapic(vcpu);
 
-       if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
-               return 0;
-
        r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);
        return r;
