Date: Tue, 21 Apr 2026 08:08:28 -0700
From: Sean Christopherson
To: "Nikunj A. Dadhania"
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, thomas.lendacky@amd.com,
	bp@alien8.de, joao.m.martins@oracle.com, kai.huang@intel.com
Subject: Re: [PATCH v6 7/7] KVM: SVM: Add Page modification logging support
In-Reply-To: <34cfe5e8-756a-435a-a73d-54bf69801161@amd.com>
References: <20260407063245.2755579-1-nikunj@amd.com>
	<20260407063245.2755579-8-nikunj@amd.com>
	<34cfe5e8-756a-435a-a73d-54bf69801161@amd.com>

On Mon, Apr 20, 2026, Nikunj A. Dadhania wrote:
> Sashiko reported a couple of issues [1]. Let me address them here:
>
> On 4/7/2026 12:02 PM, Nikunj A Dadhania wrote:
> > @@ -1206,6 +1209,16 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
> >  	if (vcpu->kvm->arch.bus_lock_detection_enabled)
> >  		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
> >
> > +	if (pml) {
> > +		/*
> > +		 * Populate the page address and index here; PML is enabled
> > +		 * when dirty logging is enabled on the memslot through
> > +		 * svm_update_cpu_dirty_logging().
> > +		 */
> > +		control->pml_addr = (u64)__sme_set(page_to_phys(vcpu->arch.pml_page));
> > +		control->pml_index = PML_HEAD_INDEX;
> > +	}
> > +
>
> > If the guest receives an INIT IPI and init_vmcb() is called to reset the
> > vCPU, does unconditionally setting pml_index to PML_HEAD_INDEX discard any
> > un-flushed dirty GPAs logged by the hardware?
>
> There are two scenarios where init_vmcb() is called:
>
> 1) During vCPU creation time, where we need to set pml_index to PML_HEAD_INDEX
> 2) During vCPU reset, when init_event=true

Nit, please just call this INIT, even though KVM calls the function "reset".
#1 is KVM's one and only true RESET flow.

> Before vCPU reset:
> vcpu_enter_guest()
> └─> kvm_x86_call(vcpu_run) [VMRUN]
>     └─> [guest executes, PML accumulates dirty pages]
>     └─> VMEXIT
> └─> svm_handle_exit() --> PML buffer flushed here
> └─> return to vcpu_run()
>
> vCPU Reset:
> vcpu_enter_guest()
> ├─> kvm_check_request(KVM_REQ_EVENT)
> ├─> kvm_apic_accept_events()
> │   └─> kvm_vcpu_reset(..., true)
> │       └─> init_vmcb(..., true)
> │           └─> control->pml_index = PML_HEAD_INDEX -- PML buffer was already flushed
> └─> kvm_x86_call(): Next VMRUN
>
> > Could this result in the hypervisor losing track of dirty memory during live
> > migration, leading to memory corruption on the destination host, since
> > svm_flush_pml_buffer() isn't called before resetting the index?
>
> AFAIU, no. The PML buffer is always flushed opportunistically at every VM-Exit.

Huh. There's a pre-existing bug here.
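To make the hole concrete, recall what a PML flush does: it drains every GPA
the CPU logged into the dirty bitmap and then rewinds the index, so any path
that reaches userspace without draining loses dirty GPAs. A minimal sketch
of the drain, modeled on vmx_flush_pml_buffer() but simplified, e.g. this
ignores the just-overflowed case where the hardware index has wrapped:

	/*
	 * Simplified sketch, not the real helper.  Hardware writes GPAs
	 * from the tail (PML_HEAD_INDEX) downward, and the index points
	 * at the next free slot, so everything above the current index
	 * is a logged, not-yet-harvested GPA.
	 */
	static void pml_drain_sketch(struct kvm_vcpu *vcpu, u64 *pml_buf,
				     u16 pml_idx)
	{
		/* Nothing has been logged since the last flush/reset. */
		if (pml_idx == PML_HEAD_INDEX)
			return;

		for (pml_idx++; pml_idx < PML_LOG_NR_ENTRIES; pml_idx++)
			kvm_vcpu_mark_page_dirty(vcpu, pml_buf[pml_idx] >> PAGE_SHIFT);

		/* The caller then resets the index to PML_HEAD_INDEX. */
	}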
Commit f7f39c50edb9 ("KVM: x86: Exit to userspace if fastpath triggers one
on instruction skip") added a path that skips kvm_x86_ops.handle_exit(), and
specifically can give userspace control without going through
vmx_flush_pml_buffer():

	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
		return 0;

	r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);

Given that SVM support for PML is (obviously) on its way, it's mildly
tempting to add a dedicated kvm_x86_ops hook to flush the buffer on a
fastpath userspace exit. But I dislike one-off kvm_x86_ops hooks, and that
only works if there's no other vendor action required. E.g. very
theoretically, a fastpath userspace exit could also be coincident with
bus_lock_detected.

Yikes! And I think userspace could see a stale CR0 and/or CR3 on AMD. Hmm,
but waiting until the full exit path to grab CR0 and CR3 is flawed on its
own, e.g. it's a bug waiting to happen if KVM consumes CR3 in the fastpath.

So over two patches, I think the fix for those issues is the below. I'll
test and send a mini-series. I don't think there's anything you need to do;
I can resolve the resulting conflict easily enough.

diff --git arch/x86/kvm/svm/svm.c arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..df0bd132edf7 100644
--- arch/x86/kvm/svm/svm.c
+++ arch/x86/kvm/svm/svm.c
@@ -3644,13 +3644,8 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct kvm_run *kvm_run = vcpu->run;
 
-	/* SEV-ES guests must use the CR write traps to track CR registers. */
-	if (!is_sev_es_guest(vcpu)) {
-		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
-			vcpu->arch.cr0 = svm->vmcb->save.cr0;
-		if (npt_enabled)
-			vcpu->arch.cr3 = svm->vmcb->save.cr3;
-	}
+	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+		return 0;
 
 	if (is_guest_mode(vcpu)) {
 		int vmexit;
@@ -4502,11 +4497,17 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	if (!static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
 		x86_spec_ctrl_restore_host(svm->virt_spec_ctrl);
 
+	/* SEV-ES guests must use the CR write traps to track CR registers. */
 	if (!is_sev_es_guest(vcpu)) {
 		vcpu->arch.cr2 = svm->vmcb->save.cr2;
 		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
 		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
 		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+
+		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
+			vcpu->arch.cr0 = svm->vmcb->save.cr0;
+		if (npt_enabled)
+			vcpu->arch.cr3 = svm->vmcb->save.cr3;
 	}
 	vcpu->arch.regs_dirty = 0;
 
diff --git arch/x86/kvm/vmx/vmx.c arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..4cb355ecfe46 100644
--- arch/x86/kvm/vmx/vmx.c
+++ arch/x86/kvm/vmx/vmx.c
@@ -6687,6 +6687,9 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (enable_pml && !is_guest_mode(vcpu))
 		vmx_flush_pml_buffer(vcpu);
 
+	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
+		return 0;
+
 	/*
 	 * KVM should never reach this point with a pending nested VM-Enter.
 	 * More specifically, short-circuiting VM-Entry to emulate L2 due to
diff --git arch/x86/kvm/x86.c arch/x86/kvm/x86.c
index 0a1b63c63d1a..9ad7ec3bf0f1 100644
--- arch/x86/kvm/x86.c
+++ arch/x86/kvm/x86.c
@@ -11602,9 +11602,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.apic_attention)
 		kvm_lapic_sync_from_vapic(vcpu);
 
-	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE))
-		return 0;
-
 	r = kvm_x86_call(handle_exit)(vcpu, exit_fastpath);
 	return r;
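FWIW, the one-off hook I talked myself out of above would have looked
something like the below in vcpu_enter_guest(), plus a matching optional
kvm_x86_ops member; flush_pml_on_user_exit() is an invented name for
illustration, not an actual hook:

	if (unlikely(exit_fastpath == EXIT_FASTPATH_EXIT_USERSPACE)) {
		if (kvm_x86_ops.flush_pml_on_user_exit)
			kvm_x86_call(flush_pml_on_user_exit)(vcpu);
		return 0;
	}

Bouncing the fastpath userspace exit through .handle_exit() instead keeps
all of the vendor-side bookkeeping (the PML flush, plus whatever comes along
later, e.g. bus_lock_detected) on a single path.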