From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC9DC38839F for ; Wed, 22 Apr 2026 13:20:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776864031; cv=none; b=a8k2xwY2vHt5nH/hL7n8yE+CvioO2dFt0V9zM9wh56Dc/mQy1iLXNaYmF6XOixGlJ7UQv6lm/7ba/q37dfACyVD8E1iTeSc7+iX/KDdOry6D0obSK3crbgdo0p1wdNR+3mkgh26wl0sx2J9bQNU/QQPwV1XjYsaygsKsMZ5FQ3o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776864031; c=relaxed/simple; bh=dwNvp5cQjxN93kZejq/HnytKRM27fh9VyrplqWAuk6Q=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ClXdPTwBvTpbKs+XtIxiiWwu85C6eYGv4b7T/B9Yi8KcRbcZhMYi+B2qD/1OhKQBBWzFLoSErTsmCyxLGnvdM+DHXB75f4QYM/oupgfaPZ9+AzTZDi3q8RiyXgzCZ0I+fgh0TesFOLJryo8dvx1X8IZkaIuShbDHdWMjEPh3hYo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=KcuGj6LJ; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="KcuGj6LJ" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b6097ca315bso8149706a12.3 for ; Wed, 22 Apr 2026 06:20:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20251104; t=1776864029; x=1777468829; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=crXZqqSbgnbrPuRYPg8OgLKGvN1wrTiGY8ceEWHehu8=; b=KcuGj6LJzY0l3WEJu4EcIQCfrY1xyItzY2lfoSfXqLfyPXiZufdC84bp8rFgWxXxXJ Wo8Z0peJG8MxZBAGv/z0bYa8nNkAhbwXjulQ16L64G7+yH5yclmD8W+FFsc0QSHo33ZY 4TDrr+7V71I0gZg3Y/SzWO848NxwgKEFIQQosHhasI6jvuw9ZsKunwGOzDWdPms3/ykm e7gcVd3N0Pdw/k7+2JVD/dBS06jG/9C2IBYRMHLqRDJ+z2KTjmKDzck+Nskz6MmbolKq b0p0VB0J3XtBKLO9iVqw/jINYDjEcfn3gvUewC7cWg9JrmVLvSLqtquy0gMDVajXUC0I VKnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776864029; x=1777468829; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=crXZqqSbgnbrPuRYPg8OgLKGvN1wrTiGY8ceEWHehu8=; b=dkB9hN/XCUr4tRWHVVvpxYBK8rZmB5NppAKvRlVpv0x0cC/PlCTHpAUsy8vdp/YX8l R9BYO6LRTY6bsRoFEEFWk+/ncBPpxZ9DNK6g6hI9QEm28nxM2y/Ib2/u1gA/GCysmbq2 O/CPft1YKQV8WqXe6XemWLqVwCsHDzPbp4TTub/qzROgxEv9vYeKYbGliJ5eqK91rnZT x9/F7ttpoZNkt2YOFfH8Z834kGPALsZ+7fZt6yruDdOFdDvjDL0geELirWoXMrQS8txT RSy/9wnuRuGTyo8jCs5hWe4FvGpW+fZniVToGERe2AtevTFDA+AARxWs6Xi1XDpDpn59 3tig== X-Forwarded-Encrypted: i=1; AFNElJ8rTOUJyM55p53nsrnfRFjkIhl8iwTU6iub9ACKiTOZoKndXM/LQHXjUq1Dv5RbOQNbRp8=@vger.kernel.org X-Gm-Message-State: AOJu0Yynx/C9io9bDBF2qYV25I0lR71+KFjfbBw+qOiIEiWk7lfSbR5x MgQFNYmJXUPfPr/ql+pIW9RuFZTKfGz62OhzEJqDj1Z+gF57r67gUs4UTPncww2NmNPn7swStIg 8a4u35g== X-Received: from pfud7.prod.google.com ([2002:a05:6a00:10c7:b0:82f:2efd:4158]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:4085:b0:827:3946:a23c with SMTP id d2e1a72fcca58-82f8c830793mr23688860b3a.10.1776864029054; Wed, 22 Apr 2026 06:20:29 -0700 (PDT) Date: Wed, 22 Apr 2026 06:20:27 -0700 In-Reply-To: <977e805339a29ab789650aa18cd320dd1e9e0c25.camel@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260407063245.2755579-1-nikunj@amd.com> <20260407063245.2755579-8-nikunj@amd.com> <34cfe5e8-756a-435a-a73d-54bf69801161@amd.com> <3e4641288d7791919abf1a5b02b80431285484e5.camel@intel.com> <977e805339a29ab789650aa18cd320dd1e9e0c25.camel@intel.com> Message-ID: Subject: Re: [PATCH v6 7/7] KVM: SVM: Add Page modification logging support From: Sean Christopherson To: Kai Huang Cc: "thomas.lendacky@amd.com" , "kvm@vger.kernel.org" , "pbonzini@redhat.com" , "joao.m.martins@oracle.com" , "nikunj@amd.com" , "bp@alien8.de" Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Wed, Apr 22, 2026, Kai Huang wrote: > On Tue, 2026-04-21 at 17:30 -0700, Sean Christopherson wrote: > > On Tue, Apr 21, 2026, Kai Huang wrote: > > > On Tue, 2026-04-21 at 08:08 -0700, Sean Christopherson wrote: > > > > > =C2=A0=C2=A0 vCPU Reset: > > > > > =C2=A0=C2=A0=C2=A0=C2=A0 vcpu_enter_guest() > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=9C=E2=94=80> kvm_chec= k_request(KVM_REQ_EVENT) > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=9C=E2=94=80> kvm_apic= _accept_events() > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=82=C2=A0=C2=A0=C2=A0= =C2=A0 =E2=94=94=E2=94=80> kvm_vcpu_reset(..., true) > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=82=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=94=E2=94=80> init_vmcb(..= ., true) > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=82=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 =E2=94=94=E2=94=80> control->pml_index =3D PML_HEAD_INDEX -- PML buffer= was already flushed > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =E2=94=94=E2=94=80> kvm_x86_= call(): Next VMRUN > > > > >=20 > > > > > > Could this result in the hypervisor losing track of dirty memor= y during live > > > > > > migration, leading to memory corruption on the destination host= , since > > > > > > svm_flush_pml_buffer() isn't called before resetting the index? > > > > >=20 > > > > > AFAIU, no. The PML buffer is always flushed opportunistically at = every VM exit. > > > >=20 > > > > Huh.=C2=A0 There's a pre-existing bug here.=C2=A0 Commit f7f39c50ed= b9 ("KVM: x86: Exit to > > > > userspace if fastpath triggers one on instruction skip") added a pa= th that skips > > > > kvm_x86_ops.handle_exit(), and specifically can give userspace cont= rol without > > > > going through vmx_flush_pml_buffer(): > > > >=20 > > > > if (unlikely(exit_fastpath =3D=3D EXIT_FASTPATH_EXIT_USERSPACE)) > > > > return 0; > > > >=20 > > > > r =3D kvm_x86_call(handle_exit)(vcpu, exit_fastpath); > > > >=20 > > > > Given that SVM support for PML is (obviously) on its way, it's mild= ly tempting > > > > to add a dedicated kvm_x86_ops hook to flush the buffer on a fastpa= th userspace > > > > exit.=C2=A0 But, I dislike one-off kvm_x86_ops hooks, and that only= works if there's > > > > no other vendor action required.=C2=A0 E.g. very theoretically, a f= astpath userspace > > > > exit could also be coincident with bus_lock_detected. > > >=20 > > > Seems vmx_vcpu_reset() doesn't reset PML index upon INIT event, which= seems > > > to be fine since we are not losing any dirty GPA tracking AFAICT (oth= erwise > > > we already have a bug for VMX here)? > > >=20 > > > How about doing the same for SVM? > >=20 > > We don't really have that luxury. On SHUTDOWN (even intercepted SHUTDO= WN), the > > state of the VMCB is technically undefined. I.e. KVM needs to write _s= omething_. >=20 > You mean KVM needs to reset VMCB to reflect the architecturally defined I= NIT > state for a vCPU? Or the hardware itself may reset VMCB thus may reset P= ML > index? Neither. The APM states that the VMCB is undefined after SHUTDOWN. PML i= ndex could be anything: 15.14.3 Shutdown Intercept When this intercept occurs, any condition that normally causes a shutdown= causes a #VMEXIT to the VMM instead. After an intercepted shutdown, the state saved in the VMCB i= s undefined. KVM synthesizes an INIT because it's the least awful option. static int shutdown_interception(struct kvm_vcpu *vcpu) { struct kvm_run *kvm_run =3D vcpu->run; struct vcpu_svm *svm =3D to_svm(vcpu); /* * VMCB is undefined after a SHUTDOWN intercept. INIT the vCPU to put * the VMCB in a known good state. Unfortuately, KVM doesn't have * KVM_MP_STATE_SHUTDOWN and can't add it without potentially breaking * userspace. At a platform view, INIT is acceptable behavior as * there exist bare metal platforms that automatically INIT the CPU * in response to shutdown. * * The VM save area for SEV-ES guests has already been encrypted so it * cannot be reinitialized, i.e. synthesizing INIT is futile. */ if (!is_sev_es_guest(vcpu)) { clear_page(svm->vmcb); #ifdef CONFIG_KVM_SMM if (is_smm(vcpu)) kvm_smm_changed(vcpu, false); #endif kvm_vcpu_reset(vcpu, true); } kvm_run->exit_reason =3D KVM_EXIT_SHUTDOWN; return 0; } Now, maybe the APM is trying to say only the save area is undefined, in whi= ch case PML Index is fine and can and should be left alone. But if that's the= case, the APM needs to be updated to make explicitly clear what fields in the VMC= S are and are not valid after SHUTDOWN. > > Hmm, actually, how is that going to work? Dropping PML entries just be= cause a > > vCPU hit SHUTDOWN isn't going to fly. =C2=A0 > >=20 >=20 > Not dropping, but just leave PML index unchanged. The PML buffer itself = is > still there unchanged, and PML is still working in hardware thus the buff= er > will eventually get flushed. As above, the PML index could be garbage. > This is the case for VMX AFAICT, thus I wonder whether this also works fo= r > SVM. No, VMX is fine. > > E.g. if a VM crashes while it's being > > migrated by the host, then it could end up with corrupted, incoherent d= ata on > > the target due to leaving dirtied pages behind. >=20 > If we leave PML index unchanged upon INIT as mentioned above, I don't thi= nk > we will lose any dirty GPA tracking. But maybe I am missing something. Correct. The question is whether or not AMD's architecture guarantees PML = Index will be valid/accurate after SHUTDOWN.