* [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT
@ 2025-11-07 20:11 Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 1/6] KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in svm_set_nested_state() Jim Mattson
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
There are several problems with KVM's virtualization of the G_PAT
field when nested paging is enabled in VMCB12.
* The VMCB12 G_PAT field is not checked for validity when emulating
VMRUN. (APM volume 2, section 15.25.4: Nested Paging and
VMRUN/#VMEXIT)
* RDMSR(PAT) and WRMSR(PAT) from L2 access L1's PAT MSR rather than
L2's Guest PAT register. (APM volume 2, section 15.25.2: Replicated
State)
* The L2 Guest PAT register is not written back to VMCB12 on #VMEXIT
from L2 to L1. (APM volume 3, Section 4: "VMRUN")
* The value of L2's Guest PAT register is not serialized for
save/restore when a checkpoint is taken while L2 is active.
Commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2
guest") left this comment in nested_vmcb02_compute_g_pat():
/* FIXME: merge g_pat from vmcb01 and vmcb12. */
This comment makes no sense. It is true that there are now three
different PATs to consider: L2's PAT for guest page tables, L1's PAT
for the nested page tables mapping L2 guest physical addresses to L1
guest physical addresses, and L0's PAT for the nested page tables
mapping L1 guest physical addresses to host physical
addresses. However, if there is any "merging" to be done, it would
involve the latter two, and would happen during shadow nested page
table construction. (For the record, I don't think "merging" the two
nested page table PATs is feasible.) In any case, the VMCB12 G_PAT
should be copied unmodified into VMCB02.
Maybe the rest of the current implementation is a consistent quirk
based on the existing nested_vmcb02_compute_g_pat() code that bypasses
L1's request in VMCB12 and copies L1's PAT MSR into vmcb02
instead. However, an L1 hypervisor that does not intercept accesses to
the PAT MSR would legitimately be surprised to find that its L2 guest
can modify the hypervisor's own PAT!
The commits in this series are in an awkward order, because I didn't
want to change nested_vmcb02_compute_g_pat() until I had removed the
call site from svm_set_msr().
The first two commits should arguably be one, but I tried to deal with
the serialization issue separately from the RDMSR/WRMSR issue, despite
the two being intertwined.
I don't like the ugliness of KVM_GET_MSRS saving the L2 Guest PAT
register during a checkpoint, but KVM_SET_MSRS restoring the
architectural PAT MSR on restore (because when KVM_SET_MSRS is called,
L2 is not active). The APM section on replicated state offers a
possible out:
While nested paging is enabled, all (guest) references to the state
of the paging registers by x86 code (MOV to/from CRn, etc.) read and
write the guest copy of the registers
If we consider KVM_{GET,SET}_MSRS not to be "guest" references, we
could always access the architected PAT MSR from userspace, and we
could grab 64 bits from the SVM nested state header to serialize L2's
G_PAT. In some ways, that seems cleaner, but it does mean that
KVM_{GET,SET}_MSR will access L1's PAT, which is irrelevant while L2
is active.
Hence, I am posting this series as an RFC.
Jim Mattson (6):
KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in
svm_set_nested_state()
KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled
in vmcb12
KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT
KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached
KVM: x86: nSVM: Add validity check for the VMCB12 g_pat
KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT
arch/x86/include/uapi/asm/kvm.h | 2 ++
arch/x86/kvm/svm/nested.c | 35 +++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.c | 25 +++++++++++++++--------
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 53 insertions(+), 10 deletions(-)
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply [flat|nested] 8+ messages in thread
* [RFC PATCH 1/6] KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in svm_set_nested_state()
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 2/6] KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled in vmcb12 Jim Mattson
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
When L2 is active and using nested paging, accesses to the PAT MSR
should be redirected to the Guest PAT register. As a result,
KVM_GET_MSRS will save the Guest PAT register rather than the PAT
MSR. However, on restore, KVM_SET_MSRS is called before
KVM_SET_NESTED_STATE, so the Guest PAT register will be restored to
the PAT MSR.
To fix the serialization of the Guest PAT register and the PAT MSR,
copy the PAT MSR to the Guest PAT register (vmcb02->save.g_pat) and
copy vmcb01->save.g_pat to the PAT MSR in svm_set_nested_state() under
the right conditions. One of these conditions is a new SVM nested
state flag, which will be set in the commit that modifies the
KVM_{GET,SET}_MSRS semantics.
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/include/uapi/asm/kvm.h | 2 ++
arch/x86/kvm/svm/nested.c | 15 +++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d420c9c066d4..df8ae68f56f7 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -494,6 +494,7 @@ struct kvm_sync_regs {
#define KVM_STATE_NESTED_SVM_VMCB_SIZE 0x1000
#define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001
+#define KVM_STATE_SVM_VALID_GPAT 0x00000001
/* vendor-independent attributes for system fd (group 0) */
#define KVM_X86_GRP_SYSTEM 0
@@ -529,6 +530,7 @@ struct kvm_svm_nested_state_data {
struct kvm_svm_nested_state_hdr {
__u64 vmcb_pa;
+ __u32 flags;
};
/* for KVM_CAP_NESTED_STATE */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a6443feab252..ad11b11f482e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1052,6 +1052,7 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
to_save->rsp = from_save->rsp;
to_save->rip = from_save->rip;
to_save->cpl = 0;
+ to_save->g_pat = from_save->g_pat;
if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
to_save->s_cet = from_save->s_cet;
@@ -1890,6 +1891,20 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
if (WARN_ON_ONCE(ret))
goto out_free;
+ /*
+ * If nested paging is enabled in vmcb12, then KVM_SET_MSRS restored
+ * the guest PAT register to the PAT MSR. Move this to the guest PAT
+ * register (svm->vmcb->save.g_pat) and restore the PAT MSR from
+ * svm->vmcb01.ptr->save.g_pat).
+ */
+ if ((kvm_state->hdr.svm.flags & KVM_STATE_SVM_VALID_GPAT) &&
+ nested_npt_enabled(svm)) {
+ ret = -EINVAL;
+ svm->vmcb->save.g_pat = vcpu->arch.pat;
+ if (!kvm_pat_valid(svm->vmcb01.ptr->save.g_pat))
+ goto out_free;
+ vcpu->arch.pat = svm->vmcb01.ptr->save.g_pat;
+ }
svm->nested.force_msr_bitmap_recalc = true;
kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 2/6] KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled in vmcb12
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 1/6] KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in svm_set_nested_state() Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 3/6] KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT Jim Mattson
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
According to the APM volume 2, section 15.25.2: "Replicated State,"
While nested paging is enabled, all (guest) references to the state of
the paging registers by x86 code (MOV to/from CRn, etc.) read and write
the guest copy of the registers.
The PAT MSR is explicitly enumerated as a "paging register" in that
section of the APM.
Implement the architected behavior by redirecting PAT MSR accesses
from vcpu->arch.pat to svm->vmcb->save.g_pat when L2 is active and
nested paging is enabled in vmcb12.
Note that the change in KVM_{GET,SET}_MSRS semantics breaks
serialization. Trigger a fixup in svm_set_nested_state() by setting
the VALID_GPAT flag in the SVM nested state header.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/svm/nested.c | 1 +
arch/x86/kvm/svm/svm.c | 25 +++++++++++++++++--------
2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ad11b11f482e..c68511948501 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1728,6 +1728,7 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
/* First fill in the header and copy it out. */
if (is_guest_mode(vcpu)) {
kvm_state.hdr.svm.vmcb_pa = svm->nested.vmcb12_gpa;
+ kvm_state.hdr.svm.flags = KVM_STATE_SVM_VALID_GPAT;
kvm_state.size += KVM_STATE_NESTED_SVM_VMCB_SIZE;
kvm_state.flags |= KVM_STATE_NESTED_GUEST_MODE;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b4e5a0684f57..7e192fd5fb7f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2675,6 +2675,12 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
msr_info->data = svm->tsc_ratio_msr;
break;
+ case MSR_IA32_CR_PAT:
+ if (!(is_guest_mode(vcpu) && nested_npt_enabled(svm)))
+ msr_info->data = vcpu->arch.pat;
+ else
+ msr_info->data = svm->vmcb->save.g_pat;
+ break;
case MSR_STAR:
msr_info->data = svm->vmcb01.ptr->save.star;
break;
@@ -2864,14 +2870,17 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
break;
case MSR_IA32_CR_PAT:
- ret = kvm_set_msr_common(vcpu, msr);
- if (ret)
- break;
-
- svm->vmcb01.ptr->save.g_pat = data;
- if (is_guest_mode(vcpu))
- nested_vmcb02_compute_g_pat(svm);
- vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
+ if (!kvm_pat_valid(data))
+ return 1;
+ if (!(is_guest_mode(vcpu) && nested_npt_enabled(svm))) {
+ vcpu->arch.pat = data;
+ svm->vmcb01.ptr->save.g_pat = data;
+ vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_NPT);
+ }
+ if (is_guest_mode(vcpu)) {
+ svm->vmcb->save.g_pat = data;
+ vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
+ }
break;
case MSR_IA32_SPEC_CTRL:
if (!msr->host_initiated &&
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 3/6] KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 1/6] KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in svm_set_nested_state() Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 2/6] KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled in vmcb12 Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 4/6] KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached Jim Mattson
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
According to the APM volume 3 pseudo-code for "VMRUN," when nested
paging is enabled in the VMCB, the guest PAT register is copied into
the g_pat field of the VMCB on #VMEXIT.
In KVM, the guest PAT register is the g_pat field of vmcb02. When
nested paging is enabled in vmcb12, copy the g_pat field of the vmcb02
to the g_pat field of the vmcb12 in nested_svm_vmexit().
Fixes: 15038e147247 ("KVM: SVM: obey guest PAT")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c68511948501..51a89d6aa29f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1126,6 +1126,9 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb12->save.dr6 = svm->vcpu.arch.dr6;
vmcb12->save.cpl = vmcb02->save.cpl;
+ if (nested_npt_enabled(svm))
+ vmcb12->save.g_pat = vmcb02->save.g_pat;
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
vmcb12->save.s_cet = vmcb02->save.s_cet;
vmcb12->save.isst_addr = vmcb02->save.isst_addr;
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 4/6] KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
` (2 preceding siblings ...)
2025-11-07 20:11 ` [RFC PATCH 3/6] KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 5/6] KVM: x86: nSVM: Add validity check for the VMCB12 g_pat Jim Mattson
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
Add a g_pat field to the vmcb_ctrl_area_cached struct for caching the
VMCB12 g_pat at emulated VMRUN. This is a preliminary step to allow
for proper validation and handling of the VMCB12 g_pat when nested
paging is enabled in VMCB12.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/svm/nested.c | 6 ++++++
arch/x86/kvm/svm/svm.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 51a89d6aa29f..6e48572e2bd7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -984,6 +984,12 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
+ /*
+ * To facilitate independent validation of the cached state
+ * save area and the cached control area, we cache the vmcb12
+ * g_pat with the cached controls.
+ */
+ svm->nested.ctl.g_pat = vmcb12->save.g_pat;
if (!nested_vmcb_check_save(vcpu) ||
!nested_vmcb_check_controls(vcpu)) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e4b04f435b3d..c91e20aa3ec2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -176,6 +176,7 @@ struct vmcb_ctrl_area_cached {
u64 virt_ext;
u32 clean;
u64 bus_lock_rip;
+ u64 g_pat;
union {
#if IS_ENABLED(CONFIG_HYPERV) || IS_ENABLED(CONFIG_KVM_HYPERV)
struct hv_vmcb_enlightenments hv_enlightenments;
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 5/6] KVM: x86: nSVM: Add validity check for the VMCB12 g_pat
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
` (3 preceding siblings ...)
2025-11-07 20:11 ` [RFC PATCH 4/6] KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 6/6] KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT Jim Mattson
2025-11-17 20:56 ` [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Yosry Ahmed
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
When nested paging is enabled for VMCB12, an invalid g_pat causes an
immediate #VMEXIT with exit code VMEXIT_INVALID, as specified in the
APM.
Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/svm/nested.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6e48572e2bd7..43429399993c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -336,6 +336,10 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
return false;
+ if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
+ npt_enabled && !kvm_pat_valid(control->g_pat)))
+ return false;
+
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
return false;
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 6/6] KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
` (4 preceding siblings ...)
2025-11-07 20:11 ` [RFC PATCH 5/6] KVM: x86: nSVM: Add validity check for the VMCB12 g_pat Jim Mattson
@ 2025-11-07 20:11 ` Jim Mattson
2025-11-17 20:56 ` [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Yosry Ahmed
6 siblings, 0 replies; 8+ messages in thread
From: Jim Mattson @ 2025-11-07 20:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
Cc: Jim Mattson
When nested paging is enabled in VMCB12, copy the (cached and
validated) VMCB12 g_pat to VMCB02.
Fixes: 15038e147247 ("KVM: SVM: obey guest PAT")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/kvm/svm/nested.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 43429399993c..21d8db10525d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -605,8 +605,10 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
if (!svm->nested.vmcb02.ptr)
return;
- /* FIXME: merge g_pat from vmcb01 and vmcb12. */
- svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
+ if (nested_npt_enabled(svm))
+ svm->nested.vmcb02.ptr->save.g_pat = svm->nested.ctl.g_pat;
+ else
+ svm->nested.vmcb02.ptr->save.g_pat = svm->vmcb01.ptr->save.g_pat;
}
static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
--
2.51.2.1041.gc1ab5b90ca-goog
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
` (5 preceding siblings ...)
2025-11-07 20:11 ` [RFC PATCH 6/6] KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT Jim Mattson
@ 2025-11-17 20:56 ` Yosry Ahmed
6 siblings, 0 replies; 8+ messages in thread
From: Yosry Ahmed @ 2025-11-17 20:56 UTC (permalink / raw)
To: Jim Mattson
Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Alexander Graf,
Joerg Roedel, Avi Kivity, Radim Krčmář,
David Hildenbrand, kvm, linux-kernel
On Fri, Nov 07, 2025 at 12:11:23PM -0800, Jim Mattson wrote:
> There are several problems with KVM's virtualization of the G_PAT
> field when nested paging is enabled in VMCB12.
>
> * The VMCB12 G_PAT field is not checked for validity when emulating
> VMRUN. (APM volume 2, section 15.25.4: Nested Paging and
> VMRUN/#VMEXIT)
>
> * RDMSR(PAT) and WRMSR(PAT) from L2 access L1's PAT MSR rather than
> L2's Guest PAT register. (APM volume 2, section 15.25.2: Replicated
> State)
>
> * The L2 Guest PAT register is not written back to VMCB12 on #VMEXIT
> from L2 to L1. (APM volume 3, Section 4: "VMRUN")
>
> * The value of L2's Guest PAT register is not serialized for
> save/restore when a checkpoint is taken while L2 is active.
>
> Commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2
> guest") left this comment in nested_vmcb02_compute_g_pat():
>
> /* FIXME: merge g_pat from vmcb01 and vmcb12. */
>
> This comment makes no sense. It is true that there are now three
> different PATs to consider: L2's PAT for guest page tables, L1's PAT
> for the nested page tables mapping L2 guest physical addresses to L1
> guest physical addresses, and L0's PAT for the nested page tables
> mapping L1 guest physical addresses to host physical
> addresses. However, if there is any "merging" to be done, it would
> involve the latter two, and would happen during shadow nested page
> table construction. (For the record, I don't think "merging" the two
> nested page table PATs is feasible.) In any case, the VMCB12 G_PAT
> should be copied unmodified into VMCB02.
>
> Maybe the rest of the current implementation is a consistent quirk
> based on the existing nested_vmcb02_compute_g_pat() code that bypasses
> L1's request in VMCB12 and copies L1's PAT MSR into vmcb02
> instead. However, an L1 hypervisor that does not intercept accesses to
> the PAT MSR would legitimately be surprised to find that its L2 guest
> can modify the hypervisor's own PAT!
>
> The commits in this series are in an awkward order, because I didn't
> want to change nested_vmcb02_compute_g_pat() until I had removed the
> call site from svm_set_msr().
>
> The first two commits should arguably be one, but I tried to deal with
> the serialization issue separately from the RDMSR/WRMSR issue, despite
> the two being intertwined.
>
> I don't like the ugliness of KVM_GET_MSRS saving the L2 Guest PAT
> register during a checkpoint, but KVM_SET_MSRS restoring the
> architectural PAT MSR on restore (because when KVM_SET_MSRS is called,
> L2 is not active). The APM section on replicated state offers a
> possible out:
>
> While nested paging is enabled, all (guest) references to the state
> of the paging registers by x86 code (MOV to/from CRn, etc.) read and
> write the guest copy of the registers
>
> If we consider KVM_{GET,SET}_MSRS not to be "guest" references, we
> could always access the architected PAT MSR from userspace, and we
> could grab 64 bits from the SVM nested state header to serialize L2's
> G_PAT. In some ways, that seems cleaner, but it does mean that
> KVM_{GET,SET}_MSR will access L1's PAT, which is irrelevant while L2
> is active.
>
> Hence, I am posting this series as an RFC.
A little bit more context here, the APM is a bit unclear about how
PAT/gPAT are actually handled. Specifically, whether or not gPAT is
context switched with the PAT MSR on VMRUN and #VMEXIT or not.
On one hand, the APM mentions that with nested paging, paging registers
are replicated and are loaded on VMRUN (in section 15.25.2 in Vol. 2):
Most processor state affecting paging is replicated for host and
guest. This includes the paging registers CR0, CR3, CR4, EFER and
PAT. CR2 is not replicated but is loaded by VMRUN. The MTRRs are not
replicated.
While nested paging is enabled, all (guest) references to the state of
the paging registers by x86 code (MOV to/from CRn, etc.) read and
write the guest copy of the registers; the VMM's versions of the
registers are untouched and continue to control the second level
translations from guest physical to system physical addresses. In
contrast, when nested paging is disabled, the VMM's paging control
registers are stored in the host state save area and the paging
control registers from the guest VMCB are the only active versions of
those registers.
This gives the impression that gPAT is loaded into the PAT MSR on
VMRUN, and switched back on #VMEXIT.
However, the APM also have multiple hints about PAT being different from
the other paging registers, and that gPAT might be a hidden register
different from PAT MSR:
- Looking at the pseudocode for VMRUN in Vol. 3 in the APM:
* The part with "save host state to physical memory indicated in the
VM_HSAVE_PA MSR:" does not include host PAT.
* The part with "from the VMCB at physical address rAX, load guest
state:" has:
IF (NP_ENABLE == 1)
gPAT // Leaves host hPAT register unchanged.
- Similarly, the pseudocode for #VMEXIT does not include restoring PAT
as part of the host state. Also, the part with "save guest state to
VMCB:" specifies "gPAT" not "PAT" being saved to the VMCB.
- In Vol 2, Table B-4: VMSA Layout, State Save Area
for SEV-ES, EFER, CR0, CR3, and CR4 are swap type A, but gPAT has no
swap type and a note: "Swapped for guest, not used in host mode."
So it's unclear how the hardware actually handles gPAT. This RFC
implements the second interpretation, that gPAT is not loaded into the
architerctural PAT MSR on VMRUN, and that guest accesses are redirect to
this hidden gPAT register. This complicates things like save/restore, as
Jim mentioned.
The alternative would be loading L2's PAT (or the gPAT set by L1 for L2)
into the architectural state when entering guest mode, similar to EFER
and other paging registers, which would considerably simplify things,
especially save/restore. This, however, may not be the architectural
behavior.
From the guest prespective it should be more-or-less the same thing,
which makes me inclined to the simpler approach. However, I am not sure
if there could be any repercussions from not following the architecture,
assuming the architecture is really not loading gPAT into the PAT MSR.
It would be great if someone from AMD could clarify the architectural
behavior here and voice their opinion about which approach we should
take here.
Side-note: If we follow the simple approach, we can probably disable
intercepts to PAT completely when NPT is enabled, as guest accesses to
PAT are redirected to the gPAT (whatever that is) anyway. It probably
won't buy us much tho.
>
> Jim Mattson (6):
> KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in
> svm_set_nested_state()
> KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled
> in vmcb12
> KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT
> KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached
> KVM: x86: nSVM: Add validity check for the VMCB12 g_pat
> KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT
>
> arch/x86/include/uapi/asm/kvm.h | 2 ++
> arch/x86/kvm/svm/nested.c | 35 +++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm/svm.c | 25 +++++++++++++++--------
> arch/x86/kvm/svm/svm.h | 1 +
> 4 files changed, 53 insertions(+), 10 deletions(-)
>
> --
> 2.51.2.1041.gc1ab5b90ca-goog
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-17 20:56 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-07 20:11 [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 1/6] KVM: x86: nSVM: Shuffle guest PAT and PAT MSR in svm_set_nested_state() Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 2/6] KVM: x86: nSVM: Redirect PAT MSR accesses to gPAT when NPT is enabled in vmcb12 Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 3/6] KVM: x86: nSVM: Copy current vmcb02 g_pat to vmcb12 g_pat on #VMEXIT Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 4/6] KVM: x86: nSVM: Cache g_pat in vmcb_ctrl_area_cached Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 5/6] KVM: x86: nSVM: Add validity check for the VMCB12 g_pat Jim Mattson
2025-11-07 20:11 ` [RFC PATCH 6/6] KVM: x86: nSVM: Use cached VMCB12 g_pat in VMCB02 when using NPT Jim Mattson
2025-11-17 20:56 ` [RFC PATCH 0/6] KVM: x86: nSVM: Improve virtualization of VMCB12 G_PAT Yosry Ahmed
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox