* [PATCH v2 00/16] KVM: x86: Enable APX for guests
@ 2026-01-12 23:53 Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific Chang S. Bae
` (15 more replies)
0 siblings, 16 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Hi all,
Here is a summary of changes since the last posting [1]:
* PATCH 2/3: Move EGPR accessor code to x86.c (Paolo)
* PATCH 11: Rename NoRex to NoRex2 (Paolo)
* PATCH 05: Remove an unused function parameter (Chao)
* PATCH 7/8: Reorder nVMX changes after VMX patches (Chao)
With this posting, I would like to see if I can collect some review tags.
For anyone looking at this series for the first time, please refer to the
cover letter of the initial RFC posting and the related discussions [3].
A brief overview follows:
APX [2] extends the general-purpose register set with extended GPRs
(EGPRs). Unlike legacy GPRs, this state is not cached by KVM, so it is
accessed directly in live hardware registers (Part1). On top of that,
VMX exit handling (Part2) and instruction emulation (Part3) are updated
before the feature is exposed to guests (Part4).
* Part1, PATCH 01-03: GPR accessor refactoring and EGPR support
* Part2, PATCH 04-08: VMX handler changes for EGPR indices
* Part3, PATCH 09-12: Emulator changes for REX2 support
* Part4, PATCH 13-16: Feature exposure and self-tests
The series is also available here:
git://github.com/intel/apx.git apx-kvm_v2
This version is rebased on v6.19-rc5. I do not see any direct dependency
on other pending changes right now; any conflict should be manageable
later.
Thanks,
Chang
[1] V1: https://lore.kernel.org/kvm/20251221040742.29749-1-chang.seok.bae@intel.com/
[2] APX Architecture Specification:
https://cdrdv2.intel.com/v1/dl/getContent/784266
[3] RFC: https://lore.kernel.org/kvm/20251110180131.28264-1-chang.seok.bae@intel.com/
Chang S. Bae (15):
KVM: x86: Rename register accessors to be GPR-specific
KVM: x86: Refactor GPR accessors to differentiate register access
types
KVM: x86: Implement accessors for extended GPRs
KVM: VMX: Introduce unified instruction info structure
KVM: VMX: Refactor instruction information retrieval
KVM: VMX: Refactor GPR index retrieval from exit qualification
KVM: VMX: Support extended register index in exit handling
KVM: nVMX: Propagate the extended instruction info field
KVM: emulate: Support EGPR accessing and tracking
KVM: emulate: Handle EGPR index and REX2-incompatible opcodes
KVM: emulate: Support REX2-prefixed opcode decode
KVM: emulate: Reject EVEX-prefixed instructions
KVM: x86: Guard valid XCR0.APX settings
KVM: x86: Expose APX sub-features to guests
KVM: x86: selftests: Add APX state handling and XCR0 sanity checks
Peter Fang (1):
KVM: x86: Expose APX foundational feature bit to guests
arch/x86/include/asm/kvm_host.h | 19 +++
arch/x86/include/asm/kvm_vcpu_regs.h | 16 +++
arch/x86/include/asm/vmx.h | 2 +
arch/x86/kvm/Kconfig | 4 +
arch/x86/kvm/cpuid.c | 14 +-
arch/x86/kvm/emulate.c | 121 +++++++++++-----
arch/x86/kvm/kvm_emulate.h | 11 +-
arch/x86/kvm/reverse_cpuid.h | 6 +
arch/x86/kvm/svm/svm.c | 23 +++-
arch/x86/kvm/vmx/nested.c | 87 ++++++------
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmcs12.c | 1 +
arch/x86/kvm/vmx/vmcs12.h | 3 +-
arch/x86/kvm/vmx/vmx.c | 26 ++--
arch/x86/kvm/vmx/vmx.h | 106 ++++++++++++--
arch/x86/kvm/x86.c | 130 ++++++++++++++++--
arch/x86/kvm/x86.h | 24 +++-
arch/x86/kvm/xen.c | 2 +-
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/x86/state_test.c | 6 +
.../selftests/kvm/x86/xcr0_cpuid_test.c | 19 +++
21 files changed, 498 insertions(+), 125 deletions(-)
base-commit: 0f61b1860cc3f52aef9036d7235ed1f017632193
--
2.51.0
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-03-05 1:35 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types Chang S. Bae
` (14 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Refactor the VCPU register state accessors to make them explicitly
GPR-only.
The existing accessors operate on the cached VCPU register state. That
cache holds the GPRs and RIP, but RIP already has its own interface, so
the rename clarifies that these helpers access GPRs only.
No functional changes intended.
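For reference, the mode-dependent truncation the renamed helpers preserve
(visible in the x86.h hunk below) can be sketched out of tree as pure
functions; the stub names are illustrative only, not KVM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of kvm_gpr_read()/kvm_gpr_write(): outside 64-bit mode the
 * value is zero-extended from 32 bits, exactly as in the diff. The
 * register cache itself is elided; only the truncation rule is modeled.
 */
static uint64_t gpr_read(uint64_t raw, bool long_mode)
{
	return long_mode ? raw : (uint32_t)raw;
}

static uint64_t gpr_write(uint64_t val, bool long_mode)
{
	if (!long_mode)
		val = (uint32_t)val;	/* legacy/compat mode: low 32 bits only */
	return val;
}
```

In 64-bit mode the full value passes through; otherwise the upper half is
dropped on both read and write, matching the existing semantics.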
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/svm/svm.c | 8 ++++----
arch/x86/kvm/vmx/nested.c | 20 ++++++++++----------
arch/x86/kvm/vmx/vmx.c | 12 ++++++------
arch/x86/kvm/x86.c | 10 +++++-----
arch/x86/kvm/x86.h | 5 ++---
arch/x86/kvm/xen.c | 2 +-
6 files changed, 28 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 24d59ccfa40d..209faa742e98 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2474,7 +2474,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_gpr_read(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2520,7 +2520,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- kvm_register_write(vcpu, reg, val);
+ kvm_gpr_write(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(vcpu, err);
@@ -2592,9 +2592,9 @@ static int dr_interception(struct kvm_vcpu *vcpu)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn */
dr -= 16;
- err = kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg));
+ err = kvm_set_dr(vcpu, dr, kvm_gpr_read(vcpu, reg));
} else {
- kvm_register_write(vcpu, reg, kvm_get_dr(vcpu, dr));
+ kvm_gpr_write(vcpu, reg, kvm_get_dr(vcpu, dr));
}
return kvm_complete_insn_gp(vcpu, err);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6137e5307d0f..b7d5feb4f5bd 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5275,9 +5275,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_gpr_read(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_gpr_read(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
@@ -5669,7 +5669,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
return 1;
/* Decode instruction info and find the field to read */
- field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
+ field = kvm_gpr_read(vcpu, (((instr_info) >> 28) & 0xf));
if (!nested_vmx_is_evmptr12_valid(vmx)) {
/*
@@ -5718,7 +5718,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
* on the guest's mode (32 or 64 bit), not on the given field's length.
*/
if (instr_info & BIT(10)) {
- kvm_register_write(vcpu, (((instr_info) >> 3) & 0xf), value);
+ kvm_gpr_write(vcpu, (((instr_info) >> 3) & 0xf), value);
} else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
@@ -5792,7 +5792,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return nested_vmx_failInvalid(vcpu);
if (instr_info & BIT(10))
- value = kvm_register_read(vcpu, (((instr_info) >> 3) & 0xf));
+ value = kvm_gpr_read(vcpu, (((instr_info) >> 3) & 0xf));
else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
@@ -5803,7 +5803,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return kvm_handle_memory_failure(vcpu, r, &e);
}
- field = kvm_register_read(vcpu, (((instr_info) >> 28) & 0xf));
+ field = kvm_gpr_read(vcpu, (((instr_info) >> 28) & 0xf));
offset = get_vmcs12_field_offset(field);
if (offset < 0)
@@ -6001,7 +6001,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_register_read(vcpu, gpr_index);
+ type = kvm_gpr_read(vcpu, gpr_index);
types = (vmx->nested.msrs.ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
@@ -6082,7 +6082,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_register_read(vcpu, gpr_index);
+ type = kvm_gpr_read(vcpu, gpr_index);
types = (vmx->nested.msrs.vpid_caps &
VMX_VPID_EXTENT_SUPPORTED_MASK) >> 8;
@@ -6356,7 +6356,7 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
reg = (exit_qualification >> 8) & 15;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_gpr_read(vcpu, reg);
switch (cr) {
case 0:
if (vmcs12->cr0_guest_host_mask &
@@ -6442,7 +6442,7 @@ static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
/* Decode instruction info and find the field to access */
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+ field = kvm_gpr_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
/* Out-of-range fields always cause a VM exit from L2 to L1 */
if (field >> 15)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6b96f7aea20b..4320f61aabc2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5591,7 +5591,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
reg = (exit_qualification >> 8) & 15;
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
- val = kvm_register_read(vcpu, reg);
+ val = kvm_gpr_read(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -5633,12 +5633,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_gpr_write(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_gpr_write(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5708,10 +5708,10 @@ static int handle_dr(struct kvm_vcpu *vcpu)
reg = DEBUG_REG_ACCESS_REG(exit_qualification);
if (exit_qualification & TYPE_MOV_FROM_DR) {
- kvm_register_write(vcpu, reg, kvm_get_dr(vcpu, dr));
+ kvm_gpr_write(vcpu, reg, kvm_get_dr(vcpu, dr));
err = 0;
} else {
- err = kvm_set_dr(vcpu, dr, kvm_register_read(vcpu, reg));
+ err = kvm_set_dr(vcpu, dr, kvm_gpr_read(vcpu, reg));
}
out:
@@ -6070,7 +6070,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_register_read(vcpu, gpr_index);
+ type = kvm_gpr_read(vcpu, gpr_index);
/* According to the Intel instruction reference, the memory operand
* is read even if it isn't needed (e.g., for type==all)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..3256ad507265 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2080,8 +2080,8 @@ static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
{
if (!vcpu->run->msr.error)
- kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
- vcpu->run->msr.data);
+ kvm_gpr_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
+ vcpu->run->msr.data);
return complete_fast_msr_access(vcpu);
}
@@ -2135,7 +2135,7 @@ static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
kvm_rax_write(vcpu, data & -1u);
kvm_rdx_write(vcpu, (data >> 32) & -1u);
} else {
- kvm_register_write(vcpu, reg, data);
+ kvm_gpr_write(vcpu, reg, data);
}
} else {
/* MSR read failed? See if we should ask user space */
@@ -2193,7 +2193,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
{
- return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+ return __kvm_emulate_wrmsr(vcpu, msr, kvm_gpr_read(vcpu, reg));
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);
@@ -2297,7 +2297,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
{
- return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+ return __handle_fastpath_wrmsr(vcpu, msr, kvm_gpr_read(vcpu, reg));
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index fdab0ad49098..7d6c1c31539f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -400,15 +400,14 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
return false;
}
-static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
+static inline unsigned long kvm_gpr_read(struct kvm_vcpu *vcpu, int reg)
{
unsigned long val = kvm_register_read_raw(vcpu, reg);
return is_64_bit_mode(vcpu) ? val : (u32)val;
}
-static inline void kvm_register_write(struct kvm_vcpu *vcpu,
- int reg, unsigned long val)
+static inline void kvm_gpr_write(struct kvm_vcpu *vcpu, int reg, unsigned long val)
{
if (!is_64_bit_mode(vcpu))
val = (u32)val;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index d6b2a665b499..c9700dc88bb1 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1679,7 +1679,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
bool handled = false;
u8 cpl;
- input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
+ input = (u64)kvm_gpr_read(vcpu, VCPU_REGS_RAX);
/* Hyper-V hypercalls get bit 31 set in EAX */
if ((input & 0x80000000) &&
--
2.51.0
* [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-03-05 1:49 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs Chang S. Bae
` (13 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Refactor the GPR accessors to introduce internal helpers to distinguish
between legacy and extended GPRs. Add CONFIG_KVM_APX to selectively
enable EGPR support.
EGPRs will initially remain unused in the kernel, so their state will
not be saved in the KVM register cache on every VM exit. Instead, the
guest state remains live in hardware registers or is stored in the guest
fpstate. For now, the EGPR accessors are placeholders to be implemented
later.
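The legacy/extended dispatch added below is a plain range check on the
register index; a standalone sketch, mirroring the VCPU_REGS_RAX ...
VCPU_REGS_R15 and VCPU_XREG_R16 ... VCPU_XREG_R31 ranges (names here are
illustrative, not the kernel's):

```c
#include <assert.h>

enum gpr_class { GPR_LEGACY, GPR_EXTENDED, GPR_INVALID };

/* Classify a register index the way kvm_gpr_read_raw() switches on it. */
static enum gpr_class classify_gpr(int reg)
{
	if (reg >= 0 && reg <= 15)	/* VCPU_REGS_RAX ... VCPU_REGS_R15 */
		return GPR_LEGACY;	/* served from the register cache */
	if (reg >= 16 && reg <= 31)	/* VCPU_XREG_R16 ... VCPU_XREG_R31 */
		return GPR_EXTENDED;	/* served by the EGPR accessors */
	return GPR_INVALID;		/* hits WARN_ON_ONCE() in the patch */
}
```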
Link: https://lore.kernel.org/7cff2a78-94f3-4746-9833-c2a1bf51eed6@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Move kvm_read_egpr()/kvm_write_egpr() to x86.c (Paolo)
---
arch/x86/include/asm/kvm_host.h | 18 ++++++++++++
arch/x86/include/asm/kvm_vcpu_regs.h | 16 +++++++++++
arch/x86/kvm/Kconfig | 4 +++
arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 19 +++++++++++--
5 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..9dedb8d77222 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -212,6 +212,24 @@ enum {
VCPU_SREG_GS,
VCPU_SREG_TR,
VCPU_SREG_LDTR,
+#ifdef CONFIG_X86_64
+ VCPU_XREG_R16 = __VCPU_XREG_R16,
+ VCPU_XREG_R17 = __VCPU_XREG_R17,
+ VCPU_XREG_R18 = __VCPU_XREG_R18,
+ VCPU_XREG_R19 = __VCPU_XREG_R19,
+ VCPU_XREG_R20 = __VCPU_XREG_R20,
+ VCPU_XREG_R21 = __VCPU_XREG_R21,
+ VCPU_XREG_R22 = __VCPU_XREG_R22,
+ VCPU_XREG_R23 = __VCPU_XREG_R23,
+ VCPU_XREG_R24 = __VCPU_XREG_R24,
+ VCPU_XREG_R25 = __VCPU_XREG_R25,
+ VCPU_XREG_R26 = __VCPU_XREG_R26,
+ VCPU_XREG_R27 = __VCPU_XREG_R27,
+ VCPU_XREG_R28 = __VCPU_XREG_R28,
+ VCPU_XREG_R29 = __VCPU_XREG_R29,
+ VCPU_XREG_R30 = __VCPU_XREG_R30,
+ VCPU_XREG_R31 = __VCPU_XREG_R31,
+#endif
};
enum exit_fastpath_completion {
diff --git a/arch/x86/include/asm/kvm_vcpu_regs.h b/arch/x86/include/asm/kvm_vcpu_regs.h
index 1af2cb59233b..dd0cc171f405 100644
--- a/arch/x86/include/asm/kvm_vcpu_regs.h
+++ b/arch/x86/include/asm/kvm_vcpu_regs.h
@@ -20,6 +20,22 @@
#define __VCPU_REGS_R13 13
#define __VCPU_REGS_R14 14
#define __VCPU_REGS_R15 15
+#define __VCPU_XREG_R16 16
+#define __VCPU_XREG_R17 17
+#define __VCPU_XREG_R18 18
+#define __VCPU_XREG_R19 19
+#define __VCPU_XREG_R20 20
+#define __VCPU_XREG_R21 21
+#define __VCPU_XREG_R22 22
+#define __VCPU_XREG_R23 23
+#define __VCPU_XREG_R24 24
+#define __VCPU_XREG_R25 25
+#define __VCPU_XREG_R26 26
+#define __VCPU_XREG_R27 27
+#define __VCPU_XREG_R28 28
+#define __VCPU_XREG_R29 29
+#define __VCPU_XREG_R30 30
+#define __VCPU_XREG_R31 31
#endif
#endif /* _ASM_X86_KVM_VCPU_REGS_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 278f08194ec8..2b2995188e97 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -93,10 +93,14 @@ config KVM_SW_PROTECTED_VM
If unsure, say "N".
+config KVM_APX
+ bool
+
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
select X86_FRED if X86_64
+ select KVM_APX if X86_64
help
Provides support for KVM on processors equipped with Intel's VT
extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3256ad507265..9857b4d319ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1255,6 +1255,47 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
}
#endif
+#ifdef CONFIG_KVM_APX
+static unsigned long kvm_read_egpr(int reg)
+{
+ return 0;
+}
+
+static void kvm_write_egpr(int reg, unsigned long data)
+{
+}
+
+unsigned long kvm_gpr_read_raw(struct kvm_vcpu *vcpu, int reg)
+{
+ switch (reg) {
+ case VCPU_REGS_RAX ... VCPU_REGS_R15:
+ return kvm_register_read_raw(vcpu, reg);
+ case VCPU_XREG_R16 ... VCPU_XREG_R31:
+ return kvm_read_egpr(reg);
+ default:
+ WARN_ON_ONCE(1);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gpr_read_raw);
+
+void kvm_gpr_write_raw(struct kvm_vcpu *vcpu, int reg, unsigned long val)
+{
+ switch (reg) {
+ case VCPU_REGS_RAX ... VCPU_REGS_R15:
+ kvm_register_write_raw(vcpu, reg, val);
+ break;
+ case VCPU_XREG_R16 ... VCPU_XREG_R31:
+ kvm_write_egpr(reg, val);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gpr_write_raw);
+#endif
+
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
u64 xcr0 = xcr;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 7d6c1c31539f..19183aa92855 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -400,9 +400,24 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
return false;
}
+#ifdef CONFIG_KVM_APX
+unsigned long kvm_gpr_read_raw(struct kvm_vcpu *vcpu, int reg);
+void kvm_gpr_write_raw(struct kvm_vcpu *vcpu, int reg, unsigned long val);
+#else
+static inline unsigned long kvm_gpr_read_raw(struct kvm_vcpu *vcpu, int reg)
+{
+ return kvm_register_read_raw(vcpu, reg);
+}
+
+static inline void kvm_gpr_write_raw(struct kvm_vcpu *vcpu, int reg, unsigned long val)
+{
+ kvm_register_write_raw(vcpu, reg, val);
+}
+#endif
+
static inline unsigned long kvm_gpr_read(struct kvm_vcpu *vcpu, int reg)
{
- unsigned long val = kvm_register_read_raw(vcpu, reg);
+ unsigned long val = kvm_gpr_read_raw(vcpu, reg);
return is_64_bit_mode(vcpu) ? val : (u32)val;
}
@@ -411,7 +426,7 @@ static inline void kvm_gpr_write(struct kvm_vcpu *vcpu, int reg, unsigned long v
{
if (!is_64_bit_mode(vcpu))
val = (u32)val;
- return kvm_register_write_raw(vcpu, reg, val);
+ kvm_gpr_write_raw(vcpu, reg, val);
}
static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
--
2.51.0
* [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-03-05 1:41 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure Chang S. Bae
` (12 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Add helpers to directly read and write EGPRs (R16–R31).
Unlike legacy GPRs, EGPRs are not cached in vcpu->arch.regs[]. Their
contents remain live in hardware. If preempted, the EGPR state is
preserved in the guest XSAVE buffer.
The Advanced Performance Extensions (APX) feature introduces EGPRs as an
XSAVE-managed state component. The new helpers access the registers
directly between kvm_fpu_get() and kvm_fpu_put().
Callers should ensure that EGPRs are enabled before using these helpers.
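The hand-assembled byte tables below follow a regular pattern: a 0xd5
REX2 prefix, a payload byte carrying W plus the R3/R4 register-extension
bits, the 0x89 MOV opcode, and a ModRM byte with mod=11. A sketch that
regenerates the read-side bytes, derived from the tables in this patch
rather than from an authoritative REX2 encoder:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compose the 4-byte "mov %rN, %rax" (N = 16..31) used by
 * _kvm_read_egpr(): REX2 prefix, payload with W=1 and the source
 * register's high index bits, MOV opcode, ModRM with rm = rax.
 */
static void egpr_read_insn(int reg, uint8_t out[4])
{
	out[0] = 0xd5;				/* REX2 prefix */
	out[1] = 0x40				/* R4 = 1 (reg >= 16) */
	       | ((reg & 8) ? 0x04 : 0)		/* R3 = bit 3 of reg */
	       | 0x08;				/* W = 1 (64-bit operand) */
	out[2] = 0x89;				/* MOV r/m64, r64 */
	out[3] = 0xc0 | ((reg & 7) << 3);	/* mod=11, reg=N[2:0], rm=0 */
}
```

For example, R16 yields d5 48 89 c0 and R31 yields d5 4c 89 f8, matching
the table in _kvm_read_egpr().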
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Move _kvm_read_egpr()/_kvm_write_egpr() to x86.c (Paolo)
---
arch/x86/kvm/x86.c | 70 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 69 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9857b4d319ed..edac2ec11e2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1256,13 +1256,81 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
#endif
#ifdef CONFIG_KVM_APX
+/*
+ * Accessors for extended general-purpose registers. binutils >= 2.43 can
+ * recognize those register symbols.
+ */
+
+static void _kvm_read_egpr(int reg, unsigned long *data)
+{
+ /* mov %r16..%r31, %rax */
+ switch (reg) {
+ case __VCPU_XREG_R16: asm(".byte 0xd5, 0x48, 0x89, 0xc0" : "=a"(*data)); break;
+ case __VCPU_XREG_R17: asm(".byte 0xd5, 0x48, 0x89, 0xc8" : "=a"(*data)); break;
+ case __VCPU_XREG_R18: asm(".byte 0xd5, 0x48, 0x89, 0xd0" : "=a"(*data)); break;
+ case __VCPU_XREG_R19: asm(".byte 0xd5, 0x48, 0x89, 0xd8" : "=a"(*data)); break;
+ case __VCPU_XREG_R20: asm(".byte 0xd5, 0x48, 0x89, 0xe0" : "=a"(*data)); break;
+ case __VCPU_XREG_R21: asm(".byte 0xd5, 0x48, 0x89, 0xe8" : "=a"(*data)); break;
+ case __VCPU_XREG_R22: asm(".byte 0xd5, 0x48, 0x89, 0xf0" : "=a"(*data)); break;
+ case __VCPU_XREG_R23: asm(".byte 0xd5, 0x48, 0x89, 0xf8" : "=a"(*data)); break;
+ case __VCPU_XREG_R24: asm(".byte 0xd5, 0x4c, 0x89, 0xc0" : "=a"(*data)); break;
+ case __VCPU_XREG_R25: asm(".byte 0xd5, 0x4c, 0x89, 0xc8" : "=a"(*data)); break;
+ case __VCPU_XREG_R26: asm(".byte 0xd5, 0x4c, 0x89, 0xd0" : "=a"(*data)); break;
+ case __VCPU_XREG_R27: asm(".byte 0xd5, 0x4c, 0x89, 0xd8" : "=a"(*data)); break;
+ case __VCPU_XREG_R28: asm(".byte 0xd5, 0x4c, 0x89, 0xe0" : "=a"(*data)); break;
+ case __VCPU_XREG_R29: asm(".byte 0xd5, 0x4c, 0x89, 0xe8" : "=a"(*data)); break;
+ case __VCPU_XREG_R30: asm(".byte 0xd5, 0x4c, 0x89, 0xf0" : "=a"(*data)); break;
+ case __VCPU_XREG_R31: asm(".byte 0xd5, 0x4c, 0x89, 0xf8" : "=a"(*data)); break;
+ default: BUG();
+ }
+}
+
+static void _kvm_write_egpr(int reg, unsigned long *data)
+{
+ /* mov %rax, %r16...%r31*/
+ switch (reg) {
+ case __VCPU_XREG_R16: asm(".byte 0xd5, 0x18, 0x89, 0xc0" : : "a"(*data)); break;
+ case __VCPU_XREG_R17: asm(".byte 0xd5, 0x18, 0x89, 0xc1" : : "a"(*data)); break;
+ case __VCPU_XREG_R18: asm(".byte 0xd5, 0x18, 0x89, 0xc2" : : "a"(*data)); break;
+ case __VCPU_XREG_R19: asm(".byte 0xd5, 0x18, 0x89, 0xc3" : : "a"(*data)); break;
+ case __VCPU_XREG_R20: asm(".byte 0xd5, 0x18, 0x89, 0xc4" : : "a"(*data)); break;
+ case __VCPU_XREG_R21: asm(".byte 0xd5, 0x18, 0x89, 0xc5" : : "a"(*data)); break;
+ case __VCPU_XREG_R22: asm(".byte 0xd5, 0x18, 0x89, 0xc6" : : "a"(*data)); break;
+ case __VCPU_XREG_R23: asm(".byte 0xd5, 0x18, 0x89, 0xc7" : : "a"(*data)); break;
+ case __VCPU_XREG_R24: asm(".byte 0xd5, 0x19, 0x89, 0xc0" : : "a"(*data)); break;
+ case __VCPU_XREG_R25: asm(".byte 0xd5, 0x19, 0x89, 0xc1" : : "a"(*data)); break;
+ case __VCPU_XREG_R26: asm(".byte 0xd5, 0x19, 0x89, 0xc2" : : "a"(*data)); break;
+ case __VCPU_XREG_R27: asm(".byte 0xd5, 0x19, 0x89, 0xc3" : : "a"(*data)); break;
+ case __VCPU_XREG_R28: asm(".byte 0xd5, 0x19, 0x89, 0xc4" : : "a"(*data)); break;
+ case __VCPU_XREG_R29: asm(".byte 0xd5, 0x19, 0x89, 0xc5" : : "a"(*data)); break;
+ case __VCPU_XREG_R30: asm(".byte 0xd5, 0x19, 0x89, 0xc6" : : "a"(*data)); break;
+ case __VCPU_XREG_R31: asm(".byte 0xd5, 0x19, 0x89, 0xc7" : : "a"(*data)); break;
+ default: BUG();
+ }
+}
+
static unsigned long kvm_read_egpr(int reg)
{
- return 0;
+ unsigned long data;
+
+ if (WARN_ON_ONCE(!cpu_has_xfeatures(XFEATURE_MASK_APX, NULL)))
+ return 0;
+
+ kvm_fpu_get();
+ _kvm_read_egpr(reg, &data);
+ kvm_fpu_put();
+
+ return data;
}
static void kvm_write_egpr(int reg, unsigned long data)
{
+ if (WARN_ON_ONCE(!cpu_has_xfeatures(XFEATURE_MASK_APX, NULL)))
+ return;
+
+ kvm_fpu_get();
+ _kvm_write_egpr(reg, &data);
+ kvm_fpu_put();
}
unsigned long kvm_gpr_read_raw(struct kvm_vcpu *vcpu, int reg)
--
2.51.0
* [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (2 preceding siblings ...)
2026-01-12 23:53 ` [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-03-05 4:21 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 05/16] KVM: VMX: Refactor instruction information retrieval Chang S. Bae
` (11 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Define a unified data structure that can represent both the legacy and
extended VMX instruction information formats.
VMX provides per-instruction metadata for VM exits to help decode the
attributes of the instruction that triggered the exit. The legacy format,
however, only supports up to 16 GPRs and thus cannot represent EGPRs. To
support these new registers, VMX introduces an extended 64-bit layout.
Instead of maintaining separate storage for each format, a single
union structure makes the overall handling simple. The field names are
consistent across both layouts. While the presence of certain fields
depends on the instruction type, the offsets remain fixed within each
format.
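On the usual Linux x86 ABI, where bitfields are allocated from the least
significant bit, the legacy layout lines up with the open-coded shifts it
later replaces in get_vmx_mem_address() (scale = info & 3, is_reg =
bit 10, reg2 = bits 31:28). A standalone sketch of the union-based
decode, with the struct copied from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy 32-bit instruction-information layout, as in the patch. */
struct base_insn_info_sketch {
	uint32_t scale : 2;
	uint32_t reserved1 : 1;
	uint32_t reg1 : 4;
	uint32_t asize : 3;
	uint32_t is_reg : 1;
	uint32_t osize : 2;
	uint32_t reserved2 : 2;
	uint32_t seg : 3;
	uint32_t index : 4;
	uint32_t index_invalid : 1;
	uint32_t base : 4;
	uint32_t base_invalid : 1;
	uint32_t reg2 : 4;
};

union insn_info_sketch {
	struct base_insn_info_sketch base;
	uint32_t word;
};

/* Extract reg2 by field name rather than by (raw >> 28) & 0xf. */
static uint32_t decode_reg2(uint32_t raw)
{
	union insn_info_sketch u = { .word = raw };
	return u.base.reg2;
}
```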
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/vmx/vmx.h | 61 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bc3ed3145d7e..567320115a5a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -311,6 +311,67 @@ struct kvm_vmx {
u64 *pid_table;
};
+/*
+ * 32-bit layout of the legacy instruction information field. This format
+ * supports the 16 legacy GPRs.
+ */
+struct base_insn_info {
+ u32 scale : 2; /* Scaling factor */
+ u32 reserved1 : 1;
+ u32 reg1 : 4; /* First register index */
+ u32 asize : 3; /* Address size */
+ u32 is_reg : 1; /* 0: memory, 1: register */
+ u32 osize : 2; /* Operand size */
+ u32 reserved2 : 2;
+ u32 seg : 3; /* Segment register index */
+ u32 index : 4; /* Index register index */
+ u32 index_invalid : 1; /* 0: valid, 1: invalid */
+ u32 base : 4; /* Base register index */
+ u32 base_invalid : 1; /* 0: valid, 1: invalid */
+ u32 reg2 : 4; /* Second register index */
+};
+
+/*
+ * 64-bit layout of the extended instruction information field, which
+ * supports EGPRs.
+ */
+struct ext_insn_info {
+ u64 scale : 2; /* Scaling factor */
+ u64 asize : 2; /* Address size */
+ u64 is_reg : 1; /* 0: memory, 1: register */
+ u64 osize : 2; /* Operand size */
+ u64 seg : 3; /* Segment register index */
+ u64 index_invalid : 1; /* 0: valid, 1: invalid */
+ u64 base_invalid : 1; /* 0: valid, 1: invalid */
+ u64 reserved1 : 4;
+ u64 reg1 : 5; /* First register index */
+ u64 reserved2 : 3;
+ u64 index : 5; /* Index register index */
+ u64 reserved3 : 3;
+ u64 base : 5; /* Base register index */
+ u64 reserved4 : 3;
+ u64 reg2 : 5; /* Second register index */
+ u64 reserved5 : 19;
+};
+
+/* Union for accessing either the legacy or extended format. */
+union insn_info {
+ struct base_insn_info base;
+ struct ext_insn_info ext;
+ u32 word;
+ u64 dword;
+};
+
+/*
+ * Wrapper structure combining the instruction info and a flag indicating
+ * whether the extended layout is in use.
+ */
+struct vmx_insn_info {
+ /* true if using the extended layout */
+ bool extended;
+ union insn_info info;
+};
+
static __always_inline struct vcpu_vt *to_vt(struct kvm_vcpu *vcpu)
{
return &(container_of(vcpu, struct vcpu_vmx, vcpu)->vt);
--
2.51.0
* [PATCH v2 05/16] KVM: VMX: Refactor instruction information retrieval
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (3 preceding siblings ...)
2026-01-12 23:53 ` [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification Chang S. Bae
` (10 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Introduce helpers to convert and extract the attributes of the exiting
instruction, preparing for EGPR support and deprecating some existing
helpers.
Previously, VMX exit handlers decoded the raw VMCS field directly,
resulting in duplicated logic and assumptions tied to the legacy layout.
With the unified structure, handlers convert the raw data into
structured form and access each instruction attribute by field name.
The helper will later determine the format based on the VCPU
configuration. For now, there is no functional change since only the
legacy layout is used.
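The insn_attr() macro body itself is not visible in this hunk; a
plausible sketch — a guess at its shape, not the series' actual
implementation — simply selects the layout by the extended flag. Only two
fields are modeled here for brevity, and all _sk names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-ins for the base/ext layouts from the previous patch. */
struct base_info_sk { uint32_t scale : 2; uint32_t pad : 8; uint32_t is_reg : 1; };
struct ext_info_sk  { uint64_t scale : 2; uint64_t asize : 2; uint64_t is_reg : 1; };

struct vmx_insn_info_sk {
	bool extended;			/* true if using the extended layout */
	union {
		struct base_info_sk base;
		struct ext_info_sk ext;
		uint64_t dword;
	} info;
};

/* Guessed shape of insn_attr(): pick the field from whichever layout applies. */
#define insn_attr_sk(i, field) \
	((i).extended ? (i).info.ext.field : (i).info.base.field)

static unsigned int get_scale(struct vmx_insn_info_sk i)
{
	return insn_attr_sk(i, scale);
}
```

A handler would then write insn_attr_sk(info, scale) and stay agnostic of
which layout the hardware produced.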
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Remove the unused function argument (Chao)
---
arch/x86/kvm/vmx/nested.c | 73 +++++++++++++++++++--------------------
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 14 ++++----
arch/x86/kvm/vmx/vmx.h | 23 ++++++------
4 files changed, 57 insertions(+), 55 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b7d5feb4f5bd..144012dd9599 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5239,7 +5239,7 @@ static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
* #UD, #GP, or #SS.
*/
int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
- u32 vmx_instruction_info, bool wr, int len, gva_t *ret)
+ struct vmx_insn_info info, bool wr, int len, gva_t *ret)
{
gva_t off;
bool exn;
@@ -5253,14 +5253,14 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
* For how an actual address is calculated from all these components,
* refer to Vol. 1, "Operand Addressing".
*/
- int scaling = vmx_instruction_info & 3;
- int addr_size = (vmx_instruction_info >> 7) & 7;
- bool is_reg = vmx_instruction_info & (1u << 10);
- int seg_reg = (vmx_instruction_info >> 15) & 7;
- int index_reg = (vmx_instruction_info >> 18) & 0xf;
- bool index_is_valid = !(vmx_instruction_info & (1u << 22));
- int base_reg = (vmx_instruction_info >> 23) & 0xf;
- bool base_is_valid = !(vmx_instruction_info & (1u << 27));
+ int scaling = insn_attr(info, scale);
+ int addr_size = insn_attr(info, asize);
+ bool is_reg = insn_attr(info, is_reg);
+ int seg_reg = insn_attr(info, seg);
+ int index_reg = insn_attr(info, index);
+ bool index_is_valid = !insn_attr(info, index_invalid);
+ int base_reg = insn_attr(info, base);
+ bool base_is_valid = !insn_attr(info, base_invalid);
if (is_reg) {
kvm_queue_exception(vcpu, UD_VECTOR);
@@ -5371,7 +5371,7 @@ static int nested_vmx_get_vmptr(struct kvm_vcpu *vcpu, gpa_t *vmpointer,
int r;
if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu),
- vmcs_read32(VMX_INSTRUCTION_INFO), false,
+ vmx_get_insn_info(), false,
sizeof(*vmpointer), &gva)) {
*ret = 1;
return -EINVAL;
@@ -5656,7 +5656,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu)
: get_vmcs12(vcpu);
unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
- u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ struct vmx_insn_info info = vmx_get_insn_info();
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct x86_exception e;
unsigned long field;
@@ -5669,7 +5669,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
return 1;
/* Decode instruction info and find the field to read */
- field = kvm_gpr_read(vcpu, (((instr_info) >> 28) & 0xf));
+ field = kvm_gpr_read(vcpu, insn_attr(info, reg2));
if (!nested_vmx_is_evmptr12_valid(vmx)) {
/*
@@ -5717,12 +5717,12 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
* Note that the number of bits actually copied is 32 or 64 depending
* on the guest's mode (32 or 64 bit), not on the given field's length.
*/
- if (instr_info & BIT(10)) {
- kvm_gpr_write(vcpu, (((instr_info) >> 3) & 0xf), value);
+ if (insn_attr(info, is_reg)) {
+ kvm_gpr_write(vcpu, insn_attr(info, reg1), value);
} else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
- instr_info, true, len, &gva))
+ info, true, len, &gva))
return 1;
/* _system ok, nested_vmx_check_permission has verified cpl=0 */
r = kvm_write_guest_virt_system(vcpu, gva, &value, len, &e);
@@ -5762,7 +5762,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu)
: get_vmcs12(vcpu);
unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
- u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ struct vmx_insn_info info = vmx_get_insn_info();
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct x86_exception e;
unsigned long field;
@@ -5791,19 +5791,19 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA))
return nested_vmx_failInvalid(vcpu);
- if (instr_info & BIT(10))
- value = kvm_gpr_read(vcpu, (((instr_info) >> 3) & 0xf));
+ if (insn_attr(info, is_reg))
+ value = kvm_gpr_read(vcpu, insn_attr(info, reg1));
else {
len = is_64_bit_mode(vcpu) ? 8 : 4;
if (get_vmx_mem_address(vcpu, exit_qualification,
- instr_info, false, len, &gva))
+ info, false, len, &gva))
return 1;
r = kvm_read_guest_virt(vcpu, gva, &value, len, &e);
if (r != X86EMUL_CONTINUE)
return kvm_handle_memory_failure(vcpu, r, &e);
}
- field = kvm_gpr_read(vcpu, (((instr_info) >> 28) & 0xf));
+ field = kvm_gpr_read(vcpu, insn_attr(info, reg2));
offset = get_vmcs12_field_offset(field);
if (offset < 0)
@@ -5951,7 +5951,7 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
static int handle_vmptrst(struct kvm_vcpu *vcpu)
{
unsigned long exit_qual = vmx_get_exit_qual(vcpu);
- u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ struct vmx_insn_info info = vmx_get_insn_info();
gpa_t current_vmptr = to_vmx(vcpu)->nested.current_vmptr;
struct x86_exception e;
gva_t gva;
@@ -5963,7 +5963,7 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
if (unlikely(nested_vmx_is_evmptr12_valid(to_vmx(vcpu))))
return 1;
- if (get_vmx_mem_address(vcpu, exit_qual, instr_info,
+ if (get_vmx_mem_address(vcpu, exit_qual, info,
true, sizeof(gpa_t), &gva))
return 1;
/* *_system ok, nested_vmx_check_permission has verified cpl=0 */
@@ -5979,15 +5979,16 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
static int handle_invept(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- u32 vmx_instruction_info, types;
unsigned long type, roots_to_free;
+ struct vmx_insn_info info;
struct kvm_mmu *mmu;
gva_t gva;
struct x86_exception e;
struct {
u64 eptp, gpa;
} operand;
- int i, r, gpr_index;
+ u32 types;
+ int i, r;
if (!(vmx->nested.msrs.secondary_ctls_high &
SECONDARY_EXEC_ENABLE_EPT) ||
@@ -5999,9 +6000,8 @@ static int handle_invept(struct kvm_vcpu *vcpu)
if (!nested_vmx_check_permission(vcpu))
return 1;
- vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_gpr_read(vcpu, gpr_index);
+ info = vmx_get_insn_info();
+ type = kvm_gpr_read(vcpu, insn_attr(info, reg2));
types = (vmx->nested.msrs.ept_caps >> VMX_EPT_EXTENT_SHIFT) & 6;
@@ -6012,7 +6012,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
* operand is read even if it isn't needed (e.g., for type==global)
*/
if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu),
- vmx_instruction_info, false, sizeof(operand), &gva))
+ info, false, sizeof(operand), &gva))
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
@@ -6059,7 +6059,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
static int handle_invvpid(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- u32 vmx_instruction_info;
+ struct vmx_insn_info info;
unsigned long type, types;
gva_t gva;
struct x86_exception e;
@@ -6068,7 +6068,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
u64 gla;
} operand;
u16 vpid02;
- int r, gpr_index;
+ int r;
if (!(vmx->nested.msrs.secondary_ctls_high &
SECONDARY_EXEC_ENABLE_VPID) ||
@@ -6080,9 +6080,8 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
if (!nested_vmx_check_permission(vcpu))
return 1;
- vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_gpr_read(vcpu, gpr_index);
+ info = vmx_get_insn_info();
+ type = kvm_gpr_read(vcpu, insn_attr(info, reg2));
types = (vmx->nested.msrs.vpid_caps &
VMX_VPID_EXTENT_SUPPORTED_MASK) >> 8;
@@ -6095,7 +6094,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
* operand is read even if it isn't needed (e.g., for type==global)
*/
if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu),
- vmx_instruction_info, false, sizeof(operand), &gva))
+ info, false, sizeof(operand), &gva))
return 1;
r = kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e);
if (r != X86EMUL_CONTINUE)
@@ -6433,7 +6432,7 @@ static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12, gpa_t bitmap)
{
- u32 vmx_instruction_info;
+ struct vmx_insn_info info;
unsigned long field;
u8 b;
@@ -6441,8 +6440,8 @@ static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
return true;
/* Decode instruction info and find the field to access */
- vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- field = kvm_gpr_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+ info = vmx_get_insn_info();
+ field = kvm_gpr_read(vcpu, insn_attr(info, reg2));
/* Out-of-range fields always cause a VM exit from L2 to L1 */
if (field >> 15)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 983484d42ebf..e54f4e7b3664 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -50,7 +50,7 @@ void nested_sync_vmcs12_to_shadow(struct kvm_vcpu *vcpu);
int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata);
int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
- u32 vmx_instruction_info, bool wr, int len, gva_t *ret);
+ struct vmx_insn_info info, bool wr, int len, gva_t *ret);
void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu);
bool nested_vmx_check_io_bitmaps(struct kvm_vcpu *vcpu, unsigned int port,
int size);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4320f61aabc2..10479114fd1c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6054,29 +6054,27 @@ static int handle_monitor_trap(struct kvm_vcpu *vcpu)
static int handle_invpcid(struct kvm_vcpu *vcpu)
{
- u32 vmx_instruction_info;
+ struct vmx_insn_info info;
unsigned long type;
gva_t gva;
struct {
u64 pcid;
u64 gla;
} operand;
- int gpr_index;
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- gpr_index = vmx_get_instr_info_reg2(vmx_instruction_info);
- type = kvm_gpr_read(vcpu, gpr_index);
+ info = vmx_get_insn_info();
+ type = kvm_gpr_read(vcpu, insn_attr(info, reg2));
/* According to the Intel instruction reference, the memory operand
* is read even if it isn't needed (e.g., for type==all)
*/
if (get_vmx_mem_address(vcpu, vmx_get_exit_qual(vcpu),
- vmx_instruction_info, false,
+ info, false,
sizeof(operand), &gva))
return 1;
@@ -6219,7 +6217,9 @@ static int handle_notify(struct kvm_vcpu *vcpu)
static int vmx_get_msr_imm_reg(struct kvm_vcpu *vcpu)
{
- return vmx_get_instr_info_reg(vmcs_read32(VMX_INSTRUCTION_INFO));
+ struct vmx_insn_info info = vmx_get_insn_info();
+
+ return insn_attr(info, reg1);
}
static int handle_rdmsr_imm(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 567320115a5a..2bb3ac8c5b8b 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -372,6 +372,19 @@ struct vmx_insn_info {
union insn_info info;
};
+static inline struct vmx_insn_info vmx_get_insn_info(void)
+{
+ struct vmx_insn_info insn;
+
+ insn.extended = false;
+ insn.info.word = vmcs_read32(VMX_INSTRUCTION_INFO);
+
+ return insn;
+}
+
+#define insn_attr(insn, attr) \
+ ((insn).extended ? (insn).info.ext.attr : (insn).info.base.attr)
+
static __always_inline struct vcpu_vt *to_vt(struct kvm_vcpu *vcpu)
{
return &(container_of(vcpu, struct vcpu_vmx, vcpu)->vt);
@@ -778,16 +791,6 @@ static inline bool vmx_guest_state_valid(struct kvm_vcpu *vcpu)
void dump_vmcs(struct kvm_vcpu *vcpu);
-static inline int vmx_get_instr_info_reg(u32 vmx_instr_info)
-{
- return (vmx_instr_info >> 3) & 0xf;
-}
-
-static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
-{
- return (vmx_instr_info >> 28) & 0xf;
-}
-
static inline bool vmx_can_use_ipiv(struct kvm_vcpu *vcpu)
{
return lapic_in_kernel(vcpu) && enable_ipiv;
--
2.51.0
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (4 preceding siblings ...)
2026-01-12 23:53 ` [PATCH v2 05/16] KVM: VMX: Refactor instruction information retrieval Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-03-05 4:13 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 07/16] KVM: VMX: Support extended register index in exit handling Chang S. Bae
` (9 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Introduce a helper to extract the GPR index from the exit qualification
field.
The VMX exit qualification, in addition to the VMX instruction info
field, encodes a GPR index. With the introduction of EGPRs, this field
is extended into a previously reserved bit position.
This refactoring centralizes the logic so that future updates can handle
the extended GPR index without code duplication.
Since the VMCS exit qualification is cached in vCPU state, it is safe
for the helper to access it directly via the vCPU pointer. This argument
will also be used later to determine EGPR availability.
No functional change intended.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 5 +++++
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 144012dd9599..46c12b64e819 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6354,7 +6354,7 @@ static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
- reg = (exit_qualification >> 8) & 15;
+ reg = vmx_get_exit_qual_gpr(vcpu);
val = kvm_gpr_read(vcpu, reg);
switch (cr) {
case 0:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 10479114fd1c..29d588c3b3b1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5588,7 +5588,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
exit_qualification = vmx_get_exit_qual(vcpu);
cr = exit_qualification & 15;
- reg = (exit_qualification >> 8) & 15;
+ reg = vmx_get_exit_qual_gpr(vcpu);
switch ((exit_qualification >> 4) & 3) {
case 0: /* mov to cr */
val = kvm_gpr_read(vcpu, reg);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 2bb3ac8c5b8b..8d3e0aff2e13 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -411,6 +411,11 @@ static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
return vt->exit_qualification;
}
+static inline int vmx_get_exit_qual_gpr(struct kvm_vcpu *vcpu)
+{
+ return (vmx_get_exit_qual(vcpu) >> 8) & 0xf;
+}
+
static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
struct vcpu_vt *vt = to_vt(vcpu);
--
2.51.0
* [PATCH v2 07/16] KVM: VMX: Support extended register index in exit handling
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (5 preceding siblings ...)
2026-01-12 23:53 ` [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification Chang S. Bae
@ 2026-01-12 23:53 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 08/16] KVM: nVMX: Propagate the extended instruction info field Chang S. Bae
` (8 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:53 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Define the VMCS field encoding for the extended instruction information.
Then, support retrieval of 5-bit register indices from VMCS fields when
the APX feature is enumerated.
The presence of the extended instruction information field is indicated
by APX enumeration, regardless of the XCR0.APX bit setting.
With APX enumerated, the previously reserved bit in the exit qualification
can now be referenced safely. Without APX, there is no guarantee that
older implementations always zeroed this bit.
Link: https://lore.kernel.org/7bb14722-c036-4835-8ed9-046b4e67909e@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2:
* Switch the change order; putting this ahead of nVMX changes (Chao)
* Subsequently, define the field offset here.
---
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx/vmx.h | 23 ++++++++++++++++++++---
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index c85c50019523..6170251306db 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -264,6 +264,8 @@ enum vmcs_field {
PID_POINTER_TABLE_HIGH = 0x00002043,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
+ EXTENDED_INSTRUCTION_INFO = 0x00002406,
+ EXTENDED_INSTRUCTION_INFO_HIGH = 0x00002407,
VMCS_LINK_POINTER = 0x00002800,
VMCS_LINK_POINTER_HIGH = 0x00002801,
GUEST_IA32_DEBUGCTL = 0x00002802,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8d3e0aff2e13..a24d87aa4f79 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -372,12 +372,26 @@ struct vmx_insn_info {
union insn_info info;
};
+/*
+ * The APX enumeration guarantees the presence of the extended fields.
+ * The host CPUID bit alone is sufficient to rely on it.
+ */
+static inline bool vmx_insn_info_extended(void)
+{
+ return static_cpu_has(X86_FEATURE_APX);
+}
+
static inline struct vmx_insn_info vmx_get_insn_info(void)
{
struct vmx_insn_info insn;
- insn.extended = false;
- insn.info.word = vmcs_read32(VMX_INSTRUCTION_INFO);
+ if (vmx_insn_info_extended()) {
+ insn.extended = true;
+ insn.info.dword = vmcs_read64(EXTENDED_INSTRUCTION_INFO);
+ } else {
+ insn.extended = false;
+ insn.info.word = vmcs_read32(VMX_INSTRUCTION_INFO);
+ }
return insn;
}
@@ -413,7 +427,10 @@ static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
static inline int vmx_get_exit_qual_gpr(struct kvm_vcpu *vcpu)
{
- return (vmx_get_exit_qual(vcpu) >> 8) & 0xf;
+ if (vmx_insn_info_extended())
+ return (vmx_get_exit_qual(vcpu) >> 8) & 0x1f;
+ else
+ return (vmx_get_exit_qual(vcpu) >> 8) & 0xf;
}
static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
--
2.51.0
* [PATCH v2 08/16] KVM: nVMX: Propagate the extended instruction info field
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (6 preceding siblings ...)
2026-01-12 23:53 ` [PATCH v2 07/16] KVM: VMX: Support extended register index in exit handling Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking Chang S. Bae
` (7 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Define a new extended_instruction_info field in struct vmcs12 and
propagate it to nested VMX.
Gate the propagation on the guest's APX enumeration, which aligns with
the VMX behavior. Define the KVM CPUID feature bit for this.
Link: https://lore.kernel.org/CABgObfa-vqWCenVvvTAoB773AQ+9a1OOT9n5hjqT=zZBDQbb+Q@mail.gmail.com
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Fix build error by defining APX CPUID bit here,
and refine the changelog
---
arch/x86/kvm/reverse_cpuid.h | 2 ++
arch/x86/kvm/vmx/nested.c | 6 ++++++
arch/x86/kvm/vmx/vmcs12.c | 1 +
arch/x86/kvm/vmx/vmcs12.h | 3 ++-
4 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 81b4a7acf72e..e538b5444919 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -35,6 +35,7 @@
#define X86_FEATURE_AVX_VNNI_INT16 KVM_X86_FEATURE(CPUID_7_1_EDX, 10)
#define X86_FEATURE_PREFETCHITI KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
#define X86_FEATURE_AVX10 KVM_X86_FEATURE(CPUID_7_1_EDX, 19)
+#define KVM_X86_FEATURE_APX KVM_X86_FEATURE(CPUID_7_1_EDX, 21)
/* Intel-defined sub-features, CPUID level 0x00000007:2 (EDX) */
#define X86_FEATURE_INTEL_PSFD KVM_X86_FEATURE(CPUID_7_2_EDX, 0)
@@ -125,6 +126,7 @@ static __always_inline u32 __feature_translate(int x86_feature)
KVM_X86_TRANSLATE_FEATURE(SGX1);
KVM_X86_TRANSLATE_FEATURE(SGX2);
KVM_X86_TRANSLATE_FEATURE(SGX_EDECCSSA);
+ KVM_X86_TRANSLATE_FEATURE(APX);
KVM_X86_TRANSLATE_FEATURE(CONSTANT_TSC);
KVM_X86_TRANSLATE_FEATURE(PERFMON_V2);
KVM_X86_TRANSLATE_FEATURE(RRSBA_CTRL);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 46c12b64e819..da17e73d2414 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4747,6 +4747,12 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
vmcs12->vm_exit_intr_info = exit_intr_info;
vmcs12->vm_exit_instruction_len = exit_insn_len;
vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ /*
+ * The APX enumeration guarantees the presence of the extended
+ * fields. This CPUID bit alone is sufficient to rely on it.
+ */
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_APX))
+ vmcs12->extended_instruction_info = vmcs_read64(EXTENDED_INSTRUCTION_INFO);
/*
* According to spec, there's no need to store the guest's
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 4233b5ca9461..ea2b690a419e 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -53,6 +53,7 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
+ FIELD64(EXTENDED_INSTRUCTION_INFO, extended_instruction_info),
FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 4ad6b16525b9..2146e45aaade 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -71,7 +71,7 @@ struct __packed vmcs12 {
u64 pml_address;
u64 encls_exiting_bitmap;
u64 tsc_multiplier;
- u64 padding64[1]; /* room for future expansion */
+ u64 extended_instruction_info;
/*
* To allow migration of L1 (complete with its L2 guests) between
* machines of different natural widths (32 or 64 bit), we cannot have
@@ -261,6 +261,7 @@ static inline void vmx_check_vmcs12_offsets(void)
CHECK_OFFSET(pml_address, 312);
CHECK_OFFSET(encls_exiting_bitmap, 320);
CHECK_OFFSET(tsc_multiplier, 328);
+ CHECK_OFFSET(extended_instruction_info, 336);
CHECK_OFFSET(cr0_guest_host_mask, 344);
CHECK_OFFSET(cr4_guest_host_mask, 352);
CHECK_OFFSET(cr0_read_shadow, 360);
--
2.51.0
* [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (7 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 08/16] KVM: nVMX: Propagate the extended instruction info field Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-03-05 4:22 ` Sean Christopherson
2026-01-12 23:54 ` [PATCH v2 10/16] KVM: emulate: Handle EGPR index and REX2-incompatible opcodes Chang S. Bae
` (6 subsequent siblings)
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Extend the emulator context and GPR accessors to handle EGPRs before
adding support for REX2-prefixed instructions.
With the KVM GPR accessors now able to handle EGPRs, the emulator can
uniformly cache and track all GPRs without requiring separate handling.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/kvm_emulate.h | 10 +++++-----
arch/x86/kvm/x86.c | 4 ++--
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index fb3dab4b5a53..16b35a796a7f 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -105,13 +105,13 @@ struct x86_instruction_info {
struct x86_emulate_ops {
void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
/*
- * read_gpr: read a general purpose register (rax - r15)
+ * read_gpr: read a general purpose register (rax - r31)
*
* @reg: gpr number.
*/
ulong (*read_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg);
/*
- * write_gpr: write a general purpose register (rax - r15)
+ * write_gpr: write a general purpose register (rax - r31)
*
* @reg: gpr number.
* @val: value to write.
@@ -314,7 +314,7 @@ typedef void (*fastop_t)(struct fastop *);
* a ModRM or SIB byte.
*/
#ifdef CONFIG_X86_64
-#define NR_EMULATOR_GPRS 16
+#define NR_EMULATOR_GPRS 32
#else
#define NR_EMULATOR_GPRS 8
#endif
@@ -373,9 +373,9 @@ struct x86_emulate_ctxt {
u8 lock_prefix;
u8 rep_prefix;
/* bitmaps of registers in _regs[] that can be read */
- u16 regs_valid;
+ u32 regs_valid;
/* bitmaps of registers in _regs[] that have been written */
- u16 regs_dirty;
+ u32 regs_dirty;
/* modrm */
u8 modrm;
u8 modrm_mod;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index edac2ec11e2f..e7f858488f2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8894,12 +8894,12 @@ static bool emulator_guest_cpuid_is_intel_compatible(struct x86_emulate_ctxt *ct
static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
{
- return kvm_register_read_raw(emul_to_vcpu(ctxt), reg);
+ return kvm_gpr_read_raw(emul_to_vcpu(ctxt), reg);
}
static void emulator_write_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg, ulong val)
{
- kvm_register_write_raw(emul_to_vcpu(ctxt), reg, val);
+ kvm_gpr_write_raw(emul_to_vcpu(ctxt), reg, val);
}
static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
--
2.51.0
* [PATCH v2 10/16] KVM: emulate: Handle EGPR index and REX2-incompatible opcodes
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (8 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 11/16] KVM: emulate: Support REX2-prefixed opcode decode Chang S. Bae
` (5 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Prepare the emulator for REX2 handling by introducing the NoRex2 opcode
flag and supporting extended register indices.
Add a helper that factors out the common logic for calculating a
register index from a given register field and the REX/REX2 bits.
REX2 does not support three-byte opcodes. Instead, the REX2.M bit selects
between one- and two-byte opcode tables, which were previously
distinguished by the 0x0F escape byte.
Some legacy instructions in those tables never reference extended
registers. When prefixed with REX, such instructions are treated as if
the prefix were absent. In contrast, a REX2 prefix causes a #UD, which
should be handled explicitly.
Link: https://lore.kernel.org/1ebf3a23-5671-41c1-8daa-c83f2f105936@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Rename NoRex to NoRex2 (Paolo)
---
arch/x86/kvm/emulate.c | 80 +++++++++++++++++++++++---------------
arch/x86/kvm/kvm_emulate.h | 1 +
2 files changed, 50 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c8e292e9a24d..ef0da1acab5a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -175,6 +175,7 @@
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
+#define NoRex2 ((u64)1 << 58) /* Instruction not present in REX2 maps */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -244,6 +245,7 @@ enum rex_bits {
REX_X = 2,
REX_R = 4,
REX_W = 8,
+ REX_M = 0x80,
};
static void writeback_registers(struct x86_emulate_ctxt *ctxt)
@@ -1078,6 +1080,15 @@ static int em_fnstsw(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}
+static __always_inline int rex_get_rxb(u8 rex, u8 fld)
+{
+ BUILD_BUG_ON(!__builtin_constant_p(fld));
+ BUILD_BUG_ON(fld != REX_B && fld != REX_X && fld != REX_R);
+
+ rex >>= ffs(fld) - 1;
+ return (rex & 1 ? 8 : 0) + (rex & 0x10 ? 16 : 0);
+}
+
static void __decode_register_operand(struct x86_emulate_ctxt *ctxt,
struct operand *op, int reg)
{
@@ -1117,7 +1128,7 @@ static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
if (ctxt->d & ModRM)
reg = ctxt->modrm_reg;
else
- reg = (ctxt->b & 7) | (ctxt->rex_bits & REX_B ? 8 : 0);
+ reg = (ctxt->b & 7) | rex_get_rxb(ctxt->rex_bits, REX_B);
__decode_register_operand(ctxt, op, reg);
}
@@ -1136,9 +1147,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
int rc = X86EMUL_CONTINUE;
ulong modrm_ea = 0;
- ctxt->modrm_reg = (ctxt->rex_bits & REX_R ? 8 : 0);
- index_reg = (ctxt->rex_bits & REX_X ? 8 : 0);
- base_reg = (ctxt->rex_bits & REX_B ? 8 : 0);
+ ctxt->modrm_reg = rex_get_rxb(ctxt->rex_bits, REX_R);
+ index_reg = rex_get_rxb(ctxt->rex_bits, REX_X);
+ base_reg = rex_get_rxb(ctxt->rex_bits, REX_B);
ctxt->modrm_mod = (ctxt->modrm & 0xc0) >> 6;
ctxt->modrm_reg |= (ctxt->modrm & 0x38) >> 3;
@@ -4245,7 +4256,7 @@ static const struct opcode opcode_table[256] = {
/* 0x38 - 0x3F */
I6ALU(NoWrite, em_cmp), N, N,
/* 0x40 - 0x4F */
- X8(I(DstReg, em_inc)), X8(I(DstReg, em_dec)),
+ X8(I(DstReg | NoRex2, em_inc)), X8(I(DstReg | NoRex2, em_dec)),
/* 0x50 - 0x57 */
X8(I(SrcReg | Stack, em_push)),
/* 0x58 - 0x5F */
@@ -4263,7 +4274,7 @@ static const struct opcode opcode_table[256] = {
I2bvIP(DstDI | SrcDX | Mov | String | Unaligned, em_in, ins, check_perm_in), /* insb, insw/insd */
I2bvIP(SrcSI | DstDX | String, em_out, outs, check_perm_out), /* outsb, outsw/outsd */
/* 0x70 - 0x7F */
- X16(D(SrcImmByte | NearBranch | IsBranch)),
+ X16(D(SrcImmByte | NearBranch | IsBranch | NoRex2)),
/* 0x80 - 0x87 */
G(ByteOp | DstMem | SrcImm, group1),
G(DstMem | SrcImm, group1),
@@ -4287,15 +4298,15 @@ static const struct opcode opcode_table[256] = {
II(ImplicitOps | Stack, em_popf, popf),
I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf),
/* 0xA0 - 0xA7 */
- I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
- I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
- I2bv(SrcSI | DstDI | Mov | String | TwoMemOp, em_mov),
- I2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
+ I2bv(DstAcc | SrcMem | Mov | MemAbs | NoRex2, em_mov),
+ I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable | NoRex2, em_mov),
+ I2bv(SrcSI | DstDI | Mov | String | TwoMemOp | NoRex2, em_mov),
+ I2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp | NoRex2, em_cmp_r),
/* 0xA8 - 0xAF */
- I2bv(DstAcc | SrcImm | NoWrite, em_test),
- I2bv(SrcAcc | DstDI | Mov | String, em_mov),
- I2bv(SrcSI | DstAcc | Mov | String, em_mov),
- I2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
+ I2bv(DstAcc | SrcImm | NoWrite | NoRex2, em_test),
+ I2bv(SrcAcc | DstDI | Mov | String | NoRex2, em_mov),
+ I2bv(SrcSI | DstAcc | Mov | String | NoRex2, em_mov),
+ I2bv(SrcAcc | DstDI | String | NoWrite | NoRex2, em_cmp_r),
/* 0xB0 - 0xB7 */
X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
/* 0xB8 - 0xBF */
@@ -4325,17 +4336,17 @@ static const struct opcode opcode_table[256] = {
/* 0xD8 - 0xDF */
N, E(0, &escape_d9), N, E(0, &escape_db), N, E(0, &escape_dd), N, N,
/* 0xE0 - 0xE7 */
- X3(I(SrcImmByte | NearBranch | IsBranch, em_loop)),
- I(SrcImmByte | NearBranch | IsBranch, em_jcxz),
- I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
- I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
+ X3(I(SrcImmByte | NearBranch | IsBranch | NoRex2, em_loop)),
+ I(SrcImmByte | NearBranch | IsBranch | NoRex2, em_jcxz),
+ I2bvIP(SrcImmUByte | DstAcc | NoRex2, em_in, in, check_perm_in),
+ I2bvIP(SrcAcc | DstImmUByte | NoRex2, em_out, out, check_perm_out),
/* 0xE8 - 0xEF */
- I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
- D(SrcImm | ImplicitOps | NearBranch | IsBranch),
- I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
- D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
- I2bvIP(SrcDX | DstAcc, em_in, in, check_perm_in),
- I2bvIP(SrcAcc | DstDX, em_out, out, check_perm_out),
+ I(SrcImm | NearBranch | IsBranch | ShadowStack | NoRex2, em_call),
+ D(SrcImm | ImplicitOps | NearBranch | IsBranch | NoRex2),
+ I(SrcImmFAddr | No64 | IsBranch | NoRex2, em_jmp_far),
+ D(SrcImmByte | ImplicitOps | NearBranch | IsBranch | NoRex2),
+ I2bvIP(SrcDX | DstAcc | NoRex2, em_in, in, check_perm_in),
+ I2bvIP(SrcAcc | DstDX | NoRex2, em_out, out, check_perm_out),
/* 0xF0 - 0xF7 */
N, DI(ImplicitOps, icebp), N, N,
DI(ImplicitOps | Priv, hlt), D(ImplicitOps),
@@ -4376,12 +4387,12 @@ static const struct opcode twobyte_table[256] = {
N, GP(ModRM | DstMem | SrcReg | Mov | Sse | Avx, &pfx_0f_2b),
N, N, N, N,
/* 0x30 - 0x3F */
- II(ImplicitOps | Priv, em_wrmsr, wrmsr),
- IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
- II(ImplicitOps | Priv, em_rdmsr, rdmsr),
- IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
- I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_sysenter),
- I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
+ II(ImplicitOps | Priv | NoRex2, em_wrmsr, wrmsr),
+ IIP(ImplicitOps | NoRex2, em_rdtsc, rdtsc, check_rdtsc),
+ II(ImplicitOps | Priv | NoRex2, em_rdmsr, rdmsr),
+ IIP(ImplicitOps | NoRex2, em_rdpmc, rdpmc, check_rdpmc),
+ I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack | NoRex2, em_sysenter),
+ I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack | NoRex2, em_sysexit),
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
@@ -4399,7 +4410,7 @@ static const struct opcode twobyte_table[256] = {
N, N, N, N,
N, N, N, GP(SrcReg | DstMem | ModRM | Mov, &pfx_0f_6f_0f_7f),
/* 0x80 - 0x8F */
- X16(D(SrcImm | NearBranch | IsBranch)),
+ X16(D(SrcImm | NearBranch | IsBranch | NoRex2)),
/* 0x90 - 0x9F */
X16(D(ByteOp | DstMem | SrcNone | ModRM| Mov)),
/* 0xA0 - 0xA7 */
@@ -4992,6 +5003,13 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
opcode = opcode_table[ctxt->b];
}
+ /*
+ * Instructions marked with NoRex2 ignore a legacy REX prefix, but
+ * #UD should be raised when prefixed with REX2.
+ */
+ if (ctxt->d & NoRex2 && ctxt->rex_prefix == REX2_PREFIX)
+ opcode.flags = Undefined;
+
if (opcode.flags & ModRM)
ctxt->modrm = insn_fetch(u8, ctxt);
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 16b35a796a7f..dd5d1e489db6 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -325,6 +325,7 @@ typedef void (*fastop_t)(struct fastop *);
enum rex_type {
REX_NONE,
REX_PREFIX,
+ REX2_PREFIX,
};
struct x86_emulate_ctxt {
--
2.51.0
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 11/16] KVM: emulate: Support REX2-prefixed opcode decode
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (9 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 10/16] KVM: emulate: Handle EGPR index and REX2-incompatible opcodes Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 12/16] KVM: emulate: Reject EVEX-prefixed instructions Chang S. Bae
` (4 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Extend the instruction decoder to recognize and handle the REX2 prefix,
including validation of prefix sequences and correct opcode table
selection.
REX2 is a terminal prefix: once 0xD5 is encountered, the following byte
is the opcode. When REX.M=0, most prefix bytes are invalid after REX2,
including REX, VEX, EVEX, and another REX2. Also, REX2-prefixed
instructions are only valid in 64-bit mode.
All of the invalid prefix combinations after REX2 coincide with opcodes
that are architecturally invalid in 64-bit mode. Thus, marking such
opcodes with No64 in opcode_table[] naturally disallows those illegal
prefix sequences.
The 0x40–0x4F opcode row was missing the No64 flag. While NoRex2 already
invalidates REX2 for these opcodes, adding No64 makes opcode attributes
explicit and complete.
Link: https://lore.kernel.org/CABgObfYYGTvkYpeyqLSr9JgKMDA_STSff2hXBNchLZuKFU+MMA@mail.gmail.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/emulate.c | 38 ++++++++++++++++++++++++++++++++++----
1 file changed, 34 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef0da1acab5a..1a565a4e3ff7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4256,7 +4256,7 @@ static const struct opcode opcode_table[256] = {
/* 0x38 - 0x3F */
I6ALU(NoWrite, em_cmp), N, N,
/* 0x40 - 0x4F */
- X8(I(DstReg | NoRex2, em_inc)), X8(I(DstReg | NoRex2, em_dec)),
+ X8(I(DstReg | NoRex2 | No64, em_inc)), X8(I(DstReg | NoRex2 | No64, em_dec)),
/* 0x50 - 0x57 */
X8(I(SrcReg | Stack, em_push)),
/* 0x58 - 0x5F */
@@ -4850,6 +4850,17 @@ static int x86_decode_avx(struct x86_emulate_ctxt *ctxt,
return rc;
}
+static inline bool rex2_invalid(struct x86_emulate_ctxt *ctxt)
+{
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ u64 xcr = 0;
+
+ return ctxt->rex_prefix == REX_PREFIX ||
+ !(ops->get_cr(ctxt, 4) & X86_CR4_OSXSAVE) ||
+ ops->get_xcr(ctxt, 0, &xcr) ||
+ !(xcr & XFEATURE_MASK_APX);
+}
+
int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int emulation_type)
{
int rc = X86EMUL_CONTINUE;
@@ -4903,7 +4914,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->op_bytes = def_op_bytes;
ctxt->ad_bytes = def_ad_bytes;
- /* Legacy prefixes. */
+ /* Legacy and REX/REX2 prefixes. */
for (;;) {
switch (ctxt->b = insn_fetch(u8, ctxt)) {
case 0x66: /* operand-size override */
@@ -4949,6 +4960,17 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->rex_prefix = REX_PREFIX;
ctxt->rex_bits = ctxt->b & 0xf;
continue;
+ case 0xd5: /* REX2 */
+ if (mode != X86EMUL_MODE_PROT64)
+ goto done_prefixes;
+ if (rex2_invalid(ctxt)) {
+ opcode = ud;
+ goto done_modrm;
+ }
+ ctxt->rex_prefix = REX2_PREFIX;
+ ctxt->rex_bits = insn_fetch(u8, ctxt);
+ ctxt->b = insn_fetch(u8, ctxt);
+ goto done_prefixes;
case 0xf0: /* LOCK */
ctxt->lock_prefix = 1;
break;
@@ -4971,6 +4993,12 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->rex_bits & REX_W)
ctxt->op_bytes = 8;
+ /* REX2 opcode is one byte unless M-bit selects the two-byte map */
+ if (ctxt->rex_bits & REX_M)
+ goto decode_twobytes;
+ else if (ctxt->rex_prefix == REX2_PREFIX)
+ goto decode_onebyte;
+
/* Opcode byte(s). */
if (ctxt->b == 0xc4 || ctxt->b == 0xc5) {
/* VEX or LDS/LES */
@@ -4988,17 +5016,19 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
goto done;
} else if (ctxt->b == 0x0f) {
/* Two- or three-byte opcode */
- ctxt->opcode_len = 2;
ctxt->b = insn_fetch(u8, ctxt);
+decode_twobytes:
+ ctxt->opcode_len = 2;
opcode = twobyte_table[ctxt->b];
/* 0F_38 opcode map */
- if (ctxt->b == 0x38) {
+ if (ctxt->b == 0x38 && ctxt->rex_prefix != REX2_PREFIX) {
ctxt->opcode_len = 3;
ctxt->b = insn_fetch(u8, ctxt);
opcode = opcode_map_0f_38[ctxt->b];
}
} else {
+decode_onebyte:
/* Opcode byte(s). */
opcode = opcode_table[ctxt->b];
}
--
2.51.0
* [PATCH v2 12/16] KVM: emulate: Reject EVEX-prefixed instructions
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (10 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 11/16] KVM: emulate: Support REX2-prefixed opcode decode Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 13/16] KVM: x86: Guard valid XCR0.APX settings Chang S. Bae
` (3 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Explicitly mark EVEX-prefixed opcodes (0x62) as unsupported.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/emulate.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1a565a4e3ff7..c5cb356f1524 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5040,6 +5040,11 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->d & NoRex2 && ctxt->rex_prefix == REX2_PREFIX)
opcode.flags = Undefined;
+ /* EVEX-prefixed instructions are not implemented */
+ if (ctxt->opcode_len == 1 && ctxt->b == 0x62 &&
+ (mode == X86EMUL_MODE_PROT64 || (ctxt->modrm & 0xc0) == 0xc0))
+ opcode.flags = NotImpl;
+
if (opcode.flags & ModRM)
ctxt->modrm = insn_fetch(u8, ctxt);
--
2.51.0
* [PATCH v2 13/16] KVM: x86: Guard valid XCR0.APX settings
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (11 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 12/16] KVM: emulate: Reject EVEX-prefixed instructions Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests Chang S. Bae
` (2 subsequent siblings)
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Prevent invalid XCR0.APX configurations in two cases: conflict with MPX
and lack of SVM support.
In the non-compacted XSAVE format, APX and MPX conflict on the same
offset. Although MPX is being deprecated in practice, KVM should
explicitly reject such configurations that set both bits.
At this point, only VMX supports EGPRs. SVM will require corresponding
extensions to handle EGPR indices.
The addition to the supported XCR0 mask should accompany guest CPUID
exposure, which will be done separately.
Link: https://lore.kernel.org/ab3f4937-38f5-4354-8850-bf773c159bbe@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kvm/svm/svm.c | 7 ++++++-
arch/x86/kvm/x86.c | 4 ++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 209faa742e98..a06e5a24b808 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5301,8 +5301,13 @@ static __init int svm_hardware_setup(void)
}
kvm_enable_efer_bits(EFER_NX);
+ /*
+ * APX introduces EGPRs, which require additional VMCB support.
+ * Disable APX until the necessary extensions are handled.
+ */
kvm_caps.supported_xcr0 &= ~(XFEATURE_MASK_BNDREGS |
- XFEATURE_MASK_BNDCSR);
+ XFEATURE_MASK_BNDCSR |
+ XFEATURE_MASK_APX);
if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
kvm_enable_efer_bits(EFER_FFXSR);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e7f858488f2c..189a03483d03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1391,6 +1391,10 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
(!(xcr0 & XFEATURE_MASK_BNDCSR)))
return 1;
+ /* MPX and APX conflict in the non-compacted XSAVE format */
+ if (xcr0 & XFEATURE_MASK_BNDREGS && xcr0 & XFEATURE_MASK_APX)
+ return 1;
+
if (xcr0 & XFEATURE_MASK_AVX512) {
if (!(xcr0 & XFEATURE_MASK_YMM))
return 1;
--
2.51.0
* [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (12 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 13/16] KVM: x86: Guard valid XCR0.APX settings Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-19 5:55 ` Xiaoyao Li
2026-01-12 23:54 ` [PATCH v2 15/16] KVM: x86: Expose APX sub-features " Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks Chang S. Bae
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae, Peter Fang
From: Peter Fang <peter.fang@intel.com>
Add the APX xfeature bit to the list of supported XCR0 components and
expose the APX feature to guests. Update the maximum supported CPUID leaf
to 0x29 to include the APX leaf. On SVM systems, ensure that the feature
is not advertised, as EGPR handling is not yet supported.
No APX sub-features are enumerated yet. Those will be exposed in a
separate patch.
Signed-off-by: Peter Fang <peter.fang@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
V1 -> V2: Exclude the APX CPUID definition as patch8 includes it now
---
arch/x86/kvm/cpuid.c | 8 +++++++-
arch/x86/kvm/svm/svm.c | 8 ++++++++
arch/x86/kvm/x86.c | 3 ++-
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 88a5426674a1..5431e31a4851 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1038,6 +1038,7 @@ void kvm_set_cpu_caps(void)
F(AVX_VNNI_INT16),
F(PREFETCHITI),
F(AVX10),
+ SCATTERED_F(APX),
);
kvm_cpu_cap_init(CPUID_7_2_EDX,
@@ -1401,7 +1402,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
switch (function) {
case 0:
/* Limited to the highest leaf implemented in KVM. */
- entry->eax = min(entry->eax, 0x24U);
+ entry->eax = min(entry->eax, 0x29U);
break;
case 1:
cpuid_entry_override(entry, CPUID_1_EDX);
@@ -1646,6 +1647,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->edx = 0;
break;
}
+ case 0x29: {
+ /* No APX sub-features are supported yet */
+ entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ break;
+ }
case KVM_CPUID_SIGNATURE: {
const u32 *sigptr = (const u32 *)KVM_SIGNATURE;
entry->eax = KVM_CPUID_FEATURES;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a06e5a24b808..9c76ea7a4231 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5284,6 +5284,14 @@ static __init void svm_set_cpu_caps(void)
*/
kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);
kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM);
+
+ /*
+ * If the APX xfeature bit is not supported, meaning that VMCB
+ * support for EGPRs is unavailable, then the APX feature should
+ * not be exposed to the guest.
+ */
+ if (!(kvm_caps.supported_xcr0 & XFEATURE_MASK_APX))
+ kvm_cpu_cap_clear(X86_FEATURE_APX);
}
static __init int svm_hardware_setup(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 189a03483d03..67b3312ab737 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -214,7 +214,8 @@ static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
- | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
+ | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE \
+ | XFEATURE_MASK_APX)
#define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
/*
--
2.51.0
* [PATCH v2 15/16] KVM: x86: Expose APX sub-features to guests
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (13 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks Chang S. Bae
15 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Add CPUID leaf 0x29 sub-leaf 0 to enumerate APX sub-features to guests.
This leaf currently defines the following sub-features:
* New Conditional Instructions (NCI)
* New Data Destination (NDD)
* Flags Suppression (NF)
The CPUID leaf is only exposed if the APX feature is enabled.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 10 ++++++++--
arch/x86/kvm/reverse_cpuid.h | 4 ++++
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9dedb8d77222..d75a76152340 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -794,6 +794,7 @@ enum kvm_only_cpuid_leafs {
CPUID_24_0_EBX,
CPUID_8000_0021_ECX,
CPUID_7_1_ECX,
+ CPUID_29_0_EBX,
NR_KVM_CPU_CAPS,
NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5431e31a4851..347b8f2402c7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1070,6 +1070,10 @@ void kvm_set_cpu_caps(void)
F(AVX10_512),
);
+ kvm_cpu_cap_init(CPUID_29_0_EBX,
+ F(APX_NCI_NDD_NF),
+ );
+
kvm_cpu_cap_init(CPUID_8000_0001_ECX,
F(LAHF_LM),
F(CMP_LEGACY),
@@ -1648,8 +1652,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
break;
}
case 0x29: {
- /* No APX sub-features are supported yet */
- entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ if (!(kvm_caps.supported_xcr0 & XFEATURE_MASK_APX)) {
+ entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+ break;
+ }
break;
}
case KVM_CPUID_SIGNATURE: {
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index e538b5444919..f8587586d031 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -50,6 +50,9 @@
#define X86_FEATURE_AVX10_256 KVM_X86_FEATURE(CPUID_24_0_EBX, 17)
#define X86_FEATURE_AVX10_512 KVM_X86_FEATURE(CPUID_24_0_EBX, 18)
+/* Intel-defined sub-features, CPUID level 0x00000029:0 (EBX) */
+#define X86_FEATURE_APX_NCI_NDD_NF KVM_X86_FEATURE(CPUID_29_0_EBX, 0)
+
/* CPUID level 0x80000007 (EDX). */
#define KVM_X86_FEATURE_CONSTANT_TSC KVM_X86_FEATURE(CPUID_8000_0007_EDX, 8)
@@ -91,6 +94,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_24_0_EBX] = { 0x24, 0, CPUID_EBX},
[CPUID_8000_0021_ECX] = {0x80000021, 0, CPUID_ECX},
[CPUID_7_1_ECX] = { 7, 1, CPUID_ECX},
+ [CPUID_29_0_EBX] = { 0x29, 0, CPUID_EBX},
};
/*
--
2.51.0
* [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
` (14 preceding siblings ...)
2026-01-12 23:54 ` [PATCH v2 15/16] KVM: x86: Expose APX sub-features " Chang S. Bae
@ 2026-01-12 23:54 ` Chang S. Bae
2026-03-05 4:28 ` Sean Christopherson
15 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-12 23:54 UTC (permalink / raw)
To: pbonzini, seanjc; +Cc: kvm, linux-kernel, chao.gao, chang.seok.bae
Now that KVM exposes the APX feature to guests on APX-capable systems,
extend the selftests to validate XCR0 configuration and state management.
Since APX repurposes the XSAVE area previously used by MPX in the
non-compacted format, add a check to ensure that MPX states are not set
when APX is enabled.
Also, load non-init APX state data in the guest so that XSTATE_BV[APX] is
set, allowing APX state save/restore to be validated.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/x86/state_test.c | 6 ++++++
.../selftests/kvm/x86/xcr0_cpuid_test.c | 19 +++++++++++++++++++
3 files changed, 26 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 57d62a425109..6a1da26780ea 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -88,6 +88,7 @@ struct xstate {
#define XFEATURE_MASK_LBR BIT_ULL(15)
#define XFEATURE_MASK_XTILE_CFG BIT_ULL(17)
#define XFEATURE_MASK_XTILE_DATA BIT_ULL(18)
+#define XFEATURE_MASK_APX BIT_ULL(19)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK | \
XFEATURE_MASK_ZMM_Hi256 | \
diff --git a/tools/testing/selftests/kvm/x86/state_test.c b/tools/testing/selftests/kvm/x86/state_test.c
index f2c7a1c297e3..2b7aa4cca011 100644
--- a/tools/testing/selftests/kvm/x86/state_test.c
+++ b/tools/testing/selftests/kvm/x86/state_test.c
@@ -167,6 +167,12 @@ static void __attribute__((__flatten__)) guest_code(void *arg)
asm volatile ("vmovupd %0, %%zmm16" :: "m" (buffer));
}
+ if (supported_xcr0 & XFEATURE_MASK_APX) {
+ /* mov $0xcccccccc, %r16 */
+ asm volatile (".byte 0xd5, 0x18, 0xb8, 0xcc, 0xcc,"
+ "0xcc, 0xcc, 0x00, 0x00, 0x00, 0x00");
+ }
+
if (this_cpu_has(X86_FEATURE_MPX)) {
uint64_t bounds[2] = { 10, 0xffffffffull };
uint64_t output[2] = { };
diff --git a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
index d038c1571729..e3d3af5ab6f2 100644
--- a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
+++ b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
@@ -46,6 +46,20 @@ do { \
__supported, (xfeatures)); \
} while (0)
+/*
+ * Verify that mutually exclusive architectural features do not overlap.
+ * For example, APX and MPX must never be reported as supported together.
+ */
+#define ASSERT_XFEATURE_CONFLICT(supported_xcr0, xfeatures, conflicts) \
+do { \
+ uint64_t __supported = (supported_xcr0) & ((xfeatures) | (conflicts)); \
+ \
+ __GUEST_ASSERT((__supported & (xfeatures)) != (xfeatures) || \
+ !(__supported & (conflicts)), \
+ "supported = 0x%lx, xfeatures = 0x%llx, conflicts = 0x%llx", \
+ __supported, (xfeatures), (conflicts)); \
+} while (0)
+
static void guest_code(void)
{
uint64_t initial_xcr0;
@@ -79,6 +93,11 @@ static void guest_code(void)
ASSERT_ALL_OR_NONE_XFEATURE(supported_xcr0,
XFEATURE_MASK_XTILE);
+ /* Check APX by ensuring MPX is not exposed concurrently */
+ ASSERT_XFEATURE_CONFLICT(supported_xcr0,
+ XFEATURE_MASK_APX,
+ XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+
vector = xsetbv_safe(0, XFEATURE_MASK_FP);
__GUEST_ASSERT(!vector,
"Expected success on XSETBV(FP), got %s",
--
2.51.0
* Re: [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests
2026-01-12 23:54 ` [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests Chang S. Bae
@ 2026-01-19 5:55 ` Xiaoyao Li
2026-01-20 18:07 ` Edgecombe, Rick P
0 siblings, 1 reply; 39+ messages in thread
From: Xiaoyao Li @ 2026-01-19 5:55 UTC (permalink / raw)
To: Chang S. Bae, pbonzini, seanjc, Edgecombe, Rick P
Cc: kvm, linux-kernel, chao.gao, Peter Fang, Binbin Wu
+ Rick and Binbin
On 1/13/2026 7:54 AM, Chang S. Bae wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 189a03483d03..67b3312ab737 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -214,7 +214,8 @@ static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
> #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
> | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> - | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
> + | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE \
> + | XFEATURE_MASK_APX)
>
Not the intention of this patch, but this change eventually allows
userspace to expose APX to TDX guests.
Without any mention of TDX APX tests and validation like the one for
CET[1], I think it's unsafe to allow it for TDX guests. E.g., the worst
case would be KVM might need extra handling to keep host's
states/functionalities correct once TD guest is able to manage APX.
I'm thinking maybe we need to introduce a supported mask,
KVM_SUPPORTED_TD_XFAM, like KVM_SUPPORTED_TD_ATTRS. So that any new XFAM
related feature for TD needs explicit enabling in KVM, and people
working on new XSAVE-related feature enabling for normal VMs don't need
to worry about the potential TDX impact.
[1]
https://lore.kernel.org/all/ecaaef65cf1cd90eb8f83e6a53d9689c8b0b9a22.camel@intel.com/
* Re: [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests
2026-01-19 5:55 ` Xiaoyao Li
@ 2026-01-20 18:07 ` Edgecombe, Rick P
2026-01-20 20:50 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Edgecombe, Rick P @ 2026-01-20 18:07 UTC (permalink / raw)
To: Li, Xiaoyao, Bae, Chang Seok, pbonzini@redhat.com,
seanjc@google.com
Cc: binbin.wu@linux.intel.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Gao, Chao, Fang, Peter
On Mon, 2026-01-19 at 13:55 +0800, Xiaoyao Li wrote:
> + Rick and Binbin
>
> On 1/13/2026 7:54 AM, Chang S. Bae wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 189a03483d03..67b3312ab737 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -214,7 +214,8 @@ static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
> > #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
> > | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> > | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> > - | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
> > + | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE \
> > + | XFEATURE_MASK_APX)
> >
>
> Not the intention of this patch, but this change eventually allows
> userspace to expose APX to TDX guests.
>
> Without any mention of TDX APX tests and validation like the one for
> CET[1], I think it's unsafe to allow it for TDX guests.
>
That was an especially odd one.
> E.g., the worst
> case would be KVM might need extra handling to keep host's
> states/functionalities correct once TD guest is able to manage APX.
>
> I'm thinking maybe we need to introduce a supported mask,
> KVM_SUPPORTED_TD_XFAM, like KVM_SUPPORTED_TD_ATTRS. So that any new XFAM
> related feature for TD needs explicit enabling in KVM, and people
> working on new XSAVE-related feature enabling for normal VMs don't need
> to worry about the potential TDX impact.
We might need it. But in general, I agree KVM enabling for new features needs to
consider the TDX impact now. For APX, it looks like we don't need to add a new
type of supported feature tracking because the TDX APX arch is public.
Chang, let's circle back internally and figure out who owns what.
>
> [1]
> https://lore.kernel.org/all/ecaaef65cf1cd90eb8f83e6a53d9689c8b0b9a22.camel@intel.com/
* Re: [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests
2026-01-20 18:07 ` Edgecombe, Rick P
@ 2026-01-20 20:50 ` Chang S. Bae
2026-01-21 19:59 ` Edgecombe, Rick P
0 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-01-20 20:50 UTC (permalink / raw)
To: Edgecombe, Rick P, Li, Xiaoyao, pbonzini@redhat.com,
seanjc@google.com
Cc: binbin.wu@linux.intel.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Gao, Chao, Fang, Peter
On 1/20/2026 10:07 AM, Edgecombe, Rick P wrote:
> On Mon, 2026-01-19 at 13:55 +0800, Xiaoyao Li wrote:
>>
>> Not the intention of this patch, but this change eventually allows
>> userspace to expose APX to TDX guests.
>>
>> Without any mention of TDX APX tests and validation like the one for
>> CET[1], I think it's unsafe to allow it for TDX guests.
My original assumption was along the lines of what I mentioned in the RFC cover:
The specification deliberately scopes out some areas. For example,
Sections 3.1.4.4.2–7 note that initialization and reset behaviors
follow existing XSTATE conventions.
And in Section 3.1.4.4.2, on Intel® TDX:
Intel® TDX has an XCR0-derived interface called TDCS.XFAM. Bits in
XFAM act as an opt-in for state and ISA controls. Therefore,
XFAM[APX_F] acts as a control for enabling Intel® APX within Trust
Domains (or TDs), and the XFAM settings are established at TD INIT
(TDH.TD.INIT).
Conceptually, APX enablement for TDX could be explicitly gated, which
helps to narrow the scope of the KVM changes (perhaps, at least for the
early review).
*However*, once the APX bit is set in supported_xcr0, it can flow into
XFAM through the code path as:
tdx_get_supported_xfam(...)
{
u64 val = kvm_caps.supported_xcr0 | kvm_caps.supported_xss;
if ((val & td_conf->xfam_fixed1) != td_conf->xfam_fixed1)
return 0;
val &= td_conf->xfam_fixed0;
return val;
}
So I agree that, in the current codebase, whoever updates the KVM-side
bitmask should ensure that TDX guests are okay with it. I also now
understand the idea that TDX guests are yet another guest type, which is
affected by whatever kvm_caps changes.
>>
>
> That was an especially odd one.
>
>> E.g., the worst
>> case would be KVM might need extra handling to keep host's
>> states/functionalities correct once TD guest is able to manage APX.
>>
>> I'm thinking maybe we need to introduce a supported mask,
>> KVM_SUPPORTED_TD_XFAM, like KVM_SUPPORTED_TD_ATTRS. So that any new XFAM
>> related feature for TD needs explicit enabling in KVM, and people
>> working on new XSAVE-related feature enabling for normal VMs don't need
>> to worry about the potential TDX impact.
>
> We might need it. But in general, I agree KVM enabling for new features needs to
> consider the TDX impact now. For APX, it looks like we don't need to add a new
> type of supported feature tracking because the TDX APX arch is public.
>
> Chang, let's circle back internally and figure out who owns what.
I'd come back here with positive TDX test results once available. For
now, I would leave additional guarding or gating outside of this
enabling scope.
Thanks,
Chang
* Re: [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests
2026-01-20 20:50 ` Chang S. Bae
@ 2026-01-21 19:59 ` Edgecombe, Rick P
0 siblings, 0 replies; 39+ messages in thread
From: Edgecombe, Rick P @ 2026-01-21 19:59 UTC (permalink / raw)
To: Li, Xiaoyao, Bae, Chang Seok, pbonzini@redhat.com,
seanjc@google.com
Cc: Fang, Peter, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
binbin.wu@linux.intel.com, Gao, Chao
On Tue, 2026-01-20 at 12:50 -0800, Chang S. Bae wrote:
> > Chang, let's circle back internally and figure out who owns what.
>
> I'd come back here with positive TDX test results once available. For
> now, I would leave additional guarding or gating outside of this
> enabling scope.
After some discussion, I think this will be addressed with future TDX module
opt-in changes. So we can skip the TDX testing for this series. Thanks.
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-01-12 23:53 ` [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific Chang S. Bae
@ 2026-03-05 1:35 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 1:35 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Refactor the VCPU register state accessors to make them explicitly
> GPR-only.
I like "register" though.
> The existing register accessors operate on the cached VCPU register
> state. That cache holds GPRs and RIP. RIP has its own interface already.
Isn't it possible that e.g. get_vmx_mem_address() will do kvm_register_read()
for a RIP-relative address? One could argue RIP isn't a pure GPR, but it's also not
something entirely different either.
> This renaming clarifies GPR access only.
But then later patches use it for Extended GPRs, so the name becomes a lie. I also
don't like unnecessary use of acronyms, even though GPR is ubiquitous in x86.
Everyone looking at KVM knows what a register is, but only x86 folks will know
what GPR is.
* Re: [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs
2026-01-12 23:53 ` [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs Chang S. Bae
@ 2026-03-05 1:41 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 1:41 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Add helpers to directly read and write EGPRs (R16–R31).
>
> Unlike legacy GPRs, EGPRs are not cached in vcpu->arch.regs[]. Their
> contents remain live in hardware. If preempted, the EGPR state is
> preserved in the guest XSAVE buffer.
>
> The Advanced Performance Extensions (APX) feature introduces EGPRs as an
> XSAVE-managed state component. The new helpers access the registers
> directly between kvm_fpu_get() and kvm_fpu_put().
>
> Callers should ensure that EGPRs are enabled before using these helpers.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> V1 -> V2: Move _kvm_read_egpr()/_kvm_write_egpr() to x86.c (Paolo)
> ---
> arch/x86/kvm/x86.c | 70 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9857b4d319ed..edac2ec11e2f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1256,13 +1256,81 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
> #endif
>
> #ifdef CONFIG_KVM_APX
> +/*
> + * Accessors for extended general-purpose registers. binutils >= 2.43 can
> + * recognize those register symbols.
> + */
> +
> +static void _kvm_read_egpr(int reg, unsigned long *data)
Double underscores please (ignore the bad prior art). And please don't use an
out-param with a void return. That's "necessary" for e.g. _kvm_write_avx_reg()
because the value is large, but this is just 64 bits.
> +{
> + /* mov %r16..%r31, %rax */
> + switch (reg) {
> + case __VCPU_XREG_R16: asm(".byte 0xd5, 0x48, 0x89, 0xc0" : "=a"(*data)); break;
> + case __VCPU_XREG_R17: asm(".byte 0xd5, 0x48, 0x89, 0xc8" : "=a"(*data)); break;
> + case __VCPU_XREG_R18: asm(".byte 0xd5, 0x48, 0x89, 0xd0" : "=a"(*data)); break;
> + case __VCPU_XREG_R19: asm(".byte 0xd5, 0x48, 0x89, 0xd8" : "=a"(*data)); break;
> + case __VCPU_XREG_R20: asm(".byte 0xd5, 0x48, 0x89, 0xe0" : "=a"(*data)); break;
> + case __VCPU_XREG_R21: asm(".byte 0xd5, 0x48, 0x89, 0xe8" : "=a"(*data)); break;
> + case __VCPU_XREG_R22: asm(".byte 0xd5, 0x48, 0x89, 0xf0" : "=a"(*data)); break;
> + case __VCPU_XREG_R23: asm(".byte 0xd5, 0x48, 0x89, 0xf8" : "=a"(*data)); break;
> + case __VCPU_XREG_R24: asm(".byte 0xd5, 0x4c, 0x89, 0xc0" : "=a"(*data)); break;
> + case __VCPU_XREG_R25: asm(".byte 0xd5, 0x4c, 0x89, 0xc8" : "=a"(*data)); break;
> + case __VCPU_XREG_R26: asm(".byte 0xd5, 0x4c, 0x89, 0xd0" : "=a"(*data)); break;
> + case __VCPU_XREG_R27: asm(".byte 0xd5, 0x4c, 0x89, 0xd8" : "=a"(*data)); break;
> + case __VCPU_XREG_R28: asm(".byte 0xd5, 0x4c, 0x89, 0xe0" : "=a"(*data)); break;
> + case __VCPU_XREG_R29: asm(".byte 0xd5, 0x4c, 0x89, 0xe8" : "=a"(*data)); break;
> + case __VCPU_XREG_R30: asm(".byte 0xd5, 0x4c, 0x89, 0xf0" : "=a"(*data)); break;
> + case __VCPU_XREG_R31: asm(".byte 0xd5, 0x4c, 0x89, 0xf8" : "=a"(*data)); break;
Oof, is this really the most efficient way to encode this? I guess so since that's
what all the SIMD instructions do, but ugh.
> + default: BUG();
> + }
> +}
> +
> +static void _kvm_write_egpr(int reg, unsigned long *data)
And then take a value, not a pointer.
> +{
> + /* mov %rax, %r16..%r31 */
> + switch (reg) {
> + case __VCPU_XREG_R16: asm(".byte 0xd5, 0x18, 0x89, 0xc0" : : "a"(*data)); break;
> + case __VCPU_XREG_R17: asm(".byte 0xd5, 0x18, 0x89, 0xc1" : : "a"(*data)); break;
> + case __VCPU_XREG_R18: asm(".byte 0xd5, 0x18, 0x89, 0xc2" : : "a"(*data)); break;
> + case __VCPU_XREG_R19: asm(".byte 0xd5, 0x18, 0x89, 0xc3" : : "a"(*data)); break;
> + case __VCPU_XREG_R20: asm(".byte 0xd5, 0x18, 0x89, 0xc4" : : "a"(*data)); break;
> + case __VCPU_XREG_R21: asm(".byte 0xd5, 0x18, 0x89, 0xc5" : : "a"(*data)); break;
> + case __VCPU_XREG_R22: asm(".byte 0xd5, 0x18, 0x89, 0xc6" : : "a"(*data)); break;
> + case __VCPU_XREG_R23: asm(".byte 0xd5, 0x18, 0x89, 0xc7" : : "a"(*data)); break;
> + case __VCPU_XREG_R24: asm(".byte 0xd5, 0x19, 0x89, 0xc0" : : "a"(*data)); break;
> + case __VCPU_XREG_R25: asm(".byte 0xd5, 0x19, 0x89, 0xc1" : : "a"(*data)); break;
> + case __VCPU_XREG_R26: asm(".byte 0xd5, 0x19, 0x89, 0xc2" : : "a"(*data)); break;
> + case __VCPU_XREG_R27: asm(".byte 0xd5, 0x19, 0x89, 0xc3" : : "a"(*data)); break;
> + case __VCPU_XREG_R28: asm(".byte 0xd5, 0x19, 0x89, 0xc4" : : "a"(*data)); break;
> + case __VCPU_XREG_R29: asm(".byte 0xd5, 0x19, 0x89, 0xc5" : : "a"(*data)); break;
> + case __VCPU_XREG_R30: asm(".byte 0xd5, 0x19, 0x89, 0xc6" : : "a"(*data)); break;
> + case __VCPU_XREG_R31: asm(".byte 0xd5, 0x19, 0x89, 0xc7" : : "a"(*data)); break;
> + default: BUG();
> + }
> +}
* Re: [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types
2026-01-12 23:53 ` [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types Chang S. Bae
@ 2026-03-05 1:49 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 1:49 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Refactor the GPR accessors to introduce internal helpers to distinguish
> between legacy and extended GPRs. Add CONFIG_KVM_APX to selectively
> enable EGPR support.
Why? If we really want to make this code efficient, use static calls to wire
things up if and only if APX is fully supported.
> +#ifdef CONFIG_KVM_APX
> +static unsigned long kvm_read_egpr(int reg)
> +{
> + return 0;
> +}
> +
> +static void kvm_write_egpr(int reg, unsigned long data)
> +{
> +}
> +
> +unsigned long kvm_gpr_read_raw(struct kvm_vcpu *vcpu, int reg)
> +{
> + switch (reg) {
> + case VCPU_REGS_RAX ... VCPU_REGS_R15:
> + return kvm_register_read_raw(vcpu, reg);
> + case VCPU_XREG_R16 ... VCPU_XREG_R31:
> + return kvm_read_egpr(reg);
> + default:
> + WARN_ON_ONCE(1);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gpr_read_raw);
> +
> +void kvm_gpr_write_raw(struct kvm_vcpu *vcpu, int reg, unsigned long val)
> +{
> + switch (reg) {
> + case VCPU_REGS_RAX ... VCPU_REGS_R15:
> + kvm_register_write_raw(vcpu, reg, val);
> + break;
> + case VCPU_XREG_R16 ... VCPU_XREG_R31:
> + kvm_write_egpr(reg, val);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + }
> +}
> +EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gpr_write_raw);
> +#endif
Has anyone done analysis to determine if KVM's current inlining of
kvm_register_read() and kvm_register_write() is actually a net positive? I.e.
can we just cut over to non-inline functions with static calls?
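To make the static-call suggestion concrete for readers following along, here is a minimal userspace sketch, with a plain function pointer standing in for the kernel's static_call() machinery (real static calls patch the call site, so the selected path costs a direct call rather than an indirect one). All names, the `fake_regs` cache, and the hardware-read placeholder are illustrative assumptions, not code from the series:

```c
#include <assert.h>

/*
 * Sketch: wire up the register accessor once at init time, if and only
 * if APX is fully supported, instead of branching on every access.
 */
typedef unsigned long (*read_reg_fn)(int reg);

static unsigned long fake_regs[16];	/* stand-in for vcpu->arch.regs[] */

static unsigned long read_legacy(int reg)
{
	/* Legacy GPRs come from the software register cache. */
	return fake_regs[reg];
}

static unsigned long read_with_egprs(int reg)
{
	/* APX path: EGPRs (16-31) would be read from live hardware;
	 * modeled here as a placeholder constant for the sketch. */
	if (reg >= 16)
		return 0xe0 + reg;
	return fake_regs[reg];
}

static read_reg_fn read_reg = read_legacy;

/* Done once at setup; a kernel version would use static_call_update(). */
static void wire_up_accessors(int apx_supported)
{
	if (apx_supported)
		read_reg = read_with_egprs;
}
```

The point of the pattern is that the "is APX enabled" decision is made once, not per register access.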
* Re: [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification
2026-01-12 23:53 ` [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification Chang S. Bae
@ 2026-03-05 4:13 ` Sean Christopherson
0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 4:13 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 2bb3ac8c5b8b..8d3e0aff2e13 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -411,6 +411,11 @@ static __always_inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
> return vt->exit_qualification;
> }
>
> +static inline int vmx_get_exit_qual_gpr(struct kvm_vcpu *vcpu)
s/gpr/reg
> +{
> + return (vmx_get_exit_qual(vcpu) >> 8) & 0xf;
> +}
> +
> static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vt *vt = to_vt(vcpu);
> --
> 2.51.0
>
* Re: [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure
2026-01-12 23:53 ` [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure Chang S. Bae
@ 2026-03-05 4:21 ` Sean Christopherson
2026-03-07 1:33 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 4:21 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Define a unified data structure that can represent both the legacy and
> extended VMX instruction information formats.
>
> VMX provides per-instruction metadata for VM exits to help decode the
> attributes of the instruction that triggered the exit. The legacy format,
> however, only supports up to 16 GPRs and thus cannot represent EGPRs. To
> support these new registers, VMX introduces an extended 64-bit layout.
>
> Instead of maintaining separate storage for each format, a single
> union structure makes the overall handling simple. The field names are
> consistent across both layouts. While the presence of certain fields
> depends on the instruction type, the offsets remain fixed within each
> format.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> arch/x86/kvm/vmx/vmx.h | 61 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 61 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index bc3ed3145d7e..567320115a5a 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -311,6 +311,67 @@ struct kvm_vmx {
> u64 *pid_table;
> };
>
> +/*
> + * 32-bit layout of the legacy instruction information field. This format
> + * supports the 16 legacy GPRs.
> + */
> +struct base_insn_info {
> + u32 scale : 2; /* Scaling factor */
> + u32 reserved1 : 1;
> + u32 reg1 : 4; /* First register index */
> + u32 asize : 3; /* Address size */
> + u32 is_reg : 1; /* 0: memory, 1: register */
> + u32 osize : 2; /* Operand size */
> + u32 reserved2 : 2;
> + u32 seg : 3; /* Segment register index */
> + u32 index : 4; /* Index register index */
> + u32 index_invalid : 1; /* 0: valid, 1: invalid */
> + u32 base : 4; /* Base register index */
> + u32 base_invalid : 1; /* 0: valid, 1: invalid */
> + u32 reg2 : 4; /* Second register index */
> +};
> +
> +/*
> + * 64-bit layout of the extended instruction information field, which
> + * supports EGPRs.
> + */
> +struct ext_insn_info {
> + u64 scale : 2; /* Scaling factor */
> + u64 asize : 2; /* Address size */
> + u64 is_reg : 1; /* 0: memory, 1: register */
> + u64 osize : 2; /* Operand size */
> + u64 seg : 3; /* Segment register index */
> + u64 index_invalid : 1; /* 0: valid, 1: invalid */
> + u64 base_invalid : 1; /* 0: valid, 1: invalid */
> + u64 reserved1 : 4;
> + u64 reg1 : 5; /* First register index */
> + u64 reserved2 : 3;
> + u64 index : 5; /* Index register index */
> + u64 reserved3 : 3;
> + u64 base : 5; /* Base register index */
> + u64 reserved4 : 3;
> + u64 reg2 : 5; /* Second register index */
> + u64 reserved5 : 19;
> +};
> +
> +/* Union for accessing either the legacy or extended format. */
> +union insn_info {
> + struct base_insn_info base;
> + struct ext_insn_info ext;
> + u32 word;
> + u64 dword;
word is 16 bits, dword is 32 bits, qword is 64 bits.
> +};
> +
> +/*
> + * Wrapper structure combining the instruction info and a flag indicating
> + * whether the extended layout is in use.
> + */
> +struct vmx_insn_info {
> + /* true if using the extended layout */
> + bool extended;
> + union insn_info info;
> +};
Absolutely not. I despise bit fields, as they're extremely difficult to review,
don't help developers/debuggers understand the expected layout (finding flags and
whatnot in .h files is almost always faster than searching the SDM), and they
often generate suboptimal code.
This is also infrastructure overkill. Two bitfields, a union, and another struct,
just to track a 64-bit value. And the macros added later on only add to the
obfuscation.
Even worse, saving the "extended" flag on the stack and passing it around turns
a static branch into a dynamic branch.
I don't see any reason to do anything more complicated than:
static inline u64 vmx_get_insn_info(void)
{
if (vmx_insn_info_extended())
return vmcs_read64(EXTENDED_INSTRUCTION_INFO);
return vmcs_read32(VMX_INSTRUCTION_INFO);
}
static inline int vmx_get_insn_info_reg(u64 insn_info)
{
return vmx_insn_info_extended() ? (insn_info >> ??) & 0x1f :
(insn_info >> 3) & 0xf;
}
* Re: [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking
2026-01-12 23:54 ` [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking Chang S. Bae
@ 2026-03-05 4:22 ` Sean Christopherson
0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 4:22 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
For the scope,
KVM: x86:
because other architectures have emulator code, and as is the case here, x86's
emulator code isn't strictly contained to emulate.c.
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Extend the emulator context and GPR accessors to handle EGPRs before
> adding support for REX2-prefixed instructions.
>
> Now the KVM GPR accessors can handle EGPRs. Then, the emulator can
> uniformly cache and track all GPRs without requiring separate handling.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> arch/x86/kvm/kvm_emulate.h | 10 +++++-----
> arch/x86/kvm/x86.c | 4 ++--
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index fb3dab4b5a53..16b35a796a7f 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -105,13 +105,13 @@ struct x86_instruction_info {
> struct x86_emulate_ops {
> void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
> /*
> - * read_gpr: read a general purpose register (rax - r15)
> + * read_gpr: read a general purpose register (rax - r31)
> *
> * @reg: gpr number.
> */
> ulong (*read_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg);
> /*
> - * write_gpr: write a general purpose register (rax - r15)
> + * write_gpr: write a general purpose register (rax - r31)
> *
> * @reg: gpr number.
> * @val: value to write.
> @@ -314,7 +314,7 @@ typedef void (*fastop_t)(struct fastop *);
> * a ModRM or SIB byte.
> */
> #ifdef CONFIG_X86_64
> -#define NR_EMULATOR_GPRS 16
> +#define NR_EMULATOR_GPRS 32
If we add Kconfig, this would be the place to use it...
* Re: [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks
2026-01-12 23:54 ` [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks Chang S. Bae
@ 2026-03-05 4:28 ` Sean Christopherson
2026-03-07 1:33 ` Chang S. Bae
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-05 4:28 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Mon, Jan 12, 2026, Chang S. Bae wrote:
> Now that KVM exposes the APX feature to guests on APX-capable systems,
> extend the selftests to validate XCR0 configuration and state management.
>
> Since APX repurposes the XSAVE area previously used by MPX in the
> non-compacted format, add a check to ensure that MPX states are not set
> when APX is enabled.
>
> Also, load non-init APX state data in the guest so that XSTATE_BV[APX] is
> set, allowing validation of APX state testing.
I assume/hope there are more tests coming in future versions...
KVM-Unit-Tests' x86/xsave.c would be a good fit, or at least a good reference
for what we should be testing as far as EGPR accesses are concerned.
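As a rough illustration of the XCR0 sanity check the patch describes, a minimal helper might look like the following. The BNDREGS/BNDCSR bit positions are architectural; treating bit 19 as the APX state component is an assumption taken from the enabling series, and the helper itself is a sketch, not code from the selftest:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XSAVE feature bit positions (MPX bits per the SDM; 19 for APX is
 * assumed from the enabling series). */
#define XFEATURE_MASK_BNDREGS	(1ull << 3)
#define XFEATURE_MASK_BNDCSR	(1ull << 4)
#define XFEATURE_MASK_APX	(1ull << 19)

/*
 * APX repurposes the non-compacted XSAVE offsets that MPX occupied,
 * so a supported-XCR0 mask advertising both is inconsistent.
 */
static bool xcr0_apx_mpx_consistent(uint64_t xcr0)
{
	if (!(xcr0 & XFEATURE_MASK_APX))
		return true;

	return !(xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR));
}
```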
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-03-05 1:35 ` Sean Christopherson
@ 2026-03-07 1:32 ` Chang S. Bae
2026-03-09 23:28 ` Chang S. Bae
2026-03-10 1:23 ` Sean Christopherson
0 siblings, 2 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-03-07 1:32 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/4/2026 5:35 PM, Sean Christopherson wrote:
> On Mon, Jan 12, 2026, Chang S. Bae wrote:
>> Refactor the VCPU register state accessors to make them explicitly
>> GPR-only.
>
> I like "register" though.
Yeah, agree that it is more general.
>
>> The existing register accessors operate on the cached VCPU register
>> state. That cache holds GPRs and RIP. RIP has its own interface already.
>
> Isn't it possible that e.g. get_vmx_mem_address() will do kvm_register_read()
> for a RIP-relative address? One could RIP isn't a pure GPR, but it's also not
> something entirely different either.
The 'reg' argument has historically matched the index of the register
cache array, vcpu::arch::regs[]. When extending the accessors to support
EGPRs, it seemed natural to keep using it as a register ID, since that
wires up cleanly with VMX instruction info and emulator sites. But then
reg=16 immediately conflicts with RIP.
Separating accessors for RIP and GPRs was an option. Yes, the usages are
very close and EGPRs are strictly not *legacy* GPRs.
Then, another option would be to adjust RIP numbering. For example, use
something like VCPU_REGS_RIP=32 for the accessor, while keeping a
separate value like __VCPU_REGS_RIP=16 for the reg cache index. But
there are many sites directly referencing regs[] and the change looked
rather ugly -- two numberings for RIP alone.
Thoughts?
* Re: [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types
2026-03-05 1:49 ` Sean Christopherson
@ 2026-03-07 1:32 ` Chang S. Bae
0 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-03-07 1:32 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/4/2026 5:49 PM, Sean Christopherson wrote:
> On Mon, Jan 12, 2026, Chang S. Bae wrote:
>> Refactor the GPR accessors to introduce internal helpers to distinguish
>> between legacy and extended GPRs. Add CONFIG_KVM_APX to selectively
>> enable EGPR support.
>
> Why? If we really want to make this code efficient, use static calls to wire
> things up if and only if APX is fully supported.
I think the idea was just to build out APX-specific code behind the option.
Ah... sorry, that last sentence was poorly worded.
Yes, I agree on static calls for switching.
> Has anyone done analysis to determine if KVM's current inlining of
> kvm_register_read() and kvm_register_write() is actually a net positive? I.e.
> can we just cut over to non-inline functions with static calls?
A quick measurement of inline vs. non-inline: when invoked repeatedly,
the non-inline version takes about 5-7% more cycles. But when VM exits
are involved, the difference falls into the noise -- e.g. invoking the
accessors 1000 times accounts for about 1% on the exit/entry path.
* Re: [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs
2026-03-05 1:41 ` Sean Christopherson
@ 2026-03-07 1:32 ` Chang S. Bae
0 siblings, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-03-07 1:32 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/4/2026 5:41 PM, Sean Christopherson wrote:
>
> Oof, is this really the most efficient way to encode this? I guess so since that's
> what all the SIMD instructions do, but ugh.
Perhaps a macro that converts to a byte-code string, or a generator script.
But I was not convinced that would add much value, so I ended up with this.
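For reference, the quoted byte tables do follow a regular pattern: a REX2 instruction is the 0xd5 prefix, a payload byte, then the legacy opcode and ModRM. A sketch of the generator idea, with the payload bit positions taken from the APX specification's REX2 layout (treat the exact bit names as assumptions from that spec, not from this series), could be:

```c
#include <assert.h>

/*
 * Sketch: generate the REX2-prefixed "mov" encodings quoted above.
 *
 * For "mov %r16..%r31, %rax" the payload sets W and R4 (all EGPRs have
 * index bit 4 set), plus R3 for r24-r31; the source register's low three
 * bits land in ModRM.reg, with rm = rax.
 */
static void encode_egpr_read(int reg, unsigned char out[4])
{
	out[0] = 0xd5;					 /* REX2 prefix */
	out[1] = 0x48 | (((reg >> 3) & 1) << 2);	 /* W=1, R4=1, R3=bit3 */
	out[2] = 0x89;					 /* MOV r/m64, r64 */
	out[3] = 0xc0 | ((reg & 7) << 3);		 /* mod=11, reg=src, rm=rax */
}

/*
 * For "mov %rax, %r16..%r31" the payload sets W and B4, plus B3 for
 * r24-r31; the destination's low three bits land in ModRM.rm.
 */
static void encode_egpr_write(int reg, unsigned char out[4])
{
	out[0] = 0xd5;
	out[1] = 0x18 | ((reg >> 3) & 1);		 /* W=1, B4=1, B3=bit3 */
	out[2] = 0x89;
	out[3] = 0xc0 | (reg & 7);			 /* mod=11, reg=rax, rm=dst */
}
```

Running this for r16, r25, and r31 reproduces the byte sequences in the quoted switch statements.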
* Re: [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure
2026-03-05 4:21 ` Sean Christopherson
@ 2026-03-07 1:33 ` Chang S. Bae
2026-03-13 1:05 ` Sean Christopherson
0 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-03-07 1:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/4/2026 8:21 PM, Sean Christopherson wrote:
>
> Absolutely not. I despise bit fields, as they're extremely difficult to review,
> don't help developers/debuggers understand the expected layout (finding flags and
> whatnot in .h files is almost always faster than searching the SDM), and they
> often generate suboptimal code.
Okay.
> I don't see any reason to do anything more complicated than:
>
> static inline u64 vmx_get_insn_info(void)
> {
> if (vmx_insn_info_extended())
> return vmcs_read64(EXTENDED_INSTRUCTION_INFO);
>
> return vmcs_read32(VMX_INSTRUCTION_INFO);
> }
>
> static inline int vmx_get_insn_info_reg(u64 insn_info)
> {
> return vmx_insn_info_extended() ? (insn_info >> ??) & 0x1f :
> (insn_info >> 3) & 0xf;
> }
There is
int get_vmx_mem_address(...)
{
...
/*
* According to Vol. 3B,...
*/
int scaling = vmx_instruction_info & 3;
int addr_size = (vmx_instruction_info >> 7) & 7;
bool is_reg = vmx_instruction_info & (1u << 10);
int seg_reg = (vmx_instruction_info >> 15) & 7;
int index_reg = (vmx_instruction_info >> 18) & 0xf;
bool index_is_valid = !(vmx_instruction_info & (1u << 22));
int base_reg = (vmx_instruction_info >> 23) & 0xf;
bool base_is_valid = !(vmx_instruction_info & (1u << 27));
I'd assume wrappers like the above for each line there. But to confirm your
preference: would you rather keep this open-coded, or introduce wrappers
for each field?
* Re: [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks
2026-03-05 4:28 ` Sean Christopherson
@ 2026-03-07 1:33 ` Chang S. Bae
2026-03-11 18:42 ` Paolo Bonzini
0 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-03-07 1:33 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/4/2026 8:28 PM, Sean Christopherson wrote:
>
> I assume/hope there are more tests coming in future versions...
>
> KVM-Unit-Tests' x86/xsave.c would be a good fit, or at least a good reference
> for what we should be testing as far as EGPR accesses are concerned.
Yeah, I can see that establishing a solid test set along with a new feature
is pretty much a requirement for KVM. I'll come up with some updates.
Right now, yes, I'm considering that xsave.c. Thanks a lot!
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-03-07 1:32 ` Chang S. Bae
@ 2026-03-09 23:28 ` Chang S. Bae
2026-03-10 1:23 ` Sean Christopherson
1 sibling, 0 replies; 39+ messages in thread
From: Chang S. Bae @ 2026-03-09 23:28 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
[-- Attachment #1: Type: text/plain, Size: 855 bytes --]
On 3/6/2026 5:32 PM, Chang S. Bae wrote:
>
> The 'reg' argument has historically matched the index of the register
> cache array, vcpu::arch::regs[]. When extending the accessors to support
> EGPRs, it looked smooth to keep using it as a register ID, since that
> wires up cleanly with VMX instruction info and emulator sites. But then
> reg=16 immediately conflicts with RIP.
I think it is possible to introduce a dedicated field there, instead of
regs[]. RIP appears to be switched by hardware on VM exit/entry anyway.
The attached draft takes that approach:
* First, move RIP into the new field. Then kvm_register_read|write()
family effectively becomes GPR-only (while keeping the generic
'register' name).
* Second, the extra layer adds EGPR support. But it doesn't appear to
have measurable overhead, and it can be compiled out.
[-- Attachment #2: 0001.patch --]
[-- Type: text/plain, Size: 4842 bytes --]
---
arch/x86/include/asm/kvm_host.h | 5 +++--
arch/x86/kvm/kvm_cache_regs.h | 11 ++++++-----
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/svm/svm.c | 6 +++---
arch/x86/kvm/vmx/vmx.c | 4 ++--
5 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..0b95126505ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -190,10 +190,10 @@ enum kvm_reg {
VCPU_REGS_R14 = __VCPU_REGS_R14,
VCPU_REGS_R15 = __VCPU_REGS_R15,
#endif
- VCPU_REGS_RIP,
NR_VCPU_REGS,
+ VCPU_REGS_RIP = NR_VCPU_REGS,
- VCPU_EXREG_PDPTR = NR_VCPU_REGS,
+ VCPU_EXREG_PDPTR,
VCPU_EXREG_CR0,
/*
* Alias AMD's ERAPS (not a real register) to CR3 so that common code
@@ -799,6 +799,7 @@ struct kvm_vcpu_arch {
* kvm_{register,rip}_{read,write} functions.
*/
unsigned long regs[NR_VCPU_REGS];
+ unsigned long rip;
u32 regs_avail;
u32 regs_dirty;
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 8ddb01191d6f..33514affb90d 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -115,9 +115,6 @@ static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
return 0;
- if (!kvm_register_is_available(vcpu, reg))
- kvm_x86_call(cache_reg)(vcpu, reg);
-
return vcpu->arch.regs[reg];
}
@@ -133,12 +130,16 @@ static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
{
- return kvm_register_read_raw(vcpu, VCPU_REGS_RIP);
+ if (!kvm_register_is_available(vcpu, VCPU_REGS_RIP))
+ kvm_x86_call(cache_reg)(vcpu, VCPU_REGS_RIP);
+
+ return vcpu->arch.rip;
}
static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
{
- kvm_register_write_raw(vcpu, VCPU_REGS_RIP, val);
+ vcpu->arch.rip = val;
+ kvm_register_mark_dirty(vcpu, VCPU_REGS_RIP);
}
static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3f9c1aa39a0a..e1b892531b35 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -913,7 +913,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
#endif
- save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];
+ save->rip = svm->vcpu.arch.rip;
/* Sync some non-GPR registers before encrypting */
save->xcr0 = svm->vcpu.arch.xcr0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..ea28cfaf30ff 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4294,7 +4294,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
- svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+ svm->vmcb->save.rip = vcpu->arch.rip;
/*
* Disable singlestep if we're injecting an interrupt/exception.
@@ -4378,7 +4378,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
vcpu->arch.cr2 = svm->vmcb->save.cr2;
vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
- vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+ vcpu->arch.rip = svm->vmcb->save.rip;
}
vcpu->arch.regs_dirty = 0;
@@ -4801,7 +4801,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
- svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+ svm->vmcb->save.rip = vcpu->arch.rip;
ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
if (ret)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..9132e53b02ae 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2641,7 +2641,7 @@ void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP);
break;
case VCPU_REGS_RIP:
- vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP);
+ vcpu->arch.rip = vmcs_readl(GUEST_RIP);
break;
case VCPU_EXREG_PDPTR:
if (enable_ept)
@@ -7646,7 +7646,7 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP))
- vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
+ vmcs_writel(GUEST_RIP, vcpu->arch.rip);
vcpu->arch.regs_dirty = 0;
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
--
2.51.0
[-- Attachment #3: 0002.patch --]
[-- Type: text/plain, Size: 5345 bytes --]
---
arch/x86/include/asm/kvm_host.h | 18 ++++++++++++
arch/x86/include/asm/kvm_vcpu_regs.h | 16 +++++++++++
arch/x86/kvm/Kconfig | 4 +++
arch/x86/kvm/x86.c | 43 ++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 28 +++++++++++++++++-
5 files changed, 108 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b95126505ac..b246a1a96c4e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -220,6 +220,24 @@ enum {
VCPU_SREG_GS,
VCPU_SREG_TR,
VCPU_SREG_LDTR,
+#ifdef CONFIG_KVM_APX
+ VCPU_XREG_R16 = __VCPU_XREG_R16,
+ VCPU_XREG_R17 = __VCPU_XREG_R17,
+ VCPU_XREG_R18 = __VCPU_XREG_R18,
+ VCPU_XREG_R19 = __VCPU_XREG_R19,
+ VCPU_XREG_R20 = __VCPU_XREG_R20,
+ VCPU_XREG_R21 = __VCPU_XREG_R21,
+ VCPU_XREG_R22 = __VCPU_XREG_R22,
+ VCPU_XREG_R23 = __VCPU_XREG_R23,
+ VCPU_XREG_R24 = __VCPU_XREG_R24,
+ VCPU_XREG_R25 = __VCPU_XREG_R25,
+ VCPU_XREG_R26 = __VCPU_XREG_R26,
+ VCPU_XREG_R27 = __VCPU_XREG_R27,
+ VCPU_XREG_R28 = __VCPU_XREG_R28,
+ VCPU_XREG_R29 = __VCPU_XREG_R29,
+ VCPU_XREG_R30 = __VCPU_XREG_R30,
+ VCPU_XREG_R31 = __VCPU_XREG_R31,
+#endif
};
enum exit_fastpath_completion {
diff --git a/arch/x86/include/asm/kvm_vcpu_regs.h b/arch/x86/include/asm/kvm_vcpu_regs.h
index 1af2cb59233b..dd0cc171f405 100644
--- a/arch/x86/include/asm/kvm_vcpu_regs.h
+++ b/arch/x86/include/asm/kvm_vcpu_regs.h
@@ -20,6 +20,22 @@
#define __VCPU_REGS_R13 13
#define __VCPU_REGS_R14 14
#define __VCPU_REGS_R15 15
+#define __VCPU_XREG_R16 16
+#define __VCPU_XREG_R17 17
+#define __VCPU_XREG_R18 18
+#define __VCPU_XREG_R19 19
+#define __VCPU_XREG_R20 20
+#define __VCPU_XREG_R21 21
+#define __VCPU_XREG_R22 22
+#define __VCPU_XREG_R23 23
+#define __VCPU_XREG_R24 24
+#define __VCPU_XREG_R25 25
+#define __VCPU_XREG_R26 26
+#define __VCPU_XREG_R27 27
+#define __VCPU_XREG_R28 28
+#define __VCPU_XREG_R29 29
+#define __VCPU_XREG_R30 30
+#define __VCPU_XREG_R31 31
#endif
#endif /* _ASM_X86_KVM_VCPU_REGS_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 801bf9e520db..f27e3f2937f0 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -93,10 +93,14 @@ config KVM_SW_PROTECTED_VM
If unsure, say "N".
+config KVM_APX
+ bool
+
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
select X86_FRED if X86_64
+ select KVM_APX if X86_64
help
Provides support for KVM on processors equipped with Intel's VT
extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a03530795707..07119b4597dc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1261,6 +1261,49 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
}
#endif
+#ifdef CONFIG_KVM_APX
+
+static unsigned long kvm_read_egpr(int reg)
+{
+ return 0;
+}
+
+static void kvm_write_egpr(int reg, unsigned long data)
+{
+}
+
+static unsigned long kvm_register_read_ext(struct kvm_vcpu *vcpu, int reg)
+{
+ switch (reg) {
+ case VCPU_REGS_RAX ... VCPU_REGS_R15:
+ return kvm_register_read_raw(vcpu, reg);
+ case VCPU_XREG_R16 ... VCPU_XREG_R31:
+ return kvm_read_egpr(reg);
+ default:
+ WARN_ON_ONCE(1);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_register_read_ext);
+
+static void kvm_register_write_ext(struct kvm_vcpu *vcpu, int reg, unsigned long val)
+{
+ switch (reg) {
+ case VCPU_REGS_RAX ... VCPU_REGS_R15:
+ kvm_register_write_raw(vcpu, reg, val);
+ break;
+ case VCPU_XREG_R16 ... VCPU_XREG_R31:
+ kvm_write_egpr(reg, val);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_register_write_ext);
+
+#endif /* CONFIG_KVM_APX */
+
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
u64 xcr0 = xcr;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 94d4f07aaaa0..3447790849e7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -411,6 +411,29 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
return false;
}
+#ifdef CONFIG_KVM_APX
+
+unsigned long kvm_register_read_ext(struct kvm_vcpu *vcpu, int reg);
+void kvm_register_write_ext(struct kvm_vcpu *vcpu, int reg, unsigned long val);
+
+static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
+{
+ unsigned long val = kvm_register_read_ext(vcpu, reg);
+
+ return is_64_bit_mode(vcpu) ? val : (u32)val;
+}
+
+static inline void kvm_register_write(struct kvm_vcpu *vcpu,
+ int reg, unsigned long val)
+{
+ if (!is_64_bit_mode(vcpu))
+ val = (u32)val;
+
+ return kvm_register_write_ext(vcpu, reg, val);
+}
+
+#else
+
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
unsigned long val = kvm_register_read_raw(vcpu, reg);
@@ -419,13 +442,16 @@ static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
}
static inline void kvm_register_write(struct kvm_vcpu *vcpu,
- int reg, unsigned long val)
+ int reg, unsigned long val)
{
if (!is_64_bit_mode(vcpu))
val = (u32)val;
+
return kvm_register_write_raw(vcpu, reg, val);
}
+#endif
+
static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
return !(kvm->arch.disabled_quirks & quirk);
--
2.51.0
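For readers outside KVM, the mode-dependent truncation performed by the kvm_register_{read,write} wrappers in the patch above can be modeled standalone. This is a sketch only: struct toy_vcpu and the long_mode flag are hypothetical stand-ins for the real vcpu and is_64_bit_mode(), not KVM's types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative stand-in for the vcpu; long_mode models is_64_bit_mode(). */
struct toy_vcpu {
	unsigned long regs[16];
	bool long_mode;
};

static unsigned long toy_register_read(struct toy_vcpu *vcpu, int reg)
{
	unsigned long val = vcpu->regs[reg];

	/* Outside 64-bit mode only the low 32 bits are architecturally visible. */
	return vcpu->long_mode ? val : (uint32_t)val;
}

static void toy_register_write(struct toy_vcpu *vcpu, int reg, unsigned long val)
{
	if (!vcpu->long_mode)
		val = (uint32_t)val;
	vcpu->regs[reg] = val;
}
```

The same truncation applies on both the APX and non-APX paths in the patch; only the backing accessor (_raw vs. _ext) differs.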
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-03-07 1:32 ` Chang S. Bae
2026-03-09 23:28 ` Chang S. Bae
@ 2026-03-10 1:23 ` Sean Christopherson
2026-03-10 22:05 ` Chang S. Bae
1 sibling, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-03-10 1:23 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Fri, Mar 06, 2026, Chang S. Bae wrote:
> On 3/4/2026 5:35 PM, Sean Christopherson wrote:
> > On Mon, Jan 12, 2026, Chang S. Bae wrote:
> > > Refactor the VCPU register state accessors to make them explicitly
> > > GPR-only.
> >
> > I like "register" though.
>
> Yeah, agree that it is more general.
>
> >
> > > The existing register accessors operate on the cached VCPU register
> > > state. That cache holds GPRs and RIP. RIP has its own interface already.
> >
> > Isn't it possible that e.g. get_vmx_mem_address() will do kvm_register_read()
> > for a RIP-relative address?
Answering my own question: no, this isn't possible, specifically because RIP can't
be addressed via "normal" methods (as Chang points out below, KVM's index of 16
is completely arbitrary).
Instead, for RIP relative addressing, the full "offset" gets reported via
EXIT_QUALIFICATION.
> One could argue RIP isn't a pure GPR, but it's also not something entirely different either.
>
> The 'reg' argument has historically matched the index of the register cache
> array, vcpu::arch::regs[]. When extending the accessors to support EGPRs, it
> looked smooth to keep using it as a register ID, since that wires up cleanly
> with VMX instruction info and emulator sites. But then reg=16 immediately
> conflicts with RIP.
>
> Separating accessors for RIP and GPRs was an option. Yes, the usages are
> very close and EGPRs are strictly not *legacy* GPRs.
>
> Then, another option would be to adjust RIP numbering. For example, use
> something like VCPU_REGS_RIP=32 for the accessor, while keeping a separate
> value like __VCPU_REGS_RIP=16 for the reg cache index. But there are many
> sites directly referencing regs[] and the change looked rather ugly -- two
> numberings for RIP alone.
Oh, yikes, I didn't even see that this series is playing games with the register
indices.
Whatever we do, the changelog absolutely needs to call out the real motivation.
Because nothing in here screams "KVM's APX implementation depends on this and
things will break horribly if kvm_gpr_read() is called with VCPU_REGS_RIP".
  The existing register accessors operate on the cached VCPU register
  state. That cache holds GPRs and RIP. RIP has its own interface already.
  This renaming clarifies GPR access only.
I'll try to come back to this tomorrow with more complete thoughts and hopefully
an idea or two on where to go, but I am most definitely against how the current
implementation drops the safeguards provided by kvm_register_{read,write}_raw().
E.g. passing in VCPU_REGS_RIP to kvm_gpr_read() will compile just fine, but will
read the wrong register on APX capable hardware.
There's still kinda sorta some protection there as kvm_read_egpr() will WARN on
!APX hardware, but the whole scheme is kludgy at best.
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-03-10 1:23 ` Sean Christopherson
@ 2026-03-10 22:05 ` Chang S. Bae
2026-03-10 23:12 ` Sean Christopherson
0 siblings, 1 reply; 39+ messages in thread
From: Chang S. Bae @ 2026-03-10 22:05 UTC (permalink / raw)
To: Sean Christopherson; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On 3/9/2026 6:23 PM, Sean Christopherson wrote:
>
> Oh, yikes, I didn't even see that this series is playing games with the register
> indices.
>
> > Whatever we do, the changelog absolutely needs to call out the real motivation.
Given the discussion here, it's apparent that the changelog is missing that
detail. I'll add something like what you wrote here to the revision.
> I'll try to come back to this tomorrow with more complete thoughts and hopefully
Sure, your call. I know you have a lot on your plate, so please feel free to
take your time. Thanks!
> E.g. passing in VCPU_REGS_RIP to kvm_gpr_read() will compile just fine, but will
> read the wrong register on APX capable hardware.
Right, so new semantics likely need to be established. As I responded
before, one option would be to separate them in the structure:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c73..ff8a317be5cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -795,10 +795,14 @@ enum kvm_only_cpuid_leafs {
struct kvm_vcpu_arch {
/*
- * rip and regs accesses must go through
- * kvm_{register,rip}_{read,write} functions.
+ * regs accesses must go through kvm_register_{read,write}
+ * functions.
*/
unsigned long regs[NR_VCPU_REGS];
+
+ /* rip accesses must go through kvm_rip_{read,write} */
+ unsigned long rip;
+
u32 regs_avail;
u32 regs_dirty;
* Re: [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific
2026-03-10 22:05 ` Chang S. Bae
@ 2026-03-10 23:12 ` Sean Christopherson
0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-03-10 23:12 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Tue, Mar 10, 2026, Chang S. Bae wrote:
> On 3/9/2026 6:23 PM, Sean Christopherson wrote:
> >
> > Oh, yikes, I didn't even see that this series is playing games with the register
> > indices.
> >
> > Whatever we do, the changelog absolutely needs to call out the real motivation.
>
> Given the discussion here, it looks so apparent the changelog is missing
> that detail. I'll ensure something like what you wrote here to the revision.
>
> > I'll try to come back to this tomorrow with more complete thoughts and hopefully
>
> Sure, you call it. I know you have a lot on your plate, so I hope you feel
> free to take your time. Thanks!
>
> > E.g. passing in VCPU_REGS_RIP to kvm_gpr_read() will compile just fine, but will
> > read the wrong register on APX capable hardware.
>
> Right, so new semantics likely need to be established. As I responded before,
> one option would be to separate them in the structure:
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ff07c45e3c73..ff8a317be5cf 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -795,10 +795,14 @@ enum kvm_only_cpuid_leafs {
>
> struct kvm_vcpu_arch {
> /*
> - * rip and regs accesses must go through
> - * kvm_{register,rip}_{read,write} functions.
> + * regs accesses must go through kvm_register_{read,write}
> + * functions.
> */
> unsigned long regs[NR_VCPU_REGS];
> +
> + /* rip accesses must go through kvm_rip_{read,write} */
> + unsigned long rip;
Ya, this is where I ended up too. And then as prep work, we can and should
convert regs_{avail,dirty} to proper bitmaps so that the size can be dynamic
for 32-bit vs. 64-bit vs. APX-capable (or we could just use an "unsigned long",
it would only change what BUILD_BUG_ON()s are needed).
E.g. I have
unsigned long regs[NR_VCPU_GENERAL_PURPOSE_REGS];
unsigned long rip;
DECLARE_BITMAP(regs_avail, NR_VCPU_TOTAL_REGS);
DECLARE_BITMAP(regs_dirty, NR_VCPU_TOTAL_REGS);
and then the below as a final testing hack for APX. I should be able to post a
small series later today, which will map out most of the register crud (I
didn't do anything with the emulator, so it's not a complete prep series, but
it should be enough to allow us to choose a direction).
enum kvm_reg {
VCPU_REGS_RAX = __VCPU_REGS_RAX,
VCPU_REGS_RCX = __VCPU_REGS_RCX,
VCPU_REGS_RDX = __VCPU_REGS_RDX,
VCPU_REGS_RBX = __VCPU_REGS_RBX,
VCPU_REGS_RSP = __VCPU_REGS_RSP,
VCPU_REGS_RBP = __VCPU_REGS_RBP,
VCPU_REGS_RSI = __VCPU_REGS_RSI,
VCPU_REGS_RDI = __VCPU_REGS_RDI,
#ifdef CONFIG_X86_64
VCPU_REGS_R8 = __VCPU_REGS_R8,
VCPU_REGS_R9 = __VCPU_REGS_R9,
VCPU_REGS_R10 = __VCPU_REGS_R10,
VCPU_REGS_R11 = __VCPU_REGS_R11,
VCPU_REGS_R12 = __VCPU_REGS_R12,
VCPU_REGS_R13 = __VCPU_REGS_R13,
VCPU_REGS_R14 = __VCPU_REGS_R14,
VCPU_REGS_R15 = __VCPU_REGS_R15,
#define CONFIG_X86_APX
#endif
#ifdef CONFIG_X86_APX
VCPU_REG_R16 = VCPU_REGS_R15 + 1,
VCPU_REG_R17,
VCPU_REG_R18,
VCPU_REG_R19,
VCPU_REG_R20,
VCPU_REG_R21,
VCPU_REG_R22,
VCPU_REG_R23,
VCPU_REG_R24,
VCPU_REG_R25,
VCPU_REG_R26,
VCPU_REG_R27,
VCPU_REG_R28,
VCPU_REG_R29,
VCPU_REG_R30,
VCPU_REG_R31,
#endif
NR_VCPU_GENERAL_PURPOSE_REGS,
VCPU_REG_RIP = NR_VCPU_GENERAL_PURPOSE_REGS,
VCPU_REG_PDPTR,
VCPU_REG_CR0,
/*
* Alias AMD's ERAPS (not a real register) to CR3 so that common code
* can trigger emulation of the RAP (Return Address Predictor) with
* minimal support required in common code. Piggyback CR3 as the RAP
* is cleared on writes to CR3, i.e. marking CR3 dirty will naturally
* mark ERAPS dirty as well.
*/
VCPU_REG_CR3,
VCPU_REG_ERAPS = VCPU_REG_CR3,
VCPU_REG_CR4,
VCPU_REG_RFLAGS,
VCPU_REG_SEGMENTS,
VCPU_REG_EXIT_INFO_1,
VCPU_REG_EXIT_INFO_2,
NR_VCPU_TOTAL_REGS,
};
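The regs_{avail,dirty} bitmap conversion described above can be modeled in userspace. Everything below is a sketch: BITS_PER_LONG, BITMAP_LONGS, mark_dirty(), and is_dirty() are illustrative stand-ins for the kernel's DECLARE_BITMAP/__set_bit/test_bit machinery, and the enum values are simplified, not the real layout.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel bitmap helpers, sized so that
 * avail/dirty tracking can cover EGPRs plus the non-GPR entries. */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Simplified numbering: 32 GPRs (legacy + EGPRs), then the rest. */
enum { NR_GPRS = 32, REG_RIP = NR_GPRS, NR_TOTAL_REGS = NR_GPRS + 8 };

struct reg_state {
	unsigned long avail[BITMAP_LONGS(NR_TOTAL_REGS)];
	unsigned long dirty[BITMAP_LONGS(NR_TOTAL_REGS)];
};

static void mark_dirty(struct reg_state *s, int reg)
{
	/* A dirty register is by definition also available. */
	s->avail[reg / BITS_PER_LONG] |= 1UL << (reg % BITS_PER_LONG);
	s->dirty[reg / BITS_PER_LONG] |= 1UL << (reg % BITS_PER_LONG);
}

static bool is_dirty(const struct reg_state *s, int reg)
{
	return s->dirty[reg / BITS_PER_LONG] & (1UL << (reg % BITS_PER_LONG));
}
```

With the bitmaps sized from NR_VCPU_TOTAL_REGS, the indices can grow past 32 on APX-capable hardware without the u32 regs_avail/regs_dirty fields overflowing.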
* Re: [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks
2026-03-07 1:33 ` Chang S. Bae
@ 2026-03-11 18:42 ` Paolo Bonzini
0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2026-03-11 18:42 UTC (permalink / raw)
To: Chang S. Bae, Sean Christopherson; +Cc: kvm, linux-kernel, chao.gao
On 3/7/26 02:33, Chang S. Bae wrote:
> On 3/4/2026 8:28 PM, Sean Christopherson wrote:
>>
>> I assume/hope there are more tests coming in future versions...
>>
>> KVM-Unit-Tests' x86/xsave.c would be a good fit, or at least a good reference
>> for what we should be testing as far as EGPR accesses are concerned.
>
> Yeah, I can see establishing a solid test set along with a new feature
> is pretty much a requirement for KVM. I'll come up with some updates.
> Right now, yes, I'm considering that xsave.c. Thanks a lot!
Not a lot is needed, but please add at least something that uses the
forced emulation prefix to trigger the emulation paths.
Paolo
* Re: [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure
2026-03-07 1:33 ` Chang S. Bae
@ 2026-03-13 1:05 ` Sean Christopherson
0 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-03-13 1:05 UTC (permalink / raw)
To: Chang S. Bae; +Cc: pbonzini, kvm, linux-kernel, chao.gao
On Fri, Mar 06, 2026, Chang S. Bae wrote:
> On 3/4/2026 8:21 PM, Sean Christopherson wrote:
> > static inline int vmx_get_insn_info_reg(u64 insn_info)
> > {
> > return vmx_insn_info_extended() ? (insn_info >> ??) & 0x1f :
> > (insn_info >> 3) & 0xf;
> > }
>
> There is
>
> int get_vmx_mem_address(...)
> {
> ...
>
> /*
> * According to Vol. 3B,...
> */
> int scaling = vmx_instruction_info & 3;
> int addr_size = (vmx_instruction_info >> 7) & 7;
> bool is_reg = vmx_instruction_info & (1u << 10);
> int seg_reg = (vmx_instruction_info >> 15) & 7;
> int index_reg = (vmx_instruction_info >> 18) & 0xf;
> bool index_is_valid = !(vmx_instruction_info & (1u << 22));
> int base_reg = (vmx_instruction_info >> 23) & 0xf;
> bool base_is_valid = !(vmx_instruction_info & (1u << 27));
>
> I'd assume wrappers like above for each line there.
Ya.
> But to confirm your preference: would you rather keep this open-coded, or
> introduce wrappers for each?
Assuming the alternative is to open code both the extended and regular versions,
yes, definitely add wrappers.
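One possible shape for those wrappers, modeled standalone: the helper names and the one-helper-per-field split are assumptions, but the bit positions match the get_vmx_mem_address() snippet quoted in this message. Extended (APX) variants would widen the register masks to 5 bits at whatever bit positions the extended instruction info defines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical wrappers over the open-coded VMX instruction-info field
 * extraction; only the legacy 4-bit register fields are modeled here. */
static inline int insn_info_scaling(uint64_t info)	{ return info & 3; }
static inline int insn_info_addr_size(uint64_t info)	{ return (info >> 7) & 7; }
static inline bool insn_info_is_reg(uint64_t info)	{ return info & (1u << 10); }
static inline int insn_info_seg_reg(uint64_t info)	{ return (info >> 15) & 7; }
static inline int insn_info_index_reg(uint64_t info)	{ return (info >> 18) & 0xf; }
static inline bool insn_info_index_valid(uint64_t info)	{ return !(info & (1u << 22)); }
static inline int insn_info_base_reg(uint64_t info)	{ return (info >> 23) & 0xf; }
static inline bool insn_info_base_valid(uint64_t info)	{ return !(info & (1u << 27)); }
```

Each helper is then a single place to branch on vmx_insn_info_extended(), instead of open coding both the extended and regular decodes at every call site.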
Thread overview: 39+ messages
2026-01-12 23:53 [PATCH v2 00/16] KVM: x86: Enable APX for guests Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 01/16] KVM: x86: Rename register accessors to be GPR-specific Chang S. Bae
2026-03-05 1:35 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
2026-03-09 23:28 ` Chang S. Bae
2026-03-10 1:23 ` Sean Christopherson
2026-03-10 22:05 ` Chang S. Bae
2026-03-10 23:12 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 02/16] KVM: x86: Refactor GPR accessors to differentiate register access types Chang S. Bae
2026-03-05 1:49 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 03/16] KVM: x86: Implement accessors for extended GPRs Chang S. Bae
2026-03-05 1:41 ` Sean Christopherson
2026-03-07 1:32 ` Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 04/16] KVM: VMX: Introduce unified instruction info structure Chang S. Bae
2026-03-05 4:21 ` Sean Christopherson
2026-03-07 1:33 ` Chang S. Bae
2026-03-13 1:05 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 05/16] KVM: VMX: Refactor instruction information retrieval Chang S. Bae
2026-01-12 23:53 ` [PATCH v2 06/16] KVM: VMX: Refactor GPR index retrieval from exit qualification Chang S. Bae
2026-03-05 4:13 ` Sean Christopherson
2026-01-12 23:53 ` [PATCH v2 07/16] KVM: VMX: Support extended register index in exit handling Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 08/16] KVM: nVMX: Propagate the extended instruction info field Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 09/16] KVM: emulate: Support EGPR accessing and tracking Chang S. Bae
2026-03-05 4:22 ` Sean Christopherson
2026-01-12 23:54 ` [PATCH v2 10/16] KVM: emulate: Handle EGPR index and REX2-incompatible opcodes Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 11/16] KVM: emulate: Support REX2-prefixed opcode decode Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 12/16] KVM: emulate: Reject EVEX-prefixed instructions Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 13/16] KVM: x86: Guard valid XCR0.APX settings Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 14/16] KVM: x86: Expose APX foundational feature bit to guests Chang S. Bae
2026-01-19 5:55 ` Xiaoyao Li
2026-01-20 18:07 ` Edgecombe, Rick P
2026-01-20 20:50 ` Chang S. Bae
2026-01-21 19:59 ` Edgecombe, Rick P
2026-01-12 23:54 ` [PATCH v2 15/16] KVM: x86: Expose APX sub-features " Chang S. Bae
2026-01-12 23:54 ` [PATCH v2 16/16] KVM: x86: selftests: Add APX state handling and XCR0 sanity checks Chang S. Bae
2026-03-05 4:28 ` Sean Christopherson
2026-03-07 1:33 ` Chang S. Bae
2026-03-11 18:42 ` Paolo Bonzini