* [PATCH v3 01/40] KVM: SVM: Truncate INVLPGA address in compatibility mode
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 02/40] KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall Sean Christopherson
` (39 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Check for full 64-bit mode, not just long mode, when truncating the
virtual address as part of INVLPGA emulation. Compatibility mode doesn't
support 64-bit addressing.
Note, the FIXME still applies, e.g. if the guest deliberately targeted
EAX while in 64-bit mode via an address-size override. That flaw isn't worth
fixing as it would require decoding the code stream, which would open an
entirely different can of worms, and in practice no sane guest would shove
garbage into RAX[63:32] and execute INVLPGA.
Note #2, VMSAVE, VMLOAD, and VMRUN all suffer from the same architectural
flaw of not providing the full linear address in a VMCB exit information
field, because, quoting the APM verbatim:
the linear address is available directly from the guest rAX register
(VMSAVE, VMLOAD, and VMRUN take a physical address, but their behavior
with respect to rAX is otherwise identical).
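As a rough illustration of the distinction, here is a minimal user-space sketch. The struct and helper names are hypothetical stand-ins, not KVM's API; only the rule is taken from this patch: keep the full rAX value when the guest is in 64-bit mode (long mode active and CS.L set), and keep just the low 32 bits otherwise, which includes compatibility mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the guest mode bits that matter here. */
struct guest_mode {
	bool efer_lma;	/* long mode active: 64-bit or compatibility mode */
	bool cs_l;	/* CS.L: set only for 64-bit code segments */
};

/* Full 64-bit mode requires both long mode and a 64-bit code segment. */
static bool in_64_bit_mode(const struct guest_mode *m)
{
	return m->efer_lma && m->cs_l;
}

/* Keep only the low 32 bits of rAX unless the guest is in 64-bit mode. */
static uint64_t invlpga_address(const struct guest_mode *m, uint64_t rax)
{
	return in_64_bit_mode(m) ? rax : (uint32_t)rax;
}
```

With a check on long mode alone, a compatibility-mode guest (efer_lma set, cs_l clear) would keep whatever happened to be in RAX[63:32].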
Fixes: bc9eff67fc35 ("KVM: SVM: Use default rAX size for INVLPGA emulation")
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 717af5c4d057..7d8a433b5c5e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2416,7 +2416,7 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
return 1;
/* FIXME: Handle an address size prefix. */
- if (!is_long_mode(vcpu))
+ if (!is_64_bit_mode(vcpu))
gva = (u32)gva;
trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 02/40] KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 01/40] KVM: SVM: Truncate INVLPGA address in compatibility mode Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 03/40] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest Sean Christopherson
` (38 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Bug the VM if 32-bit KVM attempts to handle a 64-bit hypercall, primarily
so that a future change to set "input" in mode-specific code doesn't
trigger a false positive warn=>error:
arch/x86/kvm/xen.c:1687:6: error: variable 'input' is used uninitialized
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
1687 | if (!longmode) {
| ^~~~~~~~~
arch/x86/kvm/xen.c:1708:31: note: uninitialized use occurs here
1708 | trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
| ^~~~~
x86/kvm/xen.c:1687:2: note: remove the 'if' if its condition is always true
1687 | if (!longmode) {
| ^~~~~~~~~~~~~~
arch/x86/kvm/xen.c:1677:11: note: initialize the variable 'input' to silence this warning
1677 | u64 input, params[6], r = -ENOSYS;
| ^
1 error generated.
Note, params[] also has the same flaw, but -Wsometimes-uninitialized
doesn't seem to be enforced for arrays, presumably because it's difficult
to avoid false positives on specific entries.
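The control-flow shape this patch is after can be sketched in isolation. This is a simplified model, not the kernel code: SKETCH_HOST_64BIT is a hypothetical stand-in for CONFIG_X86_64, and the plain early return stands in for the KVM_BUG_ON() plus -EIO path. The point is that every branch either assigns "input" or returns, so the compiler can prove the later use is initialized.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build-time switch standing in for CONFIG_X86_64. */
#define SKETCH_HOST_64BIT 1

/*
 * Every path either assigns 'input' or returns early, so the use after the
 * if/else is never reached with an uninitialized value.
 */
static int read_hypercall_input(bool longmode, uint64_t rax, uint64_t *out)
{
	uint64_t input;

	if (!longmode) {
		input = (uint32_t)rax;
	} else {
#if SKETCH_HOST_64BIT
		input = rax;
#else
		/* A 32-bit host should never observe a 64-bit hypercall. */
		return -EIO;
#endif
	}

	*out = input;
	return 0;
}
```

Leaving the else branch compiled out entirely on a 32-bit build, as in the pre-patch layout, is what lets the compiler see a path on which "input" is never written.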
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/xen.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 91fd3673c09a..6d9be74bb673 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1694,16 +1694,19 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
params[4] = (u32)kvm_rdi_read(vcpu);
params[5] = (u32)kvm_rbp_read(vcpu);
}
-#ifdef CONFIG_X86_64
else {
+#ifdef CONFIG_X86_64
params[0] = (u64)kvm_rdi_read(vcpu);
params[1] = (u64)kvm_rsi_read(vcpu);
params[2] = (u64)kvm_rdx_read(vcpu);
params[3] = (u64)kvm_r10_read(vcpu);
params[4] = (u64)kvm_r8_read(vcpu);
params[5] = (u64)kvm_r9_read(vcpu);
- }
+#else
+ KVM_BUG_ON(1, vcpu->kvm);
+ return -EIO;
#endif
+ }
cpl = kvm_x86_call(get_cpl)(vcpu);
trace_kvm_xen_hypercall(cpl, input, params[0], params[1], params[2],
params[3], params[4], params[5]);
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 03/40] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 01/40] KVM: SVM: Truncate INVLPGA address in compatibility mode Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 02/40] KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 04/40] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode Sean Christopherson
` (37 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Don't truncate RAX when handling a Xen hypercall for a guest with protected
state, as KVM's ABI is to assume the guest is in 64-bit mode for such cases
(the guest leaving garbage in 63:32 after a transition to 32-bit mode is
far less likely than 63:32 being necessary to complete the hypercall).
Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state")
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/xen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 6d9be74bb673..895095dc684e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1678,15 +1678,14 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
bool handled = false;
u8 cpl;
- input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
-
/* Hyper-V hypercalls get bit 31 set in EAX */
- if ((input & 0x80000000) &&
+ if ((kvm_rax_read(vcpu) & 0x80000000) &&
kvm_hv_hypercall_enabled(vcpu))
return kvm_hv_hypercall(vcpu);
longmode = is_64_bit_hypercall(vcpu);
if (!longmode) {
+ input = (u32)kvm_rax_read(vcpu);
params[0] = (u32)kvm_rbx_read(vcpu);
params[1] = (u32)kvm_rcx_read(vcpu);
params[2] = (u32)kvm_rdx_read(vcpu);
@@ -1696,6 +1695,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
}
else {
#ifdef CONFIG_X86_64
+ input = (u64)kvm_rax_read(vcpu);
params[0] = (u64)kvm_rdi_read(vcpu);
params[1] = (u64)kvm_rsi_read(vcpu);
params[2] = (u64)kvm_rdx_read(vcpu);
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 04/40] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (2 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 03/40] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:48 ` sashiko-bot
2026-05-29 22:21 ` [PATCH v3 05/40] KVM: x86: Trace hypercall register *after* truncating values for 32-bit Sean Christopherson
` (36 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
When getting register values for ENCLS emulation, use kvm_register_read()
instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
if the guest is outside of 64-bit mode.
Note, the misleading/surprising behavior of kvm_<reg>_read() being "raw"
variants under the hood will be addressed once all non-benign bugs are
fixed.
Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions")
Fixes: b6f084ca5538 ("KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)")
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/sgx.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index df1d0cf76947..4c61fc33f764 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
struct x86_exception ex;
int r;
- if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
- sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
+ if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
return 1;
/*
@@ -302,9 +302,9 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
gpa_t sig_gpa, secs_gpa, token_gpa;
int ret, trapnr;
- if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
- sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
- sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
+ if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 1808, 4096, &sig_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RDX), 304, 512, &token_gva))
return 1;
/*
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 04/40] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
2026-05-29 22:21 ` [PATCH v3 04/40] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode Sean Christopherson
@ 2026-05-29 22:48 ` sashiko-bot
0 siblings, 0 replies; 87+ messages in thread
From: sashiko-bot @ 2026-05-29 22:48 UTC
To: Sean Christopherson; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Incorrect 32-bit mode segment check emulation in `sgx_get_encls_gva` violates architectural bounds, alignment, and type rules.
- [High] KVM fails to enforce the architectural requirement that 64-bit pointer fields in memory-based ENCLS structures must have their upper 32 bits set to zero when executing in 32-bit mode.
--
commit ca9673f76fcf276ab1d3acd4cb535b328a49e842
Author: Sean Christopherson <seanjc@google.com>
KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
When getting register values for ENCLS emulation, use kvm_register_read()
instead of kvm_<reg>_read() so that bits 63:32 of the register are dropped
if the guest is in 32-bit mode.
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index df1d0cf769475..4c61fc33f7648 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
> struct x86_exception ex;
> int r;
>
> - if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
> - sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
> + if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
> + sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
> return 1;
[Severity: High]
This isn't a bug introduced by this patch, but while reviewing the 32-bit mode
fixes for ENCLS, does the existing 32-bit segment check emulation in
sgx_get_encls_gva() violate architectural bounds, alignment, and type rules?
In sgx_get_encls_gva():
} else {
*gva &= 0xffffffff;
fault = (s.unusable) ||
(s.type != 2 && s.type != 3) ||
(*gva > s.limit) ||
((s.base != 0 || s.limit != 0xffffffff) &&
(((u64)*gva + size - 1) > s.limit + 1));
}
Is it correct that this calculates the linear address (*gva = offset + s.base)
and then evaluates the segment limit against it (*gva > s.limit), rather than
checking against the effective address? A non-zero s.base could cause limits
to be evaluated incorrectly.
Also, does the check (((u64)*gva + size - 1) > s.limit + 1) allow an access
to exceed the inclusive segment limit by 1 byte without faulting?
Finally, by requiring s.type == 2 || s.type == 3 (Read/Write data segments),
does this reject valid Read-Only data segments (types 0, 1), even though
ENCLS structures being read (like PAGEINFO) only require read access?
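For reference, an inclusive limit check evaluated against the effective address could look roughly like the sketch below. This is a simplified model and not KVM's code: the function name is hypothetical, and it assumes an expand-up data segment, a limit already scaled to bytes, and a nonzero access size. The linear address would be formed by adding the segment base only after this check passes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an access of 'size' bytes starting at 'offset' stays within
 * an expand-up segment whose last valid offset is 'limit' (inclusive).
 * 'size' is assumed to be nonzero.
 */
static bool access_within_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
	/* Use 64-bit math so that offset + size - 1 cannot wrap. */
	uint64_t last = (uint64_t)offset + size - 1;

	return last <= limit;
}
```

For example, with a limit of 0xfff, a 16-byte access at offset 0xff1 ends at 0x1000 and is rejected here, whereas a comparison against limit + 1 would let it through.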
[Severity: High]
This is a pre-existing issue, but does KVM fail to enforce the architectural
requirement that 64-bit pointer fields in memory-based ENCLS structures must
have their upper 32 bits set to zero when executing in 32-bit mode?
According to the Intel SDM for ENCLS, "Any 64-bit pointer field within the
data structures must have its upper 32 bits zero in 32-bit mode, or else
#GP(0) is generated."
Looking at handle_encls_ecreate(), KVM extracts 64-bit pointers from the
guest PAGEINFO structure:
if (sgx_get_encls_gva(vcpu, pageinfo.metadata, 64, 64, &metadata_gva) ||
sgx_get_encls_gva(vcpu, pageinfo.contents, 4096, 4096,
&contents_gva))
And passes them to sgx_get_encls_gva(). However, in 32-bit mode,
sgx_get_encls_gva() just silently masks the upper 32 bits:
} else {
*gva &= 0xffffffff;
Should this instead validate that the upper 32 bits are zero and inject a
#GP(0) to prevent invalid guest structures from being successfully processed?
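If the requirement quoted above does apply, the check itself is small. The sketch below uses a hypothetical helper name and is not existing KVM code; it only shows rejecting, rather than masking, a pointer field with nonzero upper bits outside of 64-bit mode. A caller would be expected to inject #GP(0) when it returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if 'ptr' is acceptable as a 64-bit pointer field for the
 * current mode. Outside of 64-bit mode, bits 63:32 must be zero.
 */
static bool encls_ptr_field_ok(uint64_t ptr, bool in_64_bit_mode)
{
	return in_64_bit_mode || (ptr >> 32) == 0;
}
```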
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260529222223.870923-1-seanjc@google.com?part=4
* [PATCH v3 05/40] KVM: x86: Trace hypercall register *after* truncating values for 32-bit
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (3 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 04/40] KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 06/40] KVM: x86: Rename kvm_cache_regs.h => regs.h Sean Christopherson
` (35 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
When tracing hypercalls, invoke the tracepoint *after* truncating the
register values for 32-bit guests so as not to record unused garbage (in
the extremely unlikely scenario that the guest left garbage in a register
after transitioning from 64-bit mode to 32-bit mode).
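The ordering can be modeled with a toy tracepoint. Everything below is a hypothetical stand-in (the real tracepoint takes nr and a0..a3); the sketch only shows that masking before tracing makes the recorded value match the value that is actually acted on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracepoint: remembers the last value seen. */
static uint64_t last_traced_nr;

static void trace_hypercall_sketch(uint64_t nr)
{
	last_traced_nr = nr;
}

/* Returns the hypercall number that would be dispatched. */
static uint64_t emulate_hypercall_sketch(uint64_t nr, bool op_64_bit)
{
	if (!op_64_bit)
		nr &= 0xFFFFFFFF;

	/* Trace after masking so the record matches what is dispatched. */
	trace_hypercall_sketch(nr);
	return nr;
}
```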
Fixes: 229456fc34b1 ("KVM: convert custom marker based tracing to event traces")
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e6f1dd84f22d..634a6d0197f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10429,8 +10429,6 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
++vcpu->stat.hypercalls;
- trace_kvm_hypercall(nr, a0, a1, a2, a3);
-
if (!op_64_bit) {
nr &= 0xFFFFFFFF;
a0 &= 0xFFFFFFFF;
@@ -10439,6 +10437,8 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
a3 &= 0xFFFFFFFF;
}
+ trace_kvm_hypercall(nr, a0, a1, a2, a3);
+
if (cpl) {
ret = -KVM_EPERM;
goto out;
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 06/40] KVM: x86: Rename kvm_cache_regs.h => regs.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (4 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 05/40] KVM: x86: Trace hypercall register *after* truncating values for 32-bit Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 07/40] KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h Sean Christopherson
` (34 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Rename kvm_cache_regs.h to simply regs.h, as the "cache" nomenclature is
already a lie (the file deals with state/registers that aren't cached per
se), and so that more code/functionality can be landed in the header
without making it a truly horrible misnomer.
Deliberately drop the kvm_ prefix/namespace to align with other "local"
headers, and to further differentiate regs.h from the public/global
arch/x86/include/asm/kvm_vcpu_regs.h, which sadly needs to stay in asm/
so that the number of registers can be referenced by kvm_vcpu_arch.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/{kvm_cache_regs.h => regs.h} | 4 ++--
arch/x86/kvm/smm.c | 2 +-
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/sgx.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/x86.h | 2 +-
14 files changed, 15 insertions(+), 15 deletions(-)
rename arch/x86/kvm/{kvm_cache_regs.h => regs.h} (99%)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 585a8ceab220..b566ab5c7515 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -20,7 +20,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/kvm_host.h>
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "kvm_emulate.h"
#include <linux/stringify.h>
#include <asm/debugreg.h>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4e34f75e705d..15777869b292 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -37,7 +37,7 @@
#include <asm/delay.h>
#include <linux/atomic.h>
#include <linux/jump_label.h>
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "irq.h"
#include "ioapic.h"
#include "trace.h"
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ddf4e467c071..e1bb663ebbd5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -3,7 +3,7 @@
#define __KVM_X86_MMU_H
#include <linux/kvm_host.h>
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "x86.h"
#include "cpuid.h"
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c87c26bf4149..b8f2edf2cfeb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -22,7 +22,7 @@
#include "mmu_internal.h"
#include "tdp_mmu.h"
#include "x86.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "smm.h"
#include "kvm_emulate.h"
#include "page_track.h"
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/regs.h
similarity index 99%
rename from arch/x86/kvm/kvm_cache_regs.h
rename to arch/x86/kvm/regs.h
index 2ae492ad6412..4440f3992fce 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/regs.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASM_KVM_CACHE_REGS_H
-#define ASM_KVM_CACHE_REGS_H
+#ifndef ARCH_X86_KVM_REGS_H
+#define ARCH_X86_KVM_REGS_H
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index f623c5986119..a446487bdd5c 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -3,7 +3,7 @@
#include <linux/kvm_host.h>
#include "x86.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "kvm_emulate.h"
#include "smm.h"
#include "cpuid.h"
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7d8a433b5c5e..d4bfda10f1df 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4,7 +4,7 @@
#include "irq.h"
#include "mmu.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "x86.h"
#include "smm.h"
#include "cpuid.h"
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 87c6b105deef..cbc716885398 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -23,7 +23,7 @@
#include <asm/sev-common.h>
#include "cpuid.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "x86.h"
#include "pmu.h"
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 213a448104af..6d6cd5904ddf 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -2,7 +2,7 @@
#ifndef __KVM_X86_VMX_NESTED_H
#define __KVM_X86_VMX_NESTED_H
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "hyperv.h"
#include "vmcs12.h"
#include "vmx.h"
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 4c61fc33f764..66c315554b46 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -6,7 +6,7 @@
#include <asm/sgx.h>
#include "x86.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "nested.h"
#include "sgx.h"
#include "vmx.h"
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cbc2034d7924..bb19f6df921b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -59,7 +59,7 @@
#include "hyperv.h"
#include "kvm_onhyperv.h"
#include "irq.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "lapic.h"
#include "mmu.h"
#include "nested.h"
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index daedf663c0a9..de9de0d2016c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -10,7 +10,7 @@
#include <asm/posted_intr.h>
#include "capabilities.h"
-#include "../kvm_cache_regs.h"
+#include "../regs.h"
#include "pmu_intel.h"
#include "vmcs.h"
#include "vmx_ops.h"
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 634a6d0197f6..4fe2fb3ceba6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -23,7 +23,7 @@
#include "mmu.h"
#include "i8254.h"
#include "tss.h"
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "kvm_emulate.h"
#include "mmu/page_track.h"
#include "x86.h"
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a49424f9c968..a6b2be462e6d 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -6,7 +6,7 @@
#include <asm/fpu/xstate.h>
#include <asm/mce.h>
#include <asm/pvclock.h>
-#include "kvm_cache_regs.h"
+#include "regs.h"
#include "kvm_emulate.h"
#include "cpuid.h"
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 07/40] KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (5 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 06/40] KVM: x86: Rename kvm_cache_regs.h => regs.h Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 08/40] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers Sean Christopherson
` (33 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move inlined General Purpose Register, Control Register, and Debug
Register helpers from x86.h to the aptly named regs.h, to help trim
down x86.h (and x86.c in the future).
Move *very* select EFER functionality as well, but leave behind the bulk of
EFER handling and all other MSR handling. There is more than enough MSR
code to carve out msrs.{c,h} in the future. Give is_long_mode() special
treatment as it's more along the lines of a CR4 bit check that just happens
to be accessed through an MSR interface, and more importantly, because
giving regs.h access to is_long_mode() greatly simplifies dependency
chains.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/regs.h | 123 ++++++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/x86.h | 117 -----------------------------------------
2 files changed, 120 insertions(+), 120 deletions(-)
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 4440f3992fce..62cc9deea226 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -16,6 +16,37 @@
static_assert(!(KVM_POSSIBLE_CR0_GUEST_BITS & X86_CR0_PDPTR_BITS));
+static inline bool is_long_mode(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+ return !!(vcpu->arch.efer & EFER_LMA);
+#else
+ return false;
+#endif
+}
+
+static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
+{
+ int cs_db, cs_l;
+
+ WARN_ON_ONCE(vcpu->arch.guest_state_protected);
+
+ if (!is_long_mode(vcpu))
+ return false;
+ kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
+ return cs_l;
+}
+
+static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
+{
+ /*
+ * If running with protected guest state, the CS register is not
+ * accessible. The hypercall register values will have had to been
+ * provided in 64-bit mode, so assume the guest is in 64-bit.
+ */
+ return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
+}
+
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
@@ -143,6 +174,13 @@ static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg
return vcpu->arch.regs[reg];
}
+static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
+{
+ unsigned long val = kvm_register_read_raw(vcpu, reg);
+
+ return is_64_bit_mode(vcpu) ? val : (u32)val;
+}
+
static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
unsigned long val)
{
@@ -153,6 +191,14 @@ static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
kvm_register_mark_dirty(vcpu, reg);
}
+static inline void kvm_register_write(struct kvm_vcpu *vcpu,
+ int reg, unsigned long val)
+{
+ if (!is_64_bit_mode(vcpu))
+ val = (u32)val;
+ return kvm_register_write_raw(vcpu, reg, val);
+}
+
static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
{
if (!kvm_register_is_available(vcpu, VCPU_REG_RIP))
@@ -177,6 +223,12 @@ static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
kvm_register_write_raw(vcpu, VCPU_REGS_RSP, val);
}
+static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
+{
+ return (kvm_rax_read(vcpu) & -1u)
+ | ((u64)(kvm_rdx_read(vcpu) & -1u) << 32);
+}
+
static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
{
might_sleep(); /* on svm */
@@ -243,10 +295,75 @@ static inline ulong kvm_read_cr4(struct kvm_vcpu *vcpu)
return kvm_read_cr4_bits(vcpu, ~0UL);
}
-static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
+static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
- return (kvm_rax_read(vcpu) & -1u)
- | ((u64)(kvm_rdx_read(vcpu) & -1u) << 32);
+ return !(cr4 & vcpu->arch.cr4_guest_rsvd_bits);
+}
+
+#define __cr4_reserved_bits(__cpu_has, __c) \
+({ \
+ u64 __reserved_bits = CR4_RESERVED_BITS; \
+ \
+ if (!__cpu_has(__c, X86_FEATURE_XSAVE)) \
+ __reserved_bits |= X86_CR4_OSXSAVE; \
+ if (!__cpu_has(__c, X86_FEATURE_SMEP)) \
+ __reserved_bits |= X86_CR4_SMEP; \
+ if (!__cpu_has(__c, X86_FEATURE_SMAP)) \
+ __reserved_bits |= X86_CR4_SMAP; \
+ if (!__cpu_has(__c, X86_FEATURE_FSGSBASE)) \
+ __reserved_bits |= X86_CR4_FSGSBASE; \
+ if (!__cpu_has(__c, X86_FEATURE_PKU)) \
+ __reserved_bits |= X86_CR4_PKE; \
+ if (!__cpu_has(__c, X86_FEATURE_LA57)) \
+ __reserved_bits |= X86_CR4_LA57; \
+ if (!__cpu_has(__c, X86_FEATURE_UMIP)) \
+ __reserved_bits |= X86_CR4_UMIP; \
+ if (!__cpu_has(__c, X86_FEATURE_VMX)) \
+ __reserved_bits |= X86_CR4_VMXE; \
+ if (!__cpu_has(__c, X86_FEATURE_PCID)) \
+ __reserved_bits |= X86_CR4_PCIDE; \
+ if (!__cpu_has(__c, X86_FEATURE_LAM)) \
+ __reserved_bits |= X86_CR4_LAM_SUP; \
+ if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
+ !__cpu_has(__c, X86_FEATURE_IBT)) \
+ __reserved_bits |= X86_CR4_CET; \
+ __reserved_bits; \
+})
+
+static inline bool is_protmode(struct kvm_vcpu *vcpu)
+{
+ return kvm_is_cr0_bit_set(vcpu, X86_CR0_PE);
+}
+
+static inline bool is_pae(struct kvm_vcpu *vcpu)
+{
+ return kvm_is_cr4_bit_set(vcpu, X86_CR4_PAE);
+}
+
+static inline bool is_pse(struct kvm_vcpu *vcpu)
+{
+ return kvm_is_cr4_bit_set(vcpu, X86_CR4_PSE);
+}
+
+static inline bool is_paging(struct kvm_vcpu *vcpu)
+{
+ return likely(kvm_is_cr0_bit_set(vcpu, X86_CR0_PG));
+}
+
+static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
+{
+ return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
+}
+
+static inline bool kvm_dr7_valid(u64 data)
+{
+ /* Bits [63:32] are reserved */
+ return !(data >> 32);
+}
+static inline bool kvm_dr6_valid(u64 data)
+{
+ /* Bits [63:32] are reserved */
+ return !(data >> 32);
}
static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a6b2be462e6d..3845b10020c9 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -243,42 +243,6 @@ static inline bool kvm_exception_is_soft(unsigned int nr)
return (nr == BP_VECTOR) || (nr == OF_VECTOR);
}
-static inline bool is_protmode(struct kvm_vcpu *vcpu)
-{
- return kvm_is_cr0_bit_set(vcpu, X86_CR0_PE);
-}
-
-static inline bool is_long_mode(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_X86_64
- return !!(vcpu->arch.efer & EFER_LMA);
-#else
- return false;
-#endif
-}
-
-static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
-{
- int cs_db, cs_l;
-
- WARN_ON_ONCE(vcpu->arch.guest_state_protected);
-
- if (!is_long_mode(vcpu))
- return false;
- kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
- return cs_l;
-}
-
-static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
-{
- /*
- * If running with protected guest state, the CS register is not
- * accessible. The hypercall register values will have had to been
- * provided in 64-bit mode, so assume the guest is in 64-bit.
- */
- return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
-}
-
static inline bool x86_exception_has_error_code(unsigned int vector)
{
static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
@@ -293,26 +257,6 @@ static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
}
-static inline bool is_pae(struct kvm_vcpu *vcpu)
-{
- return kvm_is_cr4_bit_set(vcpu, X86_CR4_PAE);
-}
-
-static inline bool is_pse(struct kvm_vcpu *vcpu)
-{
- return kvm_is_cr4_bit_set(vcpu, X86_CR4_PSE);
-}
-
-static inline bool is_paging(struct kvm_vcpu *vcpu)
-{
- return likely(kvm_is_cr0_bit_set(vcpu, X86_CR0_PG));
-}
-
-static inline bool is_pae_paging(struct kvm_vcpu *vcpu)
-{
- return !is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu);
-}
-
static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
{
return kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 57 : 48;
@@ -421,21 +365,6 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
return false;
}
-static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
-{
- unsigned long val = kvm_register_read_raw(vcpu, reg);
-
- return is_64_bit_mode(vcpu) ? val : (u32)val;
-}
-
-static inline void kvm_register_write(struct kvm_vcpu *vcpu,
- int reg, unsigned long val)
-{
- if (!is_64_bit_mode(vcpu))
- val = (u32)val;
- return kvm_register_write_raw(vcpu, reg, val);
-}
-
static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
return !(kvm->arch.disabled_quirks & quirk);
@@ -627,17 +556,6 @@ static inline bool kvm_pat_valid(u64 data)
return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}
-static inline bool kvm_dr7_valid(u64 data)
-{
- /* Bits [63:32] are reserved */
- return !(data >> 32);
-}
-static inline bool kvm_dr6_valid(u64 data)
-{
- /* Bits [63:32] are reserved */
- return !(data >> 32);
-}
-
/*
* Trigger machine check on the host. We assume all the MSRs are already set up
* by the CPU and that we still run on the same CPU as the MCE occurred on.
@@ -684,41 +602,6 @@ enum kvm_msr_access {
#define KVM_MSR_RET_UNSUPPORTED 2
#define KVM_MSR_RET_FILTERED 3
-static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- return !(cr4 & vcpu->arch.cr4_guest_rsvd_bits);
-}
-
-#define __cr4_reserved_bits(__cpu_has, __c) \
-({ \
- u64 __reserved_bits = CR4_RESERVED_BITS; \
- \
- if (!__cpu_has(__c, X86_FEATURE_XSAVE)) \
- __reserved_bits |= X86_CR4_OSXSAVE; \
- if (!__cpu_has(__c, X86_FEATURE_SMEP)) \
- __reserved_bits |= X86_CR4_SMEP; \
- if (!__cpu_has(__c, X86_FEATURE_SMAP)) \
- __reserved_bits |= X86_CR4_SMAP; \
- if (!__cpu_has(__c, X86_FEATURE_FSGSBASE)) \
- __reserved_bits |= X86_CR4_FSGSBASE; \
- if (!__cpu_has(__c, X86_FEATURE_PKU)) \
- __reserved_bits |= X86_CR4_PKE; \
- if (!__cpu_has(__c, X86_FEATURE_LA57)) \
- __reserved_bits |= X86_CR4_LA57; \
- if (!__cpu_has(__c, X86_FEATURE_UMIP)) \
- __reserved_bits |= X86_CR4_UMIP; \
- if (!__cpu_has(__c, X86_FEATURE_VMX)) \
- __reserved_bits |= X86_CR4_VMXE; \
- if (!__cpu_has(__c, X86_FEATURE_PCID)) \
- __reserved_bits |= X86_CR4_PCIDE; \
- if (!__cpu_has(__c, X86_FEATURE_LAM)) \
- __reserved_bits |= X86_CR4_LAM_SUP; \
- if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
- !__cpu_has(__c, X86_FEATURE_IBT)) \
- __reserved_bits |= X86_CR4_CET; \
- __reserved_bits; \
-})
-
int kvm_sev_es_mmio(struct kvm_vcpu *vcpu, bool is_write, gpa_t gpa,
unsigned int bytes, void *data);
int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 08/40] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (6 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 07/40] KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-06-03 11:15 ` Huang, Kai
2026-05-29 22:21 ` [PATCH v3 09/40] KVM: x86: Drop non-raw kvm_<reg>_write() helpers Sean Christopherson
` (32 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Make kvm_<reg>_{read,write}() mode-aware (where the value is truncated to
32 bits if the vCPU isn't in 64-bit mode), and convert all the intentional
"raw" accesses to kvm_<reg>_{read,write}_raw() versions. To avoid
confusion and bikeshedding over whether or not explicit 32-bit accesses
should use the "raw" or mode-aware variants, add and use "e" versions, e.g.
for things like RDMSR, WRMSR, and CPUID, where the instruction uses only
bits 31:0, regardless of mode.
No functional change intended (all use of "e" versions is for cases where
the value is already truncated due to bouncing through a u32).
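To make the three accessor flavors concrete, here is a minimal userspace
sketch. It assumes a toy vCPU with a single register and a boolean mode
flag; struct toy_vcpu and every toy_* name are hypothetical stand-ins for
illustration, not KVM's actual types or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_vcpu; not the real KVM layout. */
struct toy_vcpu {
	uint64_t rax;		/* full 64-bit backing storage for RAX */
	bool in_64bit_mode;	/* stand-in for is_64_bit_mode(vcpu) */
};

/* All 64 bits in 64-bit mode, otherwise only bits 31:0. */
static inline uint64_t toy_reg_mode_mask(const struct toy_vcpu *vcpu)
{
	return vcpu->in_64bit_mode ? UINT64_MAX : UINT32_MAX;
}

/* Mode-aware read: truncated to 32 bits outside of 64-bit mode. */
static inline uint64_t toy_rax_read(const struct toy_vcpu *vcpu)
{
	return vcpu->rax & toy_reg_mode_mask(vcpu);
}

/* Raw read: the stored value, regardless of mode. */
static inline uint64_t toy_rax_read_raw(const struct toy_vcpu *vcpu)
{
	return vcpu->rax;
}

/* "e" read: always bits 31:0, regardless of mode. */
static inline uint32_t toy_eax_read(const struct toy_vcpu *vcpu)
{
	return (uint32_t)vcpu->rax;
}

/* "e" write: the u32 is zero-extended into the full register. */
static inline void toy_eax_write(struct toy_vcpu *vcpu, uint32_t val)
{
	vcpu->rax = val;
}
```

With garbage in RAX[63:32], the mode-aware read and the "e" read agree
outside of 64-bit mode, while the raw read still returns the garbage. In
64-bit mode, only the "e" read drops the upper half.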
Cc: Binbin Wu <binbin.wu@linux.intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 12 ++--
arch/x86/kvm/hyperv.c | 21 +++----
arch/x86/kvm/hyperv.h | 4 +-
arch/x86/kvm/regs.h | 88 +++++++++++++++++----------
arch/x86/kvm/svm/nested.c | 6 +-
arch/x86/kvm/svm/svm.c | 13 ++--
arch/x86/kvm/vmx/nested.c | 8 +--
arch/x86/kvm/vmx/sgx.c | 4 +-
arch/x86/kvm/vmx/tdx.c | 18 +++---
arch/x86/kvm/x86.c | 121 +++++++++++++++++++-------------------
arch/x86/kvm/xen.c | 32 +++++-----
11 files changed, 173 insertions(+), 154 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8e5340dd2621..fd3b02575cd0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -2166,13 +2166,13 @@ int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
return 1;
}
- eax = kvm_rax_read(vcpu);
- ecx = kvm_rcx_read(vcpu);
+ eax = kvm_eax_read(vcpu);
+ ecx = kvm_ecx_read(vcpu);
kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false);
- kvm_rax_write(vcpu, eax);
- kvm_rbx_write(vcpu, ebx);
- kvm_rcx_write(vcpu, ecx);
- kvm_rdx_write(vcpu, edx);
+ kvm_eax_write(vcpu, eax);
+ kvm_ebx_write(vcpu, ebx);
+ kvm_ecx_write(vcpu, ecx);
+ kvm_edx_write(vcpu, edx);
return kvm_skip_emulated_instruction(vcpu);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_cpuid);
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 015c6947b462..3551af9a9453 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2377,10 +2377,10 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
longmode = is_64_bit_hypercall(vcpu);
if (longmode)
- kvm_rax_write(vcpu, result);
+ kvm_rax_write_raw(vcpu, result);
else {
- kvm_rdx_write(vcpu, result >> 32);
- kvm_rax_write(vcpu, result & 0xffffffff);
+ kvm_edx_write(vcpu, result >> 32);
+ kvm_eax_write(vcpu, result);
}
}
@@ -2544,18 +2544,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
#ifdef CONFIG_X86_64
if (is_64_bit_hypercall(vcpu)) {
- hc.param = kvm_rcx_read(vcpu);
- hc.ingpa = kvm_rdx_read(vcpu);
- hc.outgpa = kvm_r8_read(vcpu);
+ hc.param = kvm_rcx_read_raw(vcpu);
+ hc.ingpa = kvm_rdx_read_raw(vcpu);
+ hc.outgpa = kvm_r8_read_raw(vcpu);
} else
#endif
{
- hc.param = ((u64)kvm_rdx_read(vcpu) << 32) |
- (kvm_rax_read(vcpu) & 0xffffffff);
- hc.ingpa = ((u64)kvm_rbx_read(vcpu) << 32) |
- (kvm_rcx_read(vcpu) & 0xffffffff);
- hc.outgpa = ((u64)kvm_rdi_read(vcpu) << 32) |
- (kvm_rsi_read(vcpu) & 0xffffffff);
+ hc.param = ((u64)kvm_edx_read(vcpu) << 32) | kvm_eax_read(vcpu);
+ hc.ingpa = ((u64)kvm_ebx_read(vcpu) << 32) | kvm_ecx_read(vcpu);
+ hc.outgpa = ((u64)kvm_edi_read(vcpu) << 32) | kvm_esi_read(vcpu);
}
hc.code = hc.param & 0xffff;
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 6301f79fcbae..65e89ed65349 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -232,8 +232,8 @@ static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
if (!hv_vcpu)
return false;
- code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
- kvm_rax_read(vcpu);
+ code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read_raw(vcpu) :
+ kvm_eax_read(vcpu);
return (code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 62cc9deea226..12db5039aace 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -47,32 +47,61 @@ static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
}
-#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
-static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
-{ \
- return vcpu->arch.regs[VCPU_REGS_##uname]; \
-} \
-static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
- unsigned long val) \
-{ \
- vcpu->arch.regs[VCPU_REGS_##uname] = val; \
+static __always_inline unsigned long kvm_reg_mode_mask(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+ return is_64_bit_mode(vcpu) ? GENMASK(63, 0) : GENMASK(31, 0);
+#else
+ return GENMASK(31, 0);
+#endif
+}
+
+#define __BUILD_KVM_GPR_ACCESSORS(lname, uname) \
+static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu) \
+{ \
+ return vcpu->arch.regs[VCPU_REGS_##uname] & kvm_reg_mode_mask(vcpu); \
+} \
+static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
+ unsigned long val) \
+{ \
+ vcpu->arch.regs[VCPU_REGS_##uname] = val & kvm_reg_mode_mask(vcpu); \
+} \
+static __always_inline unsigned long kvm_##lname##_read_raw(struct kvm_vcpu *vcpu) \
+{ \
+ return vcpu->arch.regs[VCPU_REGS_##uname]; \
+} \
+static __always_inline void kvm_##lname##_write_raw(struct kvm_vcpu *vcpu, \
+ unsigned long val) \
+{ \
+ vcpu->arch.regs[VCPU_REGS_##uname] = val; \
}
-BUILD_KVM_GPR_ACCESSORS(rax, RAX)
-BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
-BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
-BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
-BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
-BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
-BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
+#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
+static __always_inline u32 kvm_e##lname##_read(struct kvm_vcpu *vcpu) \
+{ \
+ return vcpu->arch.regs[VCPU_REGS_##uname]; \
+} \
+static __always_inline void kvm_e##lname##_write(struct kvm_vcpu *vcpu, u32 val) \
+{ \
+ vcpu->arch.regs[VCPU_REGS_##uname] = val; \
+} \
+__BUILD_KVM_GPR_ACCESSORS(r##lname, uname)
+
+BUILD_KVM_GPR_ACCESSORS(ax, RAX)
+BUILD_KVM_GPR_ACCESSORS(bx, RBX)
+BUILD_KVM_GPR_ACCESSORS(cx, RCX)
+BUILD_KVM_GPR_ACCESSORS(dx, RDX)
+BUILD_KVM_GPR_ACCESSORS(bp, RBP)
+BUILD_KVM_GPR_ACCESSORS(si, RSI)
+BUILD_KVM_GPR_ACCESSORS(di, RDI)
#ifdef CONFIG_X86_64
-BUILD_KVM_GPR_ACCESSORS(r8, R8)
-BUILD_KVM_GPR_ACCESSORS(r9, R9)
-BUILD_KVM_GPR_ACCESSORS(r10, R10)
-BUILD_KVM_GPR_ACCESSORS(r11, R11)
-BUILD_KVM_GPR_ACCESSORS(r12, R12)
-BUILD_KVM_GPR_ACCESSORS(r13, R13)
-BUILD_KVM_GPR_ACCESSORS(r14, R14)
-BUILD_KVM_GPR_ACCESSORS(r15, R15)
+__BUILD_KVM_GPR_ACCESSORS(r8, R8)
+__BUILD_KVM_GPR_ACCESSORS(r9, R9)
+__BUILD_KVM_GPR_ACCESSORS(r10, R10)
+__BUILD_KVM_GPR_ACCESSORS(r11, R11)
+__BUILD_KVM_GPR_ACCESSORS(r12, R12)
+__BUILD_KVM_GPR_ACCESSORS(r13, R13)
+__BUILD_KVM_GPR_ACCESSORS(r14, R14)
+__BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
/*
@@ -176,9 +205,7 @@ static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
- unsigned long val = kvm_register_read_raw(vcpu, reg);
-
- return is_64_bit_mode(vcpu) ? val : (u32)val;
+ return kvm_register_read_raw(vcpu, reg) & kvm_reg_mode_mask(vcpu);
}
static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
@@ -194,9 +221,7 @@ static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
static inline void kvm_register_write(struct kvm_vcpu *vcpu,
int reg, unsigned long val)
{
- if (!is_64_bit_mode(vcpu))
- val = (u32)val;
- return kvm_register_write_raw(vcpu, reg, val);
+ return kvm_register_write_raw(vcpu, reg, val & kvm_reg_mode_mask(vcpu));
}
static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
@@ -225,8 +250,7 @@ static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu)
{
- return (kvm_rax_read(vcpu) & -1u)
- | ((u64)(kvm_rdx_read(vcpu) & -1u) << 32);
+ return kvm_eax_read(vcpu) | (u64)(kvm_edx_read(vcpu)) << 32;
}
static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7ad4b4fb7a1c..d817dbb350d6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -793,7 +793,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm)
svm->vcpu.arch.cr2 = save->cr2;
- kvm_rax_write(vcpu, save->rax);
+ kvm_rax_write_raw(vcpu, save->rax);
kvm_rsp_write(vcpu, save->rsp);
kvm_rip_write(vcpu, save->rip);
@@ -1272,7 +1272,7 @@ static int nested_svm_vmexit_update_vmcb12(struct kvm_vcpu *vcpu)
vmcb12->save.rflags = kvm_get_rflags(vcpu);
vmcb12->save.rip = kvm_rip_read(vcpu);
vmcb12->save.rsp = kvm_rsp_read(vcpu);
- vmcb12->save.rax = kvm_rax_read(vcpu);
+ vmcb12->save.rax = kvm_rax_read_raw(vcpu);
vmcb12->save.dr7 = vmcb02->save.dr7;
vmcb12->save.dr6 = svm->vcpu.arch.dr6;
vmcb12->save.cpl = vmcb02->save.cpl;
@@ -1424,7 +1424,7 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
svm_set_efer(vcpu, vmcb01->save.efer);
svm_set_cr0(vcpu, vmcb01->save.cr0 | X86_CR0_PE);
svm_set_cr4(vcpu, vmcb01->save.cr4);
- kvm_rax_write(vcpu, vmcb01->save.rax);
+ kvm_rax_write_raw(vcpu, vmcb01->save.rax);
kvm_rsp_write(vcpu, vmcb01->save.rsp);
kvm_rip_write(vcpu, vmcb01->save.rip);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d4bfda10f1df..8402e94ac094 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2409,15 +2409,12 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- gva_t gva = kvm_rax_read(vcpu);
- u32 asid = kvm_rcx_read(vcpu);
-
- if (nested_svm_check_permissions(vcpu))
- return 1;
-
/* FIXME: Handle an address size prefix. */
- if (!is_64_bit_mode(vcpu))
- gva = (u32)gva;
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_ecx_read(vcpu);
+
+ if (nested_svm_check_permissions(vcpu))
+ return 1;
trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 30dcabc899a2..b2c851cc7d5c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6165,7 +6165,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
- u32 index = kvm_rcx_read(vcpu);
+ u32 index = kvm_ecx_read(vcpu);
u64 new_eptp;
if (WARN_ON_ONCE(!nested_cpu_has_ept(vmcs12)))
@@ -6199,7 +6199,7 @@ static int handle_vmfunc(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12;
- u32 function = kvm_rax_read(vcpu);
+ u32 function = kvm_eax_read(vcpu);
/*
* VMFUNC should never execute cleanly while L1 is active; KVM supports
@@ -6321,7 +6321,7 @@ static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
msr_index = vmx_get_exit_qual(vcpu);
else
- msr_index = kvm_rcx_read(vcpu);
+ msr_index = kvm_ecx_read(vcpu);
/*
* The MSR_BITMAP page is divided into four 1024-byte bitmaps,
@@ -6431,7 +6431,7 @@ static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
return false;
- encls_leaf = kvm_rax_read(vcpu);
+ encls_leaf = kvm_eax_read(vcpu);
if (encls_leaf > 62)
encls_leaf = 63;
return vmcs12->encls_exiting_bitmap & BIT_ULL(encls_leaf);
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 66c315554b46..2f5a1c58f3c5 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -352,7 +352,7 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
rflags &= ~X86_EFLAGS_ZF;
vmx_set_rflags(vcpu, rflags);
- kvm_rax_write(vcpu, ret);
+ kvm_eax_write(vcpu, ret);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -380,7 +380,7 @@ static inline bool sgx_enabled_in_guest_bios(struct kvm_vcpu *vcpu)
int handle_encls(struct kvm_vcpu *vcpu)
{
- u32 leaf = (u32)kvm_rax_read(vcpu);
+ u32 leaf = kvm_eax_read(vcpu);
if (!enable_sgx || !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 77e3d1bb24cb..ffe9d0db58c5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1169,11 +1169,11 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
{
- kvm_rax_write(vcpu, to_tdx(vcpu)->vp_enter_args.r10);
- kvm_rbx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r11);
- kvm_rcx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r12);
- kvm_rdx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r13);
- kvm_rsi_write(vcpu, to_tdx(vcpu)->vp_enter_args.r14);
+ kvm_rax_write_raw(vcpu, to_tdx(vcpu)->vp_enter_args.r10);
+ kvm_rbx_write_raw(vcpu, to_tdx(vcpu)->vp_enter_args.r11);
+ kvm_rcx_write_raw(vcpu, to_tdx(vcpu)->vp_enter_args.r12);
+ kvm_rdx_write_raw(vcpu, to_tdx(vcpu)->vp_enter_args.r13);
+ kvm_rsi_write_raw(vcpu, to_tdx(vcpu)->vp_enter_args.r14);
return __kvm_emulate_hypercall(vcpu, 0, complete_hypercall_exit);
}
@@ -2107,12 +2107,12 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
case EXIT_REASON_IO_INSTRUCTION:
return tdx_emulate_io(vcpu);
case EXIT_REASON_MSR_READ:
- kvm_rcx_write(vcpu, tdx->vp_enter_args.r12);
+ kvm_ecx_write(vcpu, tdx->vp_enter_args.r12);
return kvm_emulate_rdmsr(vcpu);
case EXIT_REASON_MSR_WRITE:
- kvm_rcx_write(vcpu, tdx->vp_enter_args.r12);
- kvm_rax_write(vcpu, tdx->vp_enter_args.r13 & -1u);
- kvm_rdx_write(vcpu, tdx->vp_enter_args.r13 >> 32);
+ kvm_ecx_write(vcpu, tdx->vp_enter_args.r12);
+ kvm_eax_write(vcpu, tdx->vp_enter_args.r13);
+ kvm_edx_write(vcpu, tdx->vp_enter_args.r13 >> 32);
return kvm_emulate_wrmsr(vcpu);
case EXIT_REASON_EPT_MISCONFIG:
return tdx_emulate_mmio(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fe2fb3ceba6..a102269a7af9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1300,7 +1300,7 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
{
/* Note, #UD due to CR4.OSXSAVE=0 has priority over the intercept. */
if (kvm_x86_call(get_cpl)(vcpu) != 0 ||
- __kvm_set_xcr(vcpu, kvm_rcx_read(vcpu), kvm_read_edx_eax(vcpu))) {
+ __kvm_set_xcr(vcpu, kvm_ecx_read(vcpu), kvm_read_edx_eax(vcpu))) {
kvm_inject_gp(vcpu, 0);
return 1;
}
@@ -1597,7 +1597,7 @@ static unsigned long kvm_get_effective_dr7(struct kvm_vcpu *vcpu)
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
{
- u32 pmc = kvm_rcx_read(vcpu);
+ u32 pmc = kvm_ecx_read(vcpu);
u64 data;
if (kvm_pmu_rdpmc(vcpu, pmc, &data)) {
@@ -1605,8 +1605,8 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
return 1;
}
- kvm_rax_write(vcpu, (u32)data);
- kvm_rdx_write(vcpu, data >> 32);
+ kvm_eax_write(vcpu, data);
+ kvm_edx_write(vcpu, data >> 32);
return kvm_skip_emulated_instruction(vcpu);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdpmc);
@@ -2053,8 +2053,8 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);
static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
{
if (!vcpu->run->msr.error) {
- kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
- kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
+ kvm_eax_write(vcpu, vcpu->run->msr.data);
+ kvm_edx_write(vcpu, vcpu->run->msr.data >> 32);
}
}
@@ -2135,8 +2135,8 @@ static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
trace_kvm_msr_read(msr, data);
if (reg < 0) {
- kvm_rax_write(vcpu, data & -1u);
- kvm_rdx_write(vcpu, (data >> 32) & -1u);
+ kvm_eax_write(vcpu, data);
+ kvm_edx_write(vcpu, data >> 32);
} else {
kvm_register_write(vcpu, reg, data);
}
@@ -2153,7 +2153,7 @@ static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
{
- return __kvm_emulate_rdmsr(vcpu, kvm_rcx_read(vcpu), -1,
+ return __kvm_emulate_rdmsr(vcpu, kvm_ecx_read(vcpu), -1,
complete_fast_rdmsr);
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
@@ -2189,7 +2189,7 @@ static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
{
- return __kvm_emulate_wrmsr(vcpu, kvm_rcx_read(vcpu),
+ return __kvm_emulate_wrmsr(vcpu, kvm_ecx_read(vcpu),
kvm_read_edx_eax(vcpu));
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
@@ -2299,7 +2299,7 @@ static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 da
fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
{
- return __handle_fastpath_wrmsr(vcpu, kvm_rcx_read(vcpu),
+ return __handle_fastpath_wrmsr(vcpu, kvm_ecx_read(vcpu),
kvm_read_edx_eax(vcpu));
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
@@ -9690,7 +9690,7 @@ static int complete_fast_pio_out(struct kvm_vcpu *vcpu)
static int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size,
unsigned short port)
{
- unsigned long val = kvm_rax_read(vcpu);
+ unsigned long val = kvm_rax_read_raw(vcpu);
int ret = emulator_pio_out(vcpu, size, port, &val, 1);
if (ret)
@@ -9726,10 +9726,10 @@ static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
}
/* For size less than 4 we merge, else we zero extend */
- val = (vcpu->arch.pio.size < 4) ? kvm_rax_read(vcpu) : 0;
+ val = (vcpu->arch.pio.size < 4) ? kvm_rax_read_raw(vcpu) : 0;
complete_emulator_pio_in(vcpu, &val);
- kvm_rax_write(vcpu, val);
+ kvm_rax_write_raw(vcpu, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -9741,11 +9741,11 @@ static int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size,
int ret;
/* For size less than 4 we merge, else we zero extend */
- val = (size < 4) ? kvm_rax_read(vcpu) : 0;
+ val = (size < 4) ? kvm_rax_read_raw(vcpu) : 0;
ret = emulator_pio_in(vcpu, size, port, &val, 1);
if (ret) {
- kvm_rax_write(vcpu, val);
+ kvm_rax_write_raw(vcpu, val);
return ret;
}
@@ -10412,29 +10412,30 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
if (!is_64_bit_hypercall(vcpu))
ret = (u32)ret;
- kvm_rax_write(vcpu, ret);
+ kvm_rax_write_raw(vcpu, ret);
return kvm_skip_emulated_instruction(vcpu);
}
int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int (*complete_hypercall)(struct kvm_vcpu *))
{
- unsigned long ret;
- unsigned long nr = kvm_rax_read(vcpu);
- unsigned long a0 = kvm_rbx_read(vcpu);
- unsigned long a1 = kvm_rcx_read(vcpu);
- unsigned long a2 = kvm_rdx_read(vcpu);
- unsigned long a3 = kvm_rsi_read(vcpu);
int op_64_bit = is_64_bit_hypercall(vcpu);
+ unsigned long ret, nr, a0, a1, a2, a3;
++vcpu->stat.hypercalls;
- if (!op_64_bit) {
- nr &= 0xFFFFFFFF;
- a0 &= 0xFFFFFFFF;
- a1 &= 0xFFFFFFFF;
- a2 &= 0xFFFFFFFF;
- a3 &= 0xFFFFFFFF;
+ if (op_64_bit) {
+ nr = kvm_rax_read_raw(vcpu);
+ a0 = kvm_rbx_read_raw(vcpu);
+ a1 = kvm_rcx_read_raw(vcpu);
+ a2 = kvm_rdx_read_raw(vcpu);
+ a3 = kvm_rsi_read_raw(vcpu);
+ } else {
+ nr = kvm_eax_read(vcpu);
+ a0 = kvm_ebx_read(vcpu);
+ a1 = kvm_ecx_read(vcpu);
+ a2 = kvm_edx_read(vcpu);
+ a3 = kvm_esi_read(vcpu);
}
trace_kvm_hypercall(nr, a0, a1, a2, a3);
@@ -12132,23 +12133,23 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
emulator_writeback_register_cache(vcpu->arch.emulate_ctxt);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
}
- regs->rax = kvm_rax_read(vcpu);
- regs->rbx = kvm_rbx_read(vcpu);
- regs->rcx = kvm_rcx_read(vcpu);
- regs->rdx = kvm_rdx_read(vcpu);
- regs->rsi = kvm_rsi_read(vcpu);
- regs->rdi = kvm_rdi_read(vcpu);
+ regs->rax = kvm_rax_read_raw(vcpu);
+ regs->rbx = kvm_rbx_read_raw(vcpu);
+ regs->rcx = kvm_rcx_read_raw(vcpu);
+ regs->rdx = kvm_rdx_read_raw(vcpu);
+ regs->rsi = kvm_rsi_read_raw(vcpu);
+ regs->rdi = kvm_rdi_read_raw(vcpu);
regs->rsp = kvm_rsp_read(vcpu);
- regs->rbp = kvm_rbp_read(vcpu);
+ regs->rbp = kvm_rbp_read_raw(vcpu);
#ifdef CONFIG_X86_64
- regs->r8 = kvm_r8_read(vcpu);
- regs->r9 = kvm_r9_read(vcpu);
- regs->r10 = kvm_r10_read(vcpu);
- regs->r11 = kvm_r11_read(vcpu);
- regs->r12 = kvm_r12_read(vcpu);
- regs->r13 = kvm_r13_read(vcpu);
- regs->r14 = kvm_r14_read(vcpu);
- regs->r15 = kvm_r15_read(vcpu);
+ regs->r8 = kvm_r8_read_raw(vcpu);
+ regs->r9 = kvm_r9_read_raw(vcpu);
+ regs->r10 = kvm_r10_read_raw(vcpu);
+ regs->r11 = kvm_r11_read_raw(vcpu);
+ regs->r12 = kvm_r12_read_raw(vcpu);
+ regs->r13 = kvm_r13_read_raw(vcpu);
+ regs->r14 = kvm_r14_read_raw(vcpu);
+ regs->r15 = kvm_r15_read_raw(vcpu);
#endif
regs->rip = kvm_rip_read(vcpu);
@@ -12172,23 +12173,23 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
- kvm_rax_write(vcpu, regs->rax);
- kvm_rbx_write(vcpu, regs->rbx);
- kvm_rcx_write(vcpu, regs->rcx);
- kvm_rdx_write(vcpu, regs->rdx);
- kvm_rsi_write(vcpu, regs->rsi);
- kvm_rdi_write(vcpu, regs->rdi);
+ kvm_rax_write_raw(vcpu, regs->rax);
+ kvm_rbx_write_raw(vcpu, regs->rbx);
+ kvm_rcx_write_raw(vcpu, regs->rcx);
+ kvm_rdx_write_raw(vcpu, regs->rdx);
+ kvm_rsi_write_raw(vcpu, regs->rsi);
+ kvm_rdi_write_raw(vcpu, regs->rdi);
kvm_rsp_write(vcpu, regs->rsp);
- kvm_rbp_write(vcpu, regs->rbp);
+ kvm_rbp_write_raw(vcpu, regs->rbp);
#ifdef CONFIG_X86_64
- kvm_r8_write(vcpu, regs->r8);
- kvm_r9_write(vcpu, regs->r9);
- kvm_r10_write(vcpu, regs->r10);
- kvm_r11_write(vcpu, regs->r11);
- kvm_r12_write(vcpu, regs->r12);
- kvm_r13_write(vcpu, regs->r13);
- kvm_r14_write(vcpu, regs->r14);
- kvm_r15_write(vcpu, regs->r15);
+ kvm_r8_write_raw(vcpu, regs->r8);
+ kvm_r9_write_raw(vcpu, regs->r9);
+ kvm_r10_write_raw(vcpu, regs->r10);
+ kvm_r11_write_raw(vcpu, regs->r11);
+ kvm_r12_write_raw(vcpu, regs->r12);
+ kvm_r13_write_raw(vcpu, regs->r13);
+ kvm_r14_write_raw(vcpu, regs->r14);
+ kvm_r15_write_raw(vcpu, regs->r15);
#endif
kvm_rip_write(vcpu, regs->rip);
@@ -13091,7 +13092,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
* on RESET. But, go through the motions in case that's ever remedied.
*/
cpuid_0x1 = kvm_find_cpuid_entry(vcpu, 1);
- kvm_rdx_write(vcpu, cpuid_0x1 ? cpuid_0x1->eax : 0x600);
+ kvm_edx_write(vcpu, cpuid_0x1 ? cpuid_0x1->eax : 0x600);
kvm_x86_call(vcpu_reset)(vcpu, init_event);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 895095dc684e..694b31c1fcc9 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1408,7 +1408,7 @@ int kvm_xen_hvm_config(struct kvm *kvm, struct kvm_xen_hvm_config *xhc)
static int kvm_xen_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
{
- kvm_rax_write(vcpu, result);
+ kvm_rax_write_raw(vcpu, result);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -1679,29 +1679,29 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
u8 cpl;
/* Hyper-V hypercalls get bit 31 set in EAX */
- if ((kvm_rax_read(vcpu) & 0x80000000) &&
+ if ((kvm_rax_read_raw(vcpu) & 0x80000000) &&
kvm_hv_hypercall_enabled(vcpu))
return kvm_hv_hypercall(vcpu);
longmode = is_64_bit_hypercall(vcpu);
if (!longmode) {
- input = (u32)kvm_rax_read(vcpu);
- params[0] = (u32)kvm_rbx_read(vcpu);
- params[1] = (u32)kvm_rcx_read(vcpu);
- params[2] = (u32)kvm_rdx_read(vcpu);
- params[3] = (u32)kvm_rsi_read(vcpu);
- params[4] = (u32)kvm_rdi_read(vcpu);
- params[5] = (u32)kvm_rbp_read(vcpu);
+ input = kvm_eax_read(vcpu);
+ params[0] = kvm_ebx_read(vcpu);
+ params[1] = kvm_ecx_read(vcpu);
+ params[2] = kvm_edx_read(vcpu);
+ params[3] = kvm_esi_read(vcpu);
+ params[4] = kvm_edi_read(vcpu);
+ params[5] = kvm_ebp_read(vcpu);
}
else {
#ifdef CONFIG_X86_64
- input = (u64)kvm_rax_read(vcpu);
- params[0] = (u64)kvm_rdi_read(vcpu);
- params[1] = (u64)kvm_rsi_read(vcpu);
- params[2] = (u64)kvm_rdx_read(vcpu);
- params[3] = (u64)kvm_r10_read(vcpu);
- params[4] = (u64)kvm_r8_read(vcpu);
- params[5] = (u64)kvm_r9_read(vcpu);
+ input = (u64)kvm_rax_read_raw(vcpu);
+ params[0] = (u64)kvm_rdi_read_raw(vcpu);
+ params[1] = (u64)kvm_rsi_read_raw(vcpu);
+ params[2] = (u64)kvm_rdx_read_raw(vcpu);
+ params[3] = (u64)kvm_r10_read_raw(vcpu);
+ params[4] = (u64)kvm_r8_read_raw(vcpu);
+ params[5] = (u64)kvm_r9_read_raw(vcpu);
#else
KVM_BUG_ON(1, vcpu->kvm);
return -EIO;
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 08/40] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
2026-05-29 22:21 ` [PATCH v3 08/40] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers Sean Christopherson
@ 2026-06-03 11:15 ` Huang, Kai
0 siblings, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2026-06-03 11:15 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com,
dwmw2@infradead.org, paul@xen.org
Cc: kvm@vger.kernel.org, dwmw@amazon.co.uk,
linux-kernel@vger.kernel.org, yosry@kernel.org,
binbin.wu@linux.intel.com
On Fri, 2026-05-29 at 15:21 -0700, Sean Christopherson wrote:
> Make kvm_<reg>_{read,write}() mode-aware (where the value is truncated to
> 32 bits if the vCPU isn't in 64-bit mode), and convert all the intentional
> "raw" accesses to kvm_<reg>_{read,write}_raw() versions. To avoid
> confusion and bikeshedding over whether or not explicit 32-bit accesses
> should use the "raw" or mode-aware variants, add and use "e" versions, e.g.
> for things like RDMSR, WRMSR, and CPUID, where the instruction uses only
> bits 31:0, regardless of mode.
>
> No functional change intended (all use of "e" versions is for cases where
> the value is already truncated due to bouncing through a u32).
>
> Cc: Binbin Wu <binbin.wu@linux.intel.com>
> Cc: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 09/40] KVM: x86: Drop non-raw kvm_<reg>_write() helpers
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (7 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 08/40] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 10/40] KVM: nSVM: Use kvm_rax_read() now that it's mode-aware Sean Christopherson
` (31 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Drop the non-raw, mode-aware kvm_<reg>_write() helpers as there is no
usage in KVM, and in all likelihood there will never be usage in KVM as
use of hardcoded registers in instructions is uncommon, and *modifying*
hardcoded registers is practically unheard of. While there are a few
instructions that modify registers in mode-aware ways, e.g. REP string
and some ENCLS varieties, the odds of KVM needing to emulate such
instructions (outside of the full emulator) are vanishingly small.
Drop kvm_<reg>_write() to prevent incorrect usage; _if_ a new instruction
comes along that needs to modify a hardcoded register, this can be
reverted.
No functional change intended.
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/regs.h | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 12db5039aace..f22b3a8cd483 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -61,11 +61,6 @@ static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)
{ \
return vcpu->arch.regs[VCPU_REGS_##uname] & kvm_reg_mode_mask(vcpu); \
} \
-static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
- unsigned long val) \
-{ \
- vcpu->arch.regs[VCPU_REGS_##uname] = val & kvm_reg_mode_mask(vcpu); \
-} \
static __always_inline unsigned long kvm_##lname##_read_raw(struct kvm_vcpu *vcpu) \
{ \
return vcpu->arch.regs[VCPU_REGS_##uname]; \
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 10/40] KVM: nSVM: Use kvm_rax_read() now that it's mode-aware
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (8 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 09/40] KVM: x86: Drop non-raw kvm_<reg>_write() helpers Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 11/40] Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode" Sean Christopherson
` (30 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Now that kvm_rax_read() truncates the output value to 32 bits if the
vCPU isn't in 64-bit mode, use it instead of the more verbose (and very
technically slower) kvm_register_read().
Note! VMLOAD, VMSAVE, and VMRUN emulation are still technically buggy,
as they can use EAX (versus RAX) in 64-bit mode via an address size
prefix. Don't bother trying to handle that case, as it would require
decoding the code stream, which would open an entirely different can of
worms, and in practice no sane guest would shove garbage into RAX[63:32]
and then execute VMLOAD/VMSAVE/VMRUN with just EAX.
No functional change intended.
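For readers without the earlier patches at hand, the mode-aware behavior relied on above can be sketched roughly as follows. This is a standalone model, not the KVM code: struct toy_vcpu, its in_64bit_mode flag, and all toy_* helpers are hypothetical stand-ins for the vCPU state, is_64_bit_mode(), kvm_reg_mode_mask(), kvm_rax_read() and kvm_rax_read_raw().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vCPU state this sketch cares about. */
struct toy_vcpu {
	uint64_t rax;
	bool in_64bit_mode;	/* models is_64_bit_mode(vcpu) */
};

/* All 64 bits in 64-bit mode, only the low 32 bits otherwise. */
static inline uint64_t toy_reg_mode_mask(const struct toy_vcpu *vcpu)
{
	return vcpu->in_64bit_mode ? UINT64_MAX : (uint64_t)UINT32_MAX;
}

/* Mode-aware read: the value is truncated outside of 64-bit mode. */
static inline uint64_t toy_rax_read(const struct toy_vcpu *vcpu)
{
	return vcpu->rax & toy_reg_mode_mask(vcpu);
}

/* Raw read: the full register, regardless of the current mode. */
static inline uint64_t toy_rax_read_raw(const struct toy_vcpu *vcpu)
{
	return vcpu->rax;
}

/* Small self-check showing the difference between the two reads. */
void toy_rax_selfcheck(void)
{
	struct toy_vcpu v = { .rax = 0xdeadbeef00001000ULL, .in_64bit_mode = false };

	assert(toy_rax_read(&v) == 0x00001000ULL);
	assert(toy_rax_read_raw(&v) == 0xdeadbeef00001000ULL);

	v.in_64bit_mode = true;
	assert(toy_rax_read(&v) == 0xdeadbeef00001000ULL);
}
```

The model only depends on the current mode, so it shares the limitation noted above: it cannot tell whether the guest encoded the instruction to use EAX while in 64-bit mode.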
Cc: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d817dbb350d6..1ab8b95975a4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1135,7 +1135,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!svm->nested.initialized))
return -EINVAL;
- vmcb12_gpa = kvm_register_read(vcpu, VCPU_REGS_RAX);
+ vmcb12_gpa = kvm_rax_read(vcpu);
if (!page_address_valid(vcpu, vmcb12_gpa)) {
kvm_inject_gp(vcpu, 0);
return 1;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8402e94ac094..526e0fdcd16b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2218,7 +2218,7 @@ static int intr_interception(struct kvm_vcpu *vcpu)
static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
{
- u64 vmcb12_gpa = kvm_register_read(vcpu, VCPU_REGS_RAX);
+ u64 vmcb12_gpa = kvm_rax_read(vcpu);
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb12;
struct kvm_host_map map;
@@ -2326,7 +2326,7 @@ static int gp_interception(struct kvm_vcpu *vcpu)
if (nested_svm_check_permissions(vcpu))
return 1;
- if (!page_address_valid(vcpu, kvm_register_read(vcpu, VCPU_REGS_RAX)))
+ if (!page_address_valid(vcpu, kvm_rax_read(vcpu)))
goto reinject;
/*
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 11/40] Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode"
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (9 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 10/40] KVM: nSVM: Use kvm_rax_read() now that it's mode-aware Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 12/40] KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels Sean Christopherson
` (29 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Now that kvm_<reg>_read() are mode aware, i.e. are functionally equivalent
to kvm_register_read(), revert back to the less verbose versions.
No functional change intended.
This reverts commit 60919eccf6764c71cef31a1afeaa1a36b8e5ab85.
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/sgx.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 2f5a1c58f3c5..876dc2814108 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -225,8 +225,8 @@ static int handle_encls_ecreate(struct kvm_vcpu *vcpu)
struct x86_exception ex;
int r;
- if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 32, 32, &pageinfo_gva) ||
- sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva))
+ if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 32, 32, &pageinfo_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva))
return 1;
/*
@@ -302,9 +302,9 @@ static int handle_encls_einit(struct kvm_vcpu *vcpu)
gpa_t sig_gpa, secs_gpa, token_gpa;
int ret, trapnr;
- if (sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RBX), 1808, 4096, &sig_gva) ||
- sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RCX), 4096, 4096, &secs_gva) ||
- sgx_get_encls_gva(vcpu, kvm_register_read(vcpu, VCPU_REGS_RDX), 304, 512, &token_gva))
+ if (sgx_get_encls_gva(vcpu, kvm_rbx_read(vcpu), 1808, 4096, &sig_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_rcx_read(vcpu), 4096, 4096, &secs_gva) ||
+ sgx_get_encls_gva(vcpu, kvm_rdx_read(vcpu), 304, 512, &token_gva))
return 1;
/*
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 12/40] KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (10 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 11/40] Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode" Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-29 22:21 ` [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c Sean Christopherson
` (28 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Unconditionally return %false for is_64_bit_hypercall() on 32-bit kernels
to guard against incorrectly setting guest_state_protected, and because
in a (very) hypothetical world where 32-bit KVM supports protected guests,
assuming a hypercall was made in 64-bit mode is flat out wrong.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/regs.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index f22b3a8cd483..a57ba26279ed 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -39,12 +39,16 @@ static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
{
+#ifdef CONFIG_X86_64
/*
* If running with protected guest state, the CS register is not
* accessible. The hypercall register values will have had to been
* provided in 64-bit mode, so assume the guest is in 64-bit.
*/
return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
+#else
+ return false;
+#endif
}
static __always_inline unsigned long kvm_reg_mode_mask(struct kvm_vcpu *vcpu)
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (11 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 12/40] KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-30 0:35 ` Yosry Ahmed
2026-06-03 11:16 ` Huang, Kai
2026-05-29 22:21 ` [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines) Sean Christopherson
` (27 subsequent siblings)
40 siblings, 2 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move update_cr8_intercept() to lapic.c so that it's globally visible
in anticipation of extracting most of the register-specific code out of
x86.c and into a new compilation unit. Opportunistically prefix the
helper kvm_lapic_ to make its role/scope more obvious.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/lapic.c | 26 ++++++++++++++++++++++++++
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/x86.c | 34 +++-------------------------------
3 files changed, 30 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 15777869b292..9d2df8623f6d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2761,6 +2761,32 @@ u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu)
return (tpr & 0xf0) >> 4;
}
+void kvm_lapic_update_cr8_intercept(struct kvm_vcpu *vcpu)
+{
+ int max_irr, tpr;
+
+ if (!kvm_x86_ops.update_cr8_intercept)
+ return;
+
+ if (!lapic_in_kernel(vcpu))
+ return;
+
+ if (vcpu->arch.apic->apicv_active)
+ return;
+
+ if (!vcpu->arch.apic->vapic_addr)
+ max_irr = kvm_lapic_find_highest_irr(vcpu);
+ else
+ max_irr = -1;
+
+ if (max_irr != -1)
+ max_irr >>= 4;
+
+ tpr = kvm_lapic_get_cr8(vcpu);
+
+ kvm_x86_call(update_cr8_intercept)(vcpu, tpr, max_irr);
+}
+
static void __kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value)
{
u64 old_value = vcpu->arch.apic_base;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index f763cd29a508..71970213dc1f 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -100,6 +100,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
+void kvm_lapic_update_cr8_intercept(struct kvm_vcpu *vcpu);
void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
void kvm_apic_set_version(struct kvm_vcpu *vcpu);
void kvm_apic_after_set_mcg_cap(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a102269a7af9..034428b3d8e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -128,7 +128,6 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
-static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
static void store_regs(struct kvm_vcpu *vcpu);
@@ -5337,7 +5336,7 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
r = kvm_apic_set_state(vcpu, s);
if (r)
return r;
- update_cr8_intercept(vcpu);
+ kvm_lapic_update_cr8_intercept(vcpu);
return 0;
}
@@ -10582,33 +10581,6 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
kvm_run->flags |= KVM_RUN_X86_GUEST_MODE;
}
-static void update_cr8_intercept(struct kvm_vcpu *vcpu)
-{
- int max_irr, tpr;
-
- if (!kvm_x86_ops.update_cr8_intercept)
- return;
-
- if (!lapic_in_kernel(vcpu))
- return;
-
- if (vcpu->arch.apic->apicv_active)
- return;
-
- if (!vcpu->arch.apic->vapic_addr)
- max_irr = kvm_lapic_find_highest_irr(vcpu);
- else
- max_irr = -1;
-
- if (max_irr != -1)
- max_irr >>= 4;
-
- tpr = kvm_lapic_get_cr8(vcpu);
-
- kvm_x86_call(update_cr8_intercept)(vcpu, tpr, max_irr);
-}
-
-
int kvm_check_nested_events(struct kvm_vcpu *vcpu)
{
if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
@@ -11349,7 +11321,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_call(enable_irq_window)(vcpu);
if (kvm_lapic_enabled(vcpu)) {
- update_cr8_intercept(vcpu);
+ kvm_lapic_update_cr8_intercept(vcpu);
kvm_lapic_sync_to_vapic(vcpu);
}
}
@@ -12495,7 +12467,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
- update_cr8_intercept(vcpu);
+ kvm_lapic_update_cr8_intercept(vcpu);
/* Older userspace won't unhalt the vcpu on reset. */
if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c
2026-05-29 22:21 ` [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c Sean Christopherson
@ 2026-05-30 0:35 ` Yosry Ahmed
2026-06-03 11:16 ` Huang, Kai
1 sibling, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:35 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:21:56PM -0700, Sean Christopherson wrote:
> Move update_cr8_intercept() to lapic.c so that it's globally visible
> in anticipation of extracting most of the register-specific code out of
> x86.c and into a new compilation unit. Opportunistically prefix the
> helper kvm_lapic_ to make its role/scope more obvious.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* Re: [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c
2026-05-29 22:21 ` [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c Sean Christopherson
2026-05-30 0:35 ` Yosry Ahmed
@ 2026-06-03 11:16 ` Huang, Kai
1 sibling, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2026-06-03 11:16 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com,
dwmw2@infradead.org, paul@xen.org
Cc: kvm@vger.kernel.org, dwmw@amazon.co.uk,
linux-kernel@vger.kernel.org, yosry@kernel.org,
binbin.wu@linux.intel.com
On Fri, 2026-05-29 at 15:21 -0700, Sean Christopherson wrote:
> Move update_cr8_intercept() to lapic.c so that it's globally visible
> in anticipation of extracting most of the register-specific code out of
> x86.c and into a new compilation unit. Opportunistically prefix the
> helper kvm_lapic_ to make its role/scope more obvious.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines)
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (12 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 13/40] KVM: x86: Move update_cr8_intercept() to lapic.c Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-30 0:36 ` Yosry Ahmed
2026-06-03 11:18 ` Huang, Kai
2026-05-29 22:21 ` [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
` (26 subsequent siblings)
40 siblings, 2 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
anticipation of extracting the majority of register and MSR specific code
out of x86.c.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 19 -------------------
arch/x86/kvm/x86.h | 19 +++++++++++++++++++
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 034428b3d8e9..cd68a5bad0c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -566,13 +566,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
-static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
-{
- int i;
- for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
- vcpu->arch.apf.gfns[i] = ~0;
-}
-
static void kvm_destroy_user_return_msrs(void)
{
int cpu;
@@ -1023,18 +1016,6 @@ bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_require_dr);
-static bool __kvm_pv_async_pf_enabled(u64 data)
-{
- u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
-
- return (data & mask) == mask;
-}
-
-static bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
-{
- return __kvm_pv_async_pf_enabled(vcpu->arch.apf.msr_en_val);
-}
-
static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
{
return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 3845b10020c9..acb22167901f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -556,6 +556,25 @@ static inline bool kvm_pat_valid(u64 data)
return (data | ((data & 0x0202020202020202ull) << 1)) == data;
}
+static inline bool __kvm_pv_async_pf_enabled(u64 data)
+{
+ u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
+
+ return (data & mask) == mask;
+}
+
+static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
+{
+ return __kvm_pv_async_pf_enabled(vcpu->arch.apf.msr_en_val);
+}
+
+static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
+{
+ int i;
+ for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
+ vcpu->arch.apf.gfns[i] = ~0;
+}
+
/*
* Trigger machine check on the host. We assume all the MSRs are already set up
* by the CPU and that we still run on the same CPU as the MCE occurred on.
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines)
2026-05-29 22:21 ` [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines) Sean Christopherson
@ 2026-05-30 0:36 ` Yosry Ahmed
2026-05-30 0:39 ` Sean Christopherson
2026-06-03 11:18 ` Huang, Kai
1 sibling, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:36 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:21:57PM -0700, Sean Christopherson wrote:
> Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
> anticipation of extracting the majority of register and MSR specific code
> out of x86.c.
Not sure how these things are relevant?
* Re: [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines)
2026-05-30 0:36 ` Yosry Ahmed
@ 2026-05-30 0:39 ` Sean Christopherson
2026-05-30 0:45 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-30 0:39 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:21:57PM -0700, Sean Christopherson wrote:
> > Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
> > anticipation of extracting the majority of register and MSR specific code
> > out of x86.c.
>
> Not sure how these things are relevant?
Oh, I should have explained that. The PV async #PF stuff is controlled via MSR,
so kvm_async_pf_hash_reset() will also be called from msrs.c. And there are fun
CR0.PG interactions that need to be handled, so kvm_pv_async_pf_enabled() will
be called from regs.c.
But they also need to be called from x86.c where the core PV async #PF support
lives.
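To make that concrete, here is a rough standalone sketch of the two helpers as header-style inlines that several compilation units could share. It is a model under stated assumptions, not the KVM code: struct toy_apf and the toy_* names are hypothetical, and the TOY_* bit positions and array size are assumed values for illustration (the real definitions live in the kernel headers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only; consult the kernel headers for the real ones. */
#define TOY_ASYNC_PF_ENABLED		(1ULL << 0)
#define TOY_ASYNC_PF_DELIVERY_AS_INT	(1ULL << 3)
#define TOY_ASYNC_PF_PER_VCPU		64

/* Hypothetical stand-in for the per-vCPU async #PF state. */
struct toy_apf {
	uint64_t msr_en_val;			/* last value written to the enable MSR */
	uint64_t gfns[TOY_ASYNC_PF_PER_VCPU];	/* hash of outstanding GFNs */
};

/* PV async #PF counts as enabled only if BOTH bits are set in the MSR value. */
static inline bool toy_pv_async_pf_enabled(const struct toy_apf *apf)
{
	const uint64_t mask = TOY_ASYNC_PF_ENABLED | TOY_ASYNC_PF_DELIVERY_AS_INT;

	return (apf->msr_en_val & mask) == mask;
}

/* Mark every hash slot as empty by filling it with all ones. */
static inline void toy_async_pf_hash_reset(struct toy_apf *apf)
{
	for (size_t i = 0; i < TOY_ASYNC_PF_PER_VCPU; i++)
		apf->gfns[i] = ~0ULL;
}

/* Small self-check exercising both helpers. */
void toy_apf_selfcheck(void)
{
	struct toy_apf apf = { .msr_en_val = TOY_ASYNC_PF_ENABLED };

	assert(!toy_pv_async_pf_enabled(&apf));	/* one bit alone is not enough */

	apf.msr_en_val |= TOY_ASYNC_PF_DELIVERY_AS_INT;
	assert(toy_pv_async_pf_enabled(&apf));

	toy_async_pf_hash_reset(&apf);
	assert(apf.gfns[0] == ~0ULL);
	assert(apf.gfns[TOY_ASYNC_PF_PER_VCPU - 1] == ~0ULL);
}
```

Because both helpers are tiny and static inline, each caller (MSR handling, CR0 handling, and the core async #PF code in this series) can use them from a shared header without exporting another symbol.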
* Re: [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines)
2026-05-30 0:39 ` Sean Christopherson
@ 2026-05-30 0:45 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:45 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 05:39:58PM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:21:57PM -0700, Sean Christopherson wrote:
> > > Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
> > > anticipation of extracting the majority of register and MSR specific code
> > > out of x86.c.
> >
> > Not sure how these things are relevant?
>
> Oh, I should have explained that. The PV async #PF stuff is controlled via MSR,
> so kvm_async_pf_hash_reset() will also be called from msrs.c. And there are fun
> CR0.PG interactions that need to be handled, so kvm_pv_async_pf_enabled() will
> be called from regs.c.
>
> But they also need to be called from x86.c where the core PV async #PF support
> lives.
Makes sense.
With the changelog updated:
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* Re: [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines)
2026-05-29 22:21 ` [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines) Sean Christopherson
2026-05-30 0:36 ` Yosry Ahmed
@ 2026-06-03 11:18 ` Huang, Kai
1 sibling, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2026-06-03 11:18 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com,
dwmw2@infradead.org, paul@xen.org
Cc: kvm@vger.kernel.org, dwmw@amazon.co.uk,
linux-kernel@vger.kernel.org, yosry@kernel.org,
binbin.wu@linux.intel.com
On Fri, 2026-05-29 at 15:21 -0700, Sean Christopherson wrote:
> Move kvm_pv_async_pf_enabled() and kvm_async_pf_hash_reset() to x86.h in
> anticipation of extracting the majority of register and MSR specific code
> out of x86.c.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (13 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 14/40] KVM: x86: Move async #PF helpers to x86.h (as inlines) Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-30 0:43 ` Yosry Ahmed
2026-06-03 11:33 ` Huang, Kai
2026-05-29 22:21 ` [PATCH v3 16/40] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
` (25 subsequent siblings)
40 siblings, 2 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Introduce regs.c, and move the vast majority of register specific code out
of x86.c and into regs.c. Deliberately leave behind MSR code (except for
EFER, which can hardly be called an MSR), as KVM's MSR support is complex
enough to warrant its own compilation unit, and doesn't have much in common
with the other register code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 -
arch/x86/kvm/Makefile | 4 +-
arch/x86/kvm/regs.c | 875 +++++++++++++++++++++++++++++++
arch/x86/kvm/regs.h | 31 ++
arch/x86/kvm/x86.c | 885 +-------------------------------
arch/x86/kvm/x86.h | 2 +
6 files changed, 918 insertions(+), 881 deletions(-)
create mode 100644 arch/x86/kvm/regs.c
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ae7d539af90..983bdc84f9f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2332,8 +2332,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-void kvm_update_dr7(struct kvm_vcpu *vcpu);
-
bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
bool always_retry);
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 77337c37324b..f39c311fd756 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,8 +5,8 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o mtrr.o \
- debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
+kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o regs.o \
+ mtrr.o debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
kvm-$(CONFIG_KVM_IOAPIC) += i8259.o i8254.o ioapic.o
diff --git a/arch/x86/kvm/regs.c b/arch/x86/kvm/regs.c
new file mode 100644
index 000000000000..fb4478301076
--- /dev/null
+++ b/arch/x86/kvm/regs.c
@@ -0,0 +1,875 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm_host.h>
+
+#include "lapic.h"
+#include "mmu.h"
+#include "regs.h"
+#include "x86.h"
+
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
+{
+ /* Can't read the RIP when guest state is protected, just return 0 */
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
+ if (is_64_bit_mode(vcpu))
+ return kvm_rip_read(vcpu);
+ return (u32)(kvm_get_segment_base(vcpu, VCPU_SREG_CS) +
+ kvm_rip_read(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
+
+bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
+{
+ return kvm_get_linear_rip(vcpu) == linear_rip;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip);
+
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
+{
+ unsigned long rflags;
+
+ rflags = kvm_x86_call(get_rflags)(vcpu);
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+ rflags &= ~X86_EFLAGS_TF;
+ return rflags;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags);
+
+void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
+ kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
+ rflags |= X86_EFLAGS_TF;
+ kvm_x86_call(set_rflags)(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ __kvm_set_rflags(vcpu, rflags);
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags);
+
+static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
+ /*
+ * We are here if userspace calls get_regs() in the middle of
+ * instruction emulation. Registers state needs to be copied
+ * back from emulation context to vcpu. Userspace shouldn't do
+ * that usually, but some bad designed PV devices (vmware
+ * backdoor interface) need this to work
+ */
+ emulator_writeback_register_cache(vcpu->arch.emulate_ctxt);
+ vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+ }
+ regs->rax = kvm_rax_read_raw(vcpu);
+ regs->rbx = kvm_rbx_read_raw(vcpu);
+ regs->rcx = kvm_rcx_read_raw(vcpu);
+ regs->rdx = kvm_rdx_read_raw(vcpu);
+ regs->rsi = kvm_rsi_read_raw(vcpu);
+ regs->rdi = kvm_rdi_read_raw(vcpu);
+ regs->rsp = kvm_rsp_read(vcpu);
+ regs->rbp = kvm_rbp_read_raw(vcpu);
+#ifdef CONFIG_X86_64
+ regs->r8 = kvm_r8_read_raw(vcpu);
+ regs->r9 = kvm_r9_read_raw(vcpu);
+ regs->r10 = kvm_r10_read_raw(vcpu);
+ regs->r11 = kvm_r11_read_raw(vcpu);
+ regs->r12 = kvm_r12_read_raw(vcpu);
+ regs->r13 = kvm_r13_read_raw(vcpu);
+ regs->r14 = kvm_r14_read_raw(vcpu);
+ regs->r15 = kvm_r15_read_raw(vcpu);
+#endif
+
+ regs->rip = kvm_rip_read(vcpu);
+ regs->rflags = kvm_get_rflags(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __get_regs(vcpu, regs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
+ vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ kvm_rax_write_raw(vcpu, regs->rax);
+ kvm_rbx_write_raw(vcpu, regs->rbx);
+ kvm_rcx_write_raw(vcpu, regs->rcx);
+ kvm_rdx_write_raw(vcpu, regs->rdx);
+ kvm_rsi_write_raw(vcpu, regs->rsi);
+ kvm_rdi_write_raw(vcpu, regs->rdi);
+ kvm_rsp_write(vcpu, regs->rsp);
+ kvm_rbp_write_raw(vcpu, regs->rbp);
+#ifdef CONFIG_X86_64
+ kvm_r8_write_raw(vcpu, regs->r8);
+ kvm_r9_write_raw(vcpu, regs->r9);
+ kvm_r10_write_raw(vcpu, regs->r10);
+ kvm_r11_write_raw(vcpu, regs->r11);
+ kvm_r12_write_raw(vcpu, regs->r12);
+ kvm_r13_write_raw(vcpu, regs->r13);
+ kvm_r14_write_raw(vcpu, regs->r14);
+ kvm_r15_write_raw(vcpu, regs->r15);
+#endif
+
+ kvm_rip_write(vcpu, regs->rip);
+ kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
+
+ vcpu->arch.exception.pending = false;
+ vcpu->arch.exception_vmexit.pending = false;
+
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __set_regs(vcpu, regs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
+}
+
+/*
+ * Load the pae pdptrs. Return 1 if they are all valid, 0 otherwise.
+ */
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+ struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
+ gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
+ gpa_t real_gpa;
+ int i;
+ int ret;
+ u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
+
+ /*
+ * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
+ * to an L1 GPA.
+ */
+ real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn),
+ PFERR_USER_MASK | PFERR_WRITE_MASK |
+ PFERR_GUEST_PAGE_MASK, NULL, 0);
+ if (real_gpa == INVALID_GPA)
+ return 0;
+
+ /* Note the offset, PDPTRs are 32 byte aligned when using PAE paging. */
+ ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(real_gpa), pdpte,
+ cr3 & GENMASK(11, 5), sizeof(pdpte));
+ if (ret < 0)
+ return 0;
+
+ for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
+ if ((pdpte[i] & PT_PRESENT_MASK) &&
+ (pdpte[i] & pdptr_rsvd_bits(vcpu))) {
+ return 0;
+ }
+ }
+
+ /*
+ * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
+ * Shadow page roots need to be reconstructed instead.
+ */
+ if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
+ kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
+
+ memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
+ kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
+ kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+ vcpu->arch.pdptrs_from_userspace = false;
+
+ return 1;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs);
+
+static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+#ifdef CONFIG_X86_64
+ if (cr0 & 0xffffffff00000000UL)
+ return false;
+#endif
+
+ if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
+ return false;
+
+ if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
+ return false;
+
+ return kvm_x86_call(is_valid_cr0)(vcpu, cr0);
+}
+
+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
+{
+ /*
+ * CR0.WP is incorporated into the MMU role, but only for non-nested,
+ * indirect shadow MMUs. If paging is disabled, no updates are needed
+ * as there are no permission bits to emulate. If TDP is enabled, the
+ * MMU's metadata needs to be updated, e.g. so that emulating guest
+ * translations does the right thing, but there's no need to unload the
+ * root as CR0.WP doesn't affect SPTEs.
+ */
+ if ((cr0 ^ old_cr0) == X86_CR0_WP) {
+ if (!(cr0 & X86_CR0_PG))
+ return;
+
+ if (tdp_enabled) {
+ kvm_init_mmu(vcpu);
+ return;
+ }
+ }
+
+ if ((cr0 ^ old_cr0) & X86_CR0_PG) {
+ /*
+ * Clearing CR0.PG is defined to flush the TLB from the guest's
+ * perspective.
+ */
+ if (!(cr0 & X86_CR0_PG))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ /*
+ * Check for async #PF completion events when enabling paging,
+ * as the vCPU may have previously encountered async #PFs (it's
+ * entirely legal for the guest to toggle paging on/off without
+ * waiting for the async #PF queue to drain).
+ */
+ else if (kvm_pv_async_pf_enabled(vcpu))
+ kvm_make_request(KVM_REQ_APF_READY, vcpu);
+ }
+
+ if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0);
+
+int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+ unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
+ if (!kvm_is_valid_cr0(vcpu, cr0))
+ return 1;
+
+ cr0 |= X86_CR0_ET;
+
+ /* Writes to CR0 reserved bits are ignored, even on Intel. */
+ cr0 &= ~CR0_RESERVED_BITS;
+
+#ifdef CONFIG_X86_64
+ if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
+ (cr0 & X86_CR0_PG)) {
+ int cs_db, cs_l;
+
+ if (!is_pae(vcpu))
+ return 1;
+ kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
+ if (cs_l)
+ return 1;
+ }
+#endif
+ if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
+ is_pae(vcpu) && ((cr0 ^ old_cr0) & X86_CR0_PDPTR_BITS) &&
+ !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
+ return 1;
+
+ if (!(cr0 & X86_CR0_PG) &&
+ (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
+ return 1;
+
+ if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+ return 1;
+
+ kvm_x86_call(set_cr0)(vcpu, cr0);
+
+ kvm_post_set_cr0(vcpu, old_cr0, cr0);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0);
+
+void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
+{
+ (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);
+
+int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+ bool skip_tlb_flush = false;
+ unsigned long pcid = 0;
+#ifdef CONFIG_X86_64
+ if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
+ skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
+ cr3 &= ~X86_CR3_PCID_NOFLUSH;
+ pcid = cr3 & X86_CR3_PCID_MASK;
+ }
+#endif
+
+ /* PDPTRs are always reloaded for PAE paging. */
+ if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
+ goto handle_tlb_flush;
+
+ /*
+ * Do not condition the GPA check on long mode, this helper is used to
+ * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
+ * the current vCPU mode is accurate.
+ */
+ if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
+ return 1;
+
+ if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
+ return 1;
+
+ if (cr3 != kvm_read_cr3(vcpu))
+ kvm_mmu_new_pgd(vcpu, cr3);
+
+ vcpu->arch.cr3 = cr3;
+ kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
+ /* Do not call post_set_cr3, we do not get here for confidential guests. */
+
+handle_tlb_flush:
+ /*
+ * A load of CR3 that flushes the TLB flushes only the current PCID,
+ * even if PCID is disabled, in which case PCID=0 is flushed. It's a
+ * moot point in the end because _disabling_ PCID will flush all PCIDs,
+ * and it's impossible to use a non-zero PCID when PCID is disabled,
+ * i.e. only PCID=0 can be relevant.
+ */
+ if (!skip_tlb_flush)
+ kvm_invalidate_pcid(vcpu, pcid);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3);
+
+static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ return __kvm_is_valid_cr4(vcpu, cr4) &&
+ kvm_x86_call(is_valid_cr4)(vcpu, cr4);
+}
+
+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
+{
+ if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+
+ /*
+ * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
+ * according to the SDM; however, stale prev_roots could be reused
+ * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
+ * free them all. This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
+ * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
+ * so fall through.
+ */
+ if (!tdp_enabled &&
+ (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
+ kvm_mmu_unload(vcpu);
+
+ /*
+ * The TLB has to be flushed for all PCIDs if any of the following
+ * (architecturally required) changes happen:
+ * - CR4.PCIDE is changed from 1 to 0
+ * - CR4.PGE is toggled
+ *
+ * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
+ */
+ if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
+ (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
+ /*
+ * The TLB has to be flushed for the current PCID if any of the
+ * following (architecturally required) changes happen:
+ * - CR4.SMEP is changed from 0 to 1
+ * - CR4.PAE is toggled
+ */
+ else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
+ ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4);
+
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ unsigned long old_cr4 = kvm_read_cr4(vcpu);
+
+ if (!kvm_is_valid_cr4(vcpu, cr4))
+ return 1;
+
+ if (is_long_mode(vcpu)) {
+ if (!(cr4 & X86_CR4_PAE))
+ return 1;
+ if ((cr4 ^ old_cr4) & X86_CR4_LA57)
+ return 1;
+ } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
+ && ((cr4 ^ old_cr4) & X86_CR4_PDPTR_BITS)
+ && !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
+ return 1;
+
+ if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
+ /* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
+ if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+ return 1;
+ }
+
+ if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+ return 1;
+
+ kvm_x86_call(set_cr4)(vcpu, cr4);
+
+ kvm_post_set_cr4(vcpu, old_cr4, cr4);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4);
+
+int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
+{
+ if (cr8 & CR8_RESERVED_BITS)
+ return 1;
+ if (lapic_in_kernel(vcpu))
+ kvm_lapic_set_tpr(vcpu, cr8);
+ else
+ vcpu->arch.cr8 = cr8;
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8);
+
+unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
+{
+ if (lapic_in_kernel(vcpu))
+ return kvm_lapic_get_cr8(vcpu);
+ else
+ return vcpu->arch.cr8;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8);
+
+static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ struct desc_ptr dt;
+
+ if (vcpu->arch.guest_state_protected)
+ goto skip_protected_regs;
+
+ kvm_handle_exception_payload_quirk(vcpu);
+
+ kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
+ kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
+ kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
+ kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
+ kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
+ kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+
+ kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
+ kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+
+ kvm_x86_call(get_idt)(vcpu, &dt);
+ sregs->idt.limit = dt.size;
+ sregs->idt.base = dt.address;
+ kvm_x86_call(get_gdt)(vcpu, &dt);
+ sregs->gdt.limit = dt.size;
+ sregs->gdt.base = dt.address;
+
+ sregs->cr2 = vcpu->arch.cr2;
+ sregs->cr3 = kvm_read_cr3(vcpu);
+
+skip_protected_regs:
+ sregs->cr0 = kvm_read_cr0(vcpu);
+ sregs->cr4 = kvm_read_cr4(vcpu);
+ sregs->cr8 = kvm_get_cr8(vcpu);
+ sregs->efer = vcpu->arch.efer;
+ sregs->apic_base = vcpu->arch.apic_base;
+}
+
+static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ __get_sregs_common(vcpu, sregs);
+
+ if (vcpu->arch.guest_state_protected)
+ return;
+
+ if (vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft)
+ set_bit(vcpu->arch.interrupt.nr,
+ (unsigned long *)sregs->interrupt_bitmap);
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __get_sregs(vcpu, sregs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
+{
+ int i;
+
+ __get_sregs_common(vcpu, (struct kvm_sregs *)sregs2);
+
+ if (vcpu->arch.guest_state_protected)
+ return;
+
+ if (is_pae_paging(vcpu)) {
+ kvm_vcpu_srcu_read_lock(vcpu);
+ for (i = 0 ; i < 4 ; i++)
+ sregs2->pdptrs[i] = kvm_pdptr_read(vcpu, i);
+ sregs2->flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
+ kvm_vcpu_srcu_read_unlock(vcpu);
+ }
+}
+
+static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
+ /*
+ * When EFER.LME and CR0.PG are set, the processor is in
+ * 64-bit mode (though maybe in a 32-bit code segment).
+ * CR4.PAE and EFER.LMA must be set.
+ */
+ if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
+ return false;
+ if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
+ return false;
+ } else {
+ /*
+ * Not in 64-bit mode: EFER.LMA is clear and the code
+ * segment cannot be 64-bit.
+ */
+ if (sregs->efer & EFER_LMA || sregs->cs.l)
+ return false;
+ }
+
+ return kvm_is_valid_cr4(vcpu, sregs->cr4) &&
+ kvm_is_valid_cr0(vcpu, sregs->cr0);
+}
+
+static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
+ int *mmu_reset_needed, bool update_pdptrs)
+{
+ int idx;
+ struct desc_ptr dt;
+
+ if (!kvm_is_valid_sregs(vcpu, sregs))
+ return -EINVAL;
+
+ if (kvm_apic_set_base(vcpu, sregs->apic_base, true))
+ return -EINVAL;
+
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
+ dt.size = sregs->idt.limit;
+ dt.address = sregs->idt.base;
+ kvm_x86_call(set_idt)(vcpu, &dt);
+ dt.size = sregs->gdt.limit;
+ dt.address = sregs->gdt.base;
+ kvm_x86_call(set_gdt)(vcpu, &dt);
+
+ vcpu->arch.cr2 = sregs->cr2;
+ *mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+ vcpu->arch.cr3 = sregs->cr3;
+ kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
+ kvm_x86_call(post_set_cr3)(vcpu, sregs->cr3);
+
+ kvm_set_cr8(vcpu, sregs->cr8);
+
+ *mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
+ kvm_x86_call(set_efer)(vcpu, sregs->efer);
+
+ *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
+ kvm_x86_call(set_cr0)(vcpu, sregs->cr0);
+
+ *mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
+ kvm_x86_call(set_cr4)(vcpu, sregs->cr4);
+
+ if (update_pdptrs) {
+ idx = srcu_read_lock(&vcpu->kvm->srcu);
+ if (is_pae_paging(vcpu)) {
+ load_pdptrs(vcpu, kvm_read_cr3(vcpu));
+ *mmu_reset_needed = 1;
+ }
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ }
+
+ kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
+ kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
+ kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
+ kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
+ kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
+ kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+
+ kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
+ kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+
+ kvm_lapic_update_cr8_intercept(vcpu);
+
+ /* Older userspace won't unhalt the vcpu on reset. */
+ if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
+ sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 &&
+ !is_protmode(vcpu))
+ kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
+
+ return 0;
+}
+
+static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ int pending_vec, max_bits;
+ int mmu_reset_needed = 0;
+ int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
+
+ if (ret)
+ return ret;
+
+ if (mmu_reset_needed) {
+ kvm_mmu_reset_context(vcpu);
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ }
+
+ max_bits = KVM_NR_INTERRUPTS;
+ pending_vec = find_first_bit(
+ (const unsigned long *)sregs->interrupt_bitmap, max_bits);
+
+ if (pending_vec < max_bits) {
+ kvm_queue_interrupt(vcpu, pending_vec, false);
+ pr_debug("Set back pending irq %d\n", pending_vec);
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
+ return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ int ret;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ ret = __set_sregs(vcpu, sregs);
+ vcpu_put(vcpu);
+ return ret;
+}
+
+int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
+{
+ int mmu_reset_needed = 0;
+ bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
+ bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
+ !(sregs2->efer & EFER_LMA);
+ int i, ret;
+
+ if (sregs2->flags & ~KVM_SREGS2_FLAGS_PDPTRS_VALID)
+ return -EINVAL;
+
+ if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
+ return -EINVAL;
+
+ ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
+ &mmu_reset_needed, !valid_pdptrs);
+ if (ret)
+ return ret;
+
+ if (valid_pdptrs) {
+ for (i = 0; i < 4 ; i++)
+ kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
+
+ kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
+ mmu_reset_needed = 1;
+ vcpu->arch.pdptrs_from_userspace = true;
+ }
+ if (mmu_reset_needed) {
+ kvm_mmu_reset_context(vcpu);
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ }
+ return 0;
+}
+
+void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu)
+{
+ BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
+
+ if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
+ __get_regs(vcpu, &vcpu->run->s.regs.regs);
+
+ if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
+ __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
+}
+
+int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
+ __set_regs(vcpu, &vcpu->run->s.regs.regs);
+ vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
+ }
+
+ if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
+ struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
+
+ if (__set_sregs(vcpu, &sregs))
+ return -EINVAL;
+
+ vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
+ }
+
+ return 0;
+}
+
+void kvm_update_dr0123(struct kvm_vcpu *vcpu)
+{
+ int i;
+
+ if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
+ for (i = 0; i < KVM_NR_DB_REGS; i++)
+ vcpu->arch.eff_db[i] = vcpu->arch.db[i];
+ }
+}
+
+void kvm_update_dr7(struct kvm_vcpu *vcpu)
+{
+ unsigned long dr7;
+
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ dr7 = vcpu->arch.guest_debug_dr7;
+ else
+ dr7 = vcpu->arch.dr7;
+ kvm_x86_call(set_dr7)(vcpu, dr7);
+ vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_BP_ENABLED;
+ if (dr7 & DR7_BP_EN_MASK)
+ vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7);
+
+static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
+{
+ u64 fixed = DR6_FIXED_1;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
+ fixed |= DR6_RTM;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ fixed |= DR6_BUS_LOCK;
+ return fixed;
+}
+
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+{
+ size_t size = ARRAY_SIZE(vcpu->arch.db);
+
+ switch (dr) {
+ case 0 ... 3:
+ vcpu->arch.db[array_index_nospec(dr, size)] = val;
+ if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
+ vcpu->arch.eff_db[dr] = val;
+ break;
+ case 4:
+ case 6:
+ if (!kvm_dr6_valid(val))
+ return 1; /* #GP */
+ vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
+ break;
+ case 5:
+ default: /* 7 */
+ if (!kvm_dr7_valid(val))
+ return 1; /* #GP */
+ vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
+ kvm_update_dr7(vcpu);
+ break;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr);
+
+unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr)
+{
+ size_t size = ARRAY_SIZE(vcpu->arch.db);
+
+ switch (dr) {
+ case 0 ... 3:
+ return vcpu->arch.db[array_index_nospec(dr, size)];
+ case 4:
+ case 6:
+ return vcpu->arch.dr6;
+ case 5:
+ default: /* 7 */
+ return vcpu->arch.dr7;
+ }
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr);
+
+int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs)
+{
+ unsigned int i;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ kvm_handle_exception_payload_quirk(vcpu);
+
+ memset(dbgregs, 0, sizeof(*dbgregs));
+
+ BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
+ dbgregs->db[i] = vcpu->arch.db[i];
+
+ dbgregs->dr6 = vcpu->arch.dr6;
+ dbgregs->dr7 = vcpu->arch.dr7;
+ return 0;
+}
+
+int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs)
+{
+ unsigned int i;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ if (dbgregs->flags)
+ return -EINVAL;
+
+ if (!kvm_dr6_valid(dbgregs->dr6))
+ return -EINVAL;
+ if (!kvm_dr7_valid(dbgregs->dr7))
+ return -EINVAL;
+
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
+ vcpu->arch.db[i] = dbgregs->db[i];
+
+ kvm_update_dr0123(vcpu);
+ vcpu->arch.dr6 = dbgregs->dr6;
+ vcpu->arch.dr7 = dbgregs->dr7;
+ kvm_update_dr7(vcpu);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index a57ba26279ed..c224874bbdde 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -389,6 +389,14 @@ static inline bool kvm_dr6_valid(u64 data)
return !(data >> 32);
}
+static inline unsigned long kvm_get_effective_dr7(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ return vcpu->arch.guest_debug_dr7;
+
+ return vcpu->arch.dr7;
+}
+
static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
{
vcpu->arch.hflags |= HF_GUEST_MASK;
@@ -412,4 +420,27 @@ static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
return vcpu->arch.hflags & HF_GUEST_MASK;
}
+static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+{
+ return kvm_x86_call(get_segment_base)(vcpu, seg);
+}
+
+void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+
+void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
+int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
+
+void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu);
+int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu);
+
+void kvm_update_dr0123(struct kvm_vcpu *vcpu);
+void kvm_update_dr7(struct kvm_vcpu *vcpu);
+int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs);
+int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs);
+
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd68a5bad0c6..20eeff79b46d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -129,13 +129,9 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void process_nmi(struct kvm_vcpu *vcpu);
-static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
-static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
-static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
-
static DEFINE_MUTEX(vendor_module_lock);
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
@@ -1016,170 +1012,6 @@ bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_require_dr);
-static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
-}
-
-/*
- * Load the pae pdptrs. Return 1 if they are all valid, 0 otherwise.
- */
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
-{
- struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
- gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
- gpa_t real_gpa;
- int i;
- int ret;
- u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
-
- /*
- * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
- * to an L1 GPA.
- */
- real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn),
- PFERR_USER_MASK | PFERR_WRITE_MASK |
- PFERR_GUEST_PAGE_MASK, NULL, 0);
- if (real_gpa == INVALID_GPA)
- return 0;
-
- /* Note the offset, PDPTRs are 32 byte aligned when using PAE paging. */
- ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(real_gpa), pdpte,
- cr3 & GENMASK(11, 5), sizeof(pdpte));
- if (ret < 0)
- return 0;
-
- for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
- if ((pdpte[i] & PT_PRESENT_MASK) &&
- (pdpte[i] & pdptr_rsvd_bits(vcpu))) {
- return 0;
- }
- }
-
- /*
- * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
- * Shadow page roots need to be reconstructed instead.
- */
- if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
- kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
-
- memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
- kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
- kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
- vcpu->arch.pdptrs_from_userspace = false;
-
- return 1;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs);
-
-static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
-{
-#ifdef CONFIG_X86_64
- if (cr0 & 0xffffffff00000000UL)
- return false;
-#endif
-
- if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
- return false;
-
- if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
- return false;
-
- return kvm_x86_call(is_valid_cr0)(vcpu, cr0);
-}
-
-void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
-{
- /*
- * CR0.WP is incorporated into the MMU role, but only for non-nested,
- * indirect shadow MMUs. If paging is disabled, no updates are needed
- * as there are no permission bits to emulate. If TDP is enabled, the
- * MMU's metadata needs to be updated, e.g. so that emulating guest
- * translations does the right thing, but there's no need to unload the
- * root as CR0.WP doesn't affect SPTEs.
- */
- if ((cr0 ^ old_cr0) == X86_CR0_WP) {
- if (!(cr0 & X86_CR0_PG))
- return;
-
- if (tdp_enabled) {
- kvm_init_mmu(vcpu);
- return;
- }
- }
-
- if ((cr0 ^ old_cr0) & X86_CR0_PG) {
- /*
- * Clearing CR0.PG is defined to flush the TLB from the guest's
- * perspective.
- */
- if (!(cr0 & X86_CR0_PG))
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- /*
- * Check for async #PF completion events when enabling paging,
- * as the vCPU may have previously encountered async #PFs (it's
- * entirely legal for the guest to toggle paging on/off without
- * waiting for the async #PF queue to drain).
- */
- else if (kvm_pv_async_pf_enabled(vcpu))
- kvm_make_request(KVM_REQ_APF_READY, vcpu);
- }
-
- if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0);
-
-int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
-{
- unsigned long old_cr0 = kvm_read_cr0(vcpu);
-
- if (!kvm_is_valid_cr0(vcpu, cr0))
- return 1;
-
- cr0 |= X86_CR0_ET;
-
- /* Write to CR0 reserved bits are ignored, even on Intel. */
- cr0 &= ~CR0_RESERVED_BITS;
-
-#ifdef CONFIG_X86_64
- if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
- (cr0 & X86_CR0_PG)) {
- int cs_db, cs_l;
-
- if (!is_pae(vcpu))
- return 1;
- kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
- if (cs_l)
- return 1;
- }
-#endif
- if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
- is_pae(vcpu) && ((cr0 ^ old_cr0) & X86_CR0_PDPTR_BITS) &&
- !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
- return 1;
-
- if (!(cr0 & X86_CR0_PG) &&
- (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
- return 1;
-
- if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
- return 1;
-
- kvm_x86_call(set_cr0)(vcpu, cr0);
-
- kvm_post_set_cr0(vcpu, old_cr0, cr0);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0);
-
-void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
-{
- (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);
-
static void kvm_load_xfeatures(struct kvm_vcpu *vcpu, bool load_guest)
{
if (vcpu->arch.guest_state_protected)
@@ -1289,89 +1121,7 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
-static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- return __kvm_is_valid_cr4(vcpu, cr4) &&
- kvm_x86_call(is_valid_cr4)(vcpu, cr4);
-}
-
-void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
-{
- if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-
- /*
- * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
- * according to the SDM; however, stale prev_roots could be reused
- * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
- * free them all. This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
- * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
- * so fall through.
- */
- if (!tdp_enabled &&
- (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
- kvm_mmu_unload(vcpu);
-
- /*
- * The TLB has to be flushed for all PCIDs if any of the following
- * (architecturally required) changes happen:
- * - CR4.PCIDE is changed from 1 to 0
- * - CR4.PGE is toggled
- *
- * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
- */
- if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
- (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-
- /*
- * The TLB has to be flushed for the current PCID if any of the
- * following (architecturally required) changes happen:
- * - CR4.SMEP is changed from 0 to 1
- * - CR4.PAE is toggled
- */
- else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
- ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
- kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4);
-
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- unsigned long old_cr4 = kvm_read_cr4(vcpu);
-
- if (!kvm_is_valid_cr4(vcpu, cr4))
- return 1;
-
- if (is_long_mode(vcpu)) {
- if (!(cr4 & X86_CR4_PAE))
- return 1;
- if ((cr4 ^ old_cr4) & X86_CR4_LA57)
- return 1;
- } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
- && ((cr4 ^ old_cr4) & X86_CR4_PDPTR_BITS)
- && !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
- return 1;
-
- if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
- /* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
- if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
- return 1;
- }
-
- if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
- return 1;
-
- kvm_x86_call(set_cr4)(vcpu, cr4);
-
- kvm_post_set_cr4(vcpu, old_cr4, cr4);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4);
-
-static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
+void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
unsigned long roots_to_free = 0;
@@ -1414,167 +1164,6 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
}
-int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
-{
- bool skip_tlb_flush = false;
- unsigned long pcid = 0;
-#ifdef CONFIG_X86_64
- if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
- skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
- cr3 &= ~X86_CR3_PCID_NOFLUSH;
- pcid = cr3 & X86_CR3_PCID_MASK;
- }
-#endif
-
- /* PDPTRs are always reloaded for PAE paging. */
- if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
- goto handle_tlb_flush;
-
- /*
- * Do not condition the GPA check on long mode, this helper is used to
- * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
- * the current vCPU mode is accurate.
- */
- if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
- return 1;
-
- if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
- return 1;
-
- if (cr3 != kvm_read_cr3(vcpu))
- kvm_mmu_new_pgd(vcpu, cr3);
-
- vcpu->arch.cr3 = cr3;
- kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
- /* Do not call post_set_cr3, we do not get here for confidential guests. */
-
-handle_tlb_flush:
- /*
- * A load of CR3 that flushes the TLB flushes only the current PCID,
- * even if PCID is disabled, in which case PCID=0 is flushed. It's a
- * moot point in the end because _disabling_ PCID will flush all PCIDs,
- * and it's impossible to use a non-zero PCID when PCID is disabled,
- * i.e. only PCID=0 can be relevant.
- */
- if (!skip_tlb_flush)
- kvm_invalidate_pcid(vcpu, pcid);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3);
-
-int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
-{
- if (cr8 & CR8_RESERVED_BITS)
- return 1;
- if (lapic_in_kernel(vcpu))
- kvm_lapic_set_tpr(vcpu, cr8);
- else
- vcpu->arch.cr8 = cr8;
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8);
-
-unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
-{
- if (lapic_in_kernel(vcpu))
- return kvm_lapic_get_cr8(vcpu);
- else
- return vcpu->arch.cr8;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8);
-
-static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
-{
- int i;
-
- if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
- for (i = 0; i < KVM_NR_DB_REGS; i++)
- vcpu->arch.eff_db[i] = vcpu->arch.db[i];
- }
-}
-
-void kvm_update_dr7(struct kvm_vcpu *vcpu)
-{
- unsigned long dr7;
-
- if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
- dr7 = vcpu->arch.guest_debug_dr7;
- else
- dr7 = vcpu->arch.dr7;
- kvm_x86_call(set_dr7)(vcpu, dr7);
- vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_BP_ENABLED;
- if (dr7 & DR7_BP_EN_MASK)
- vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7);
-
-static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
-{
- u64 fixed = DR6_FIXED_1;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
- fixed |= DR6_RTM;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
- fixed |= DR6_BUS_LOCK;
- return fixed;
-}
-
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
-{
- size_t size = ARRAY_SIZE(vcpu->arch.db);
-
- switch (dr) {
- case 0 ... 3:
- vcpu->arch.db[array_index_nospec(dr, size)] = val;
- if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
- vcpu->arch.eff_db[dr] = val;
- break;
- case 4:
- case 6:
- if (!kvm_dr6_valid(val))
- return 1; /* #GP */
- vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
- break;
- case 5:
- default: /* 7 */
- if (!kvm_dr7_valid(val))
- return 1; /* #GP */
- vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
- kvm_update_dr7(vcpu);
- break;
- }
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr);
-
-unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr)
-{
- size_t size = ARRAY_SIZE(vcpu->arch.db);
-
- switch (dr) {
- case 0 ... 3:
- return vcpu->arch.db[array_index_nospec(dr, size)];
- case 4:
- case 6:
- return vcpu->arch.dr6;
- case 5:
- default: /* 7 */
- return vcpu->arch.dr7;
- }
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr);
-
-static unsigned long kvm_get_effective_dr7(struct kvm_vcpu *vcpu)
-{
- if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
- return vcpu->arch.guest_debug_dr7;
-
- return vcpu->arch.dr7;
-}
-
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
{
u32 pmc = kvm_ecx_read(vcpu);
@@ -5532,7 +5121,7 @@ static struct kvm_queued_exception *kvm_get_exception_to_save(struct kvm_vcpu *v
return &vcpu->arch.exception;
}
-static void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu)
+void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu)
{
struct kvm_queued_exception *ex = kvm_get_exception_to_save(vcpu);
@@ -5736,57 +5325,6 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
return 0;
}
-static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
- struct kvm_debugregs *dbgregs)
-{
- unsigned int i;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- kvm_handle_exception_payload_quirk(vcpu);
-
- memset(dbgregs, 0, sizeof(*dbgregs));
-
- BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
- for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
- dbgregs->db[i] = vcpu->arch.db[i];
-
- dbgregs->dr6 = vcpu->arch.dr6;
- dbgregs->dr7 = vcpu->arch.dr7;
- return 0;
-}
-
-static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
- struct kvm_debugregs *dbgregs)
-{
- unsigned int i;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- if (dbgregs->flags)
- return -EINVAL;
-
- if (!kvm_dr6_valid(dbgregs->dr6))
- return -EINVAL;
- if (!kvm_dr7_valid(dbgregs->dr7))
- return -EINVAL;
-
- for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
- vcpu->arch.db[i] = dbgregs->db[i];
-
- kvm_update_dr0123(vcpu);
- vcpu->arch.dr6 = dbgregs->dr6;
- vcpu->arch.dr7 = dbgregs->dr7;
- kvm_update_dr7(vcpu);
-
- return 0;
-}
-
-
static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
u8 *state, unsigned int size)
{
@@ -6623,7 +6161,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -ENOMEM;
if (!u.sregs2)
goto out;
- __get_sregs2(vcpu, u.sregs2);
+ kvm_x86_vcpu_ioctl_get_sregs2(vcpu, u.sregs2);
r = -EFAULT;
if (copy_to_user(argp, u.sregs2, sizeof(struct kvm_sregs2)))
goto out;
@@ -6642,7 +6180,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
u.sregs2 = NULL;
goto out;
}
- r = __set_sregs2(vcpu, u.sregs2);
+ r = kvm_x86_vcpu_ioctl_set_sregs2(vcpu, u.sregs2);
break;
}
case KVM_HAS_DEVICE_ATTR:
@@ -8492,11 +8030,6 @@ static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
return emulator_pio_out(emul_to_vcpu(ctxt), size, port, val, count);
}
-static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
-{
- return kvm_x86_call(get_segment_base)(vcpu, seg);
-}
-
static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address)
{
kvm_mmu_invlpg(emul_to_vcpu(ctxt), address);
@@ -8641,7 +8174,7 @@ static void emulator_set_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
static unsigned long emulator_get_cached_segment_base(
struct x86_emulate_ctxt *ctxt, int seg)
{
- return get_segment_base(emul_to_vcpu(ctxt), seg);
+ return kvm_get_segment_base(emul_to_vcpu(ctxt), seg);
}
static bool emulator_get_segment(struct x86_emulate_ctxt *ctxt, u16 *selector,
@@ -12073,179 +11606,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
return r;
}
-static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
- /*
- * We are here if userspace calls get_regs() in the middle of
- * instruction emulation. Registers state needs to be copied
- * back from emulation context to vcpu. Userspace shouldn't do
- * that usually, but some bad designed PV devices (vmware
- * backdoor interface) need this to work
- */
- emulator_writeback_register_cache(vcpu->arch.emulate_ctxt);
- vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
- }
- regs->rax = kvm_rax_read_raw(vcpu);
- regs->rbx = kvm_rbx_read_raw(vcpu);
- regs->rcx = kvm_rcx_read_raw(vcpu);
- regs->rdx = kvm_rdx_read_raw(vcpu);
- regs->rsi = kvm_rsi_read_raw(vcpu);
- regs->rdi = kvm_rdi_read_raw(vcpu);
- regs->rsp = kvm_rsp_read(vcpu);
- regs->rbp = kvm_rbp_read_raw(vcpu);
-#ifdef CONFIG_X86_64
- regs->r8 = kvm_r8_read_raw(vcpu);
- regs->r9 = kvm_r9_read_raw(vcpu);
- regs->r10 = kvm_r10_read_raw(vcpu);
- regs->r11 = kvm_r11_read_raw(vcpu);
- regs->r12 = kvm_r12_read_raw(vcpu);
- regs->r13 = kvm_r13_read_raw(vcpu);
- regs->r14 = kvm_r14_read_raw(vcpu);
- regs->r15 = kvm_r15_read_raw(vcpu);
-#endif
-
- regs->rip = kvm_rip_read(vcpu);
- regs->rflags = kvm_get_rflags(vcpu);
-}
-
-int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __get_regs(vcpu, regs);
- vcpu_put(vcpu);
- return 0;
-}
-
-static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
- vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
-
- kvm_rax_write_raw(vcpu, regs->rax);
- kvm_rbx_write_raw(vcpu, regs->rbx);
- kvm_rcx_write_raw(vcpu, regs->rcx);
- kvm_rdx_write_raw(vcpu, regs->rdx);
- kvm_rsi_write_raw(vcpu, regs->rsi);
- kvm_rdi_write_raw(vcpu, regs->rdi);
- kvm_rsp_write(vcpu, regs->rsp);
- kvm_rbp_write_raw(vcpu, regs->rbp);
-#ifdef CONFIG_X86_64
- kvm_r8_write_raw(vcpu, regs->r8);
- kvm_r9_write_raw(vcpu, regs->r9);
- kvm_r10_write_raw(vcpu, regs->r10);
- kvm_r11_write_raw(vcpu, regs->r11);
- kvm_r12_write_raw(vcpu, regs->r12);
- kvm_r13_write_raw(vcpu, regs->r13);
- kvm_r14_write_raw(vcpu, regs->r14);
- kvm_r15_write_raw(vcpu, regs->r15);
-#endif
-
- kvm_rip_write(vcpu, regs->rip);
- kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
-
- vcpu->arch.exception.pending = false;
- vcpu->arch.exception_vmexit.pending = false;
-
- kvm_make_request(KVM_REQ_EVENT, vcpu);
-}
-
-int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __set_regs(vcpu, regs);
- vcpu_put(vcpu);
- return 0;
-}
-
-static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- struct desc_ptr dt;
-
- if (vcpu->arch.guest_state_protected)
- goto skip_protected_regs;
-
- kvm_handle_exception_payload_quirk(vcpu);
-
- kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
- kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
- kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
- kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
- kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
- kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
-
- kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
- kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
-
- kvm_x86_call(get_idt)(vcpu, &dt);
- sregs->idt.limit = dt.size;
- sregs->idt.base = dt.address;
- kvm_x86_call(get_gdt)(vcpu, &dt);
- sregs->gdt.limit = dt.size;
- sregs->gdt.base = dt.address;
-
- sregs->cr2 = vcpu->arch.cr2;
- sregs->cr3 = kvm_read_cr3(vcpu);
-
-skip_protected_regs:
- sregs->cr0 = kvm_read_cr0(vcpu);
- sregs->cr4 = kvm_read_cr4(vcpu);
- sregs->cr8 = kvm_get_cr8(vcpu);
- sregs->efer = vcpu->arch.efer;
- sregs->apic_base = vcpu->arch.apic_base;
-}
-
-static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- __get_sregs_common(vcpu, sregs);
-
- if (vcpu->arch.guest_state_protected)
- return;
-
- if (vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft)
- set_bit(vcpu->arch.interrupt.nr,
- (unsigned long *)sregs->interrupt_bitmap);
-}
-
-static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
-{
- int i;
-
- __get_sregs_common(vcpu, (struct kvm_sregs *)sregs2);
-
- if (vcpu->arch.guest_state_protected)
- return;
-
- if (is_pae_paging(vcpu)) {
- kvm_vcpu_srcu_read_lock(vcpu);
- for (i = 0 ; i < 4 ; i++)
- sregs2->pdptrs[i] = kvm_pdptr_read(vcpu, i);
- sregs2->flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
- kvm_vcpu_srcu_read_unlock(vcpu);
- }
-}
-
-int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __get_sregs(vcpu, sregs);
- vcpu_put(vcpu);
- return 0;
-}
-
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
@@ -12365,175 +11725,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_task_switch);
-static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
- /*
- * When EFER.LME and CR0.PG are set, the processor is in
- * 64-bit mode (though maybe in a 32-bit code segment).
- * CR4.PAE and EFER.LMA must be set.
- */
- if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
- return false;
- if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
- return false;
- } else {
- /*
- * Not in 64-bit mode: EFER.LMA is clear and the code
- * segment cannot be 64-bit.
- */
- if (sregs->efer & EFER_LMA || sregs->cs.l)
- return false;
- }
-
- return kvm_is_valid_cr4(vcpu, sregs->cr4) &&
- kvm_is_valid_cr0(vcpu, sregs->cr0);
-}
-
-static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
- int *mmu_reset_needed, bool update_pdptrs)
-{
- int idx;
- struct desc_ptr dt;
-
- if (!kvm_is_valid_sregs(vcpu, sregs))
- return -EINVAL;
-
- if (kvm_apic_set_base(vcpu, sregs->apic_base, true))
- return -EINVAL;
-
- if (vcpu->arch.guest_state_protected)
- return 0;
-
- dt.size = sregs->idt.limit;
- dt.address = sregs->idt.base;
- kvm_x86_call(set_idt)(vcpu, &dt);
- dt.size = sregs->gdt.limit;
- dt.address = sregs->gdt.base;
- kvm_x86_call(set_gdt)(vcpu, &dt);
-
- vcpu->arch.cr2 = sregs->cr2;
- *mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
- vcpu->arch.cr3 = sregs->cr3;
- kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
- kvm_x86_call(post_set_cr3)(vcpu, sregs->cr3);
-
- kvm_set_cr8(vcpu, sregs->cr8);
-
- *mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
- kvm_x86_call(set_efer)(vcpu, sregs->efer);
-
- *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
- kvm_x86_call(set_cr0)(vcpu, sregs->cr0);
-
- *mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
- kvm_x86_call(set_cr4)(vcpu, sregs->cr4);
-
- if (update_pdptrs) {
- idx = srcu_read_lock(&vcpu->kvm->srcu);
- if (is_pae_paging(vcpu)) {
- load_pdptrs(vcpu, kvm_read_cr3(vcpu));
- *mmu_reset_needed = 1;
- }
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
- }
-
- kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
- kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
- kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
- kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
- kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
- kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
-
- kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
- kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
-
- kvm_lapic_update_cr8_intercept(vcpu);
-
- /* Older userspace won't unhalt the vcpu on reset. */
- if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
- sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 &&
- !is_protmode(vcpu))
- kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
- return 0;
-}
-
-static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- int pending_vec, max_bits;
- int mmu_reset_needed = 0;
- int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
-
- if (ret)
- return ret;
-
- if (mmu_reset_needed) {
- kvm_mmu_reset_context(vcpu);
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- }
-
- max_bits = KVM_NR_INTERRUPTS;
- pending_vec = find_first_bit(
- (const unsigned long *)sregs->interrupt_bitmap, max_bits);
-
- if (pending_vec < max_bits) {
- kvm_queue_interrupt(vcpu, pending_vec, false);
- pr_debug("Set back pending irq %d\n", pending_vec);
- kvm_make_request(KVM_REQ_EVENT, vcpu);
- }
- return 0;
-}
-
-static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
-{
- int mmu_reset_needed = 0;
- bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
- bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
- !(sregs2->efer & EFER_LMA);
- int i, ret;
-
- if (sregs2->flags & ~KVM_SREGS2_FLAGS_PDPTRS_VALID)
- return -EINVAL;
-
- if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
- return -EINVAL;
-
- ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
- &mmu_reset_needed, !valid_pdptrs);
- if (ret)
- return ret;
-
- if (valid_pdptrs) {
- for (i = 0; i < 4 ; i++)
- kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
-
- kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
- mmu_reset_needed = 1;
- vcpu->arch.pdptrs_from_userspace = true;
- }
- if (mmu_reset_needed) {
- kvm_mmu_reset_context(vcpu);
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- }
- return 0;
-}
-
-int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
-{
- int ret;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- ret = __set_sregs(vcpu, sregs);
- vcpu_put(vcpu);
- return ret;
-}
-
static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
{
bool set = false;
@@ -12691,11 +11882,7 @@ static void store_regs(struct kvm_vcpu *vcpu)
{
BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
- if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
- __get_regs(vcpu, &vcpu->run->s.regs.regs);
-
- if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
- __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
+ kvm_run_sync_regs_to_user(vcpu);
if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_EVENTS)
kvm_vcpu_ioctl_x86_get_vcpu_events(
@@ -12704,19 +11891,8 @@ static void store_regs(struct kvm_vcpu *vcpu)
static int sync_regs(struct kvm_vcpu *vcpu)
{
- if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
- __set_regs(vcpu, &vcpu->run->s.regs.regs);
- vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
- }
-
- if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
- struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
-
- if (__set_sregs(vcpu, &sregs))
- return -EINVAL;
-
- vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
- }
+ if (kvm_run_sync_regs_from_user(vcpu))
+ return -EINVAL;
if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
struct kvm_vcpu_events events = vcpu->run->s.regs.events;
@@ -13818,51 +12994,6 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
return kvm_x86_call(interrupt_allowed)(vcpu, false);
}
-unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
-{
- /* Can't read the RIP when guest state is protected, just return 0 */
- if (vcpu->arch.guest_state_protected)
- return 0;
-
- if (is_64_bit_mode(vcpu))
- return kvm_rip_read(vcpu);
- return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
- kvm_rip_read(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
-
-bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
-{
- return kvm_get_linear_rip(vcpu) == linear_rip;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip);
-
-unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
-{
- unsigned long rflags;
-
- rflags = kvm_x86_call(get_rflags)(vcpu);
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
- rflags &= ~X86_EFLAGS_TF;
- return rflags;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags);
-
-static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
- kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
- rflags |= X86_EFLAGS_TF;
- kvm_x86_call(set_rflags)(vcpu, rflags);
-}
-
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
- __kvm_set_rflags(vcpu, rflags);
- kvm_make_request(KVM_REQ_EVENT, vcpu);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags);
-
static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
{
BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index acb22167901f..80ed36d5d62a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -403,6 +403,7 @@ int handle_ud(struct kvm_vcpu *vcpu);
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
struct kvm_queued_exception *ex);
+void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu);
int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
@@ -597,6 +598,7 @@ static inline void kvm_machine_check(void)
int kvm_spec_ctrl_test_value(u64 value);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
+void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-05-29 22:21 ` [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
@ 2026-05-30 0:43 ` Yosry Ahmed
2026-06-01 14:15 ` Sean Christopherson
2026-06-03 11:33 ` Huang, Kai
1 sibling, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:21:58PM -0700, Sean Christopherson wrote:
> Introduce regs.c, and move the vast majority of register specific code out
> of x86.c and into regs.c. Deliberately leave behind MSR code (except for
> EFER, which can hardly be called an MSR), as KVM's MSR support is complex
> enough to warrant its own compilation unit, and doesn't have much in common
> with the other register code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
This is not just code movement. You're also renaming and introducing new
helpers in the process, making the patch not so easy to review.
A wise man once told me not to do this:
https://lore.kernel.org/kvm/aYU87QeMg8_kTM-G@google.com/
I have been waiting a few months for this, so here goes:
Stop. Bundling. Things. Together.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-05-30 0:43 ` Yosry Ahmed
@ 2026-06-01 14:15 ` Sean Christopherson
2026-06-01 23:35 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 14:15 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:21:58PM -0700, Sean Christopherson wrote:
> > Introduce regs.c, and move the vast majority of register specific code out
> > of x86.c and into regs.c. Deliberately leave behind MSR code (except for
> > EFER, which can hardly be called an MSR), as KVM's MSR support is complex
> > enough to warrant its own compilation unit, and doesn't have much in common
> > with the other register code.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> This is not just code movement. You're also renaming and introducing new
> helpers in the process, making the patch not so easy to review.
>
> A wise man once told me not to do this:
> https://lore.kernel.org/kvm/aYU87QeMg8_kTM-G@google.com/
>
> I have been waiting a few months for this, so here goes:
>
> Stop. Bundling. Things. Together.
Gooood. Use your aggressive feelings, boy. Let the hate flow through you!
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-06-01 14:15 ` Sean Christopherson
@ 2026-06-01 23:35 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 23:35 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 07:15:37AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:21:58PM -0700, Sean Christopherson wrote:
> > > Introduce regs.c, and move the vast majority of register specific code out
> > > of x86.c and into regs.c. Deliberately leave behind MSR code (except for
> > > EFER, which can hardly be called an MSR), as KVM's MSR support is complex
> > > enough to warrant its own compilation unit, and doesn't have much in common
> > > with the other register code.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> >
> > This is not just code movement. You're also renaming and introducing new
> > helpers in the process, making the patch not so easy to review.
> >
> > A wise man once told me not to do this:
> > https://lore.kernel.org/kvm/aYU87QeMg8_kTM-G@google.com/
> >
> > I have been waiting a few months for this, so here goes:
> >
> > Stop. Bundling. Things. Together.
>
> Gooood. Use your aggressive feelings, boy. Let the hate flow through you!
I guess I did turn to the dark side after all.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-05-29 22:21 ` [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
2026-05-30 0:43 ` Yosry Ahmed
@ 2026-06-03 11:33 ` Huang, Kai
1 sibling, 0 replies; 87+ messages in thread
From: Huang, Kai @ 2026-06-03 11:33 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com,
dwmw2@infradead.org, paul@xen.org
Cc: kvm@vger.kernel.org, dwmw@amazon.co.uk,
linux-kernel@vger.kernel.org, yosry@kernel.org,
binbin.wu@linux.intel.com
On Fri, 2026-05-29 at 15:21 -0700, Sean Christopherson wrote:
> Introduce regs.c, and move the vast majority of register specific code out
> of x86.c and into regs.c. Deliberately leave behind MSR code (except for
> EFER, which can hardly be called an MSR), as KVM's MSR support is complex
> enough to warrant its own compilation unit, and doesn't have much in common
> with the other register code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
> @@ -12691,11 +11882,7 @@ static void store_regs(struct kvm_vcpu *vcpu)
> {
> BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
This BUILD_BUG_ON() can be removed too since the new kvm_run_sync_regs_to_user()
already has it.
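For illustration only, and not code from this series: a minimal userspace
sketch of why the caller's size check becomes redundant once the helper
carries it. Every name below (struct run_area, struct vcpu_model,
run_sync_regs_to_user(), store_regs_model(), the SYNC_* constants and the
array sizes) is a hypothetical stand-in, and _Static_assert stands in for
BUILD_BUG_ON().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the uapi constants; values are illustrative. */
#define SYNC_REGS_SIZE_BYTES	2048
#define SYNC_X86_REGS		(1u << 0)
#define SYNC_X86_SREGS		(1u << 1)

/* Hypothetical model of the register block shared with userspace. */
struct sync_regs {
	uint64_t regs[18];
	uint64_t sregs[39];
};

/* Hypothetical model of the run area: a fixed-size union holds the block. */
struct run_area {
	uint64_t valid_regs;
	union {
		struct sync_regs regs;
		char padding[SYNC_REGS_SIZE_BYTES];
	} s;
};

struct vcpu_model {
	uint64_t regs[18];
	uint64_t sregs[39];
	struct run_area *run;
};

/*
 * The helper owns the compile-time size check, so every caller is covered
 * without repeating it.
 */
static void run_sync_regs_to_user(struct vcpu_model *vcpu)
{
	_Static_assert(sizeof(struct sync_regs) <= SYNC_REGS_SIZE_BYTES,
		       "sync regs must fit in the shared area");

	/* Copy out only the register groups that userspace asked for. */
	if (vcpu->run->valid_regs & SYNC_X86_REGS)
		memcpy(vcpu->run->s.regs.regs, vcpu->regs, sizeof(vcpu->regs));

	if (vcpu->run->valid_regs & SYNC_X86_SREGS)
		memcpy(vcpu->run->s.regs.sregs, vcpu->sregs, sizeof(vcpu->sregs));
}

/* The caller simply delegates; a second assertion here would add nothing. */
static void store_regs_model(struct vcpu_model *vcpu)
{
	run_sync_regs_to_user(vcpu);
}
```

Because the assertion is evaluated at compile time wherever the helper is
built, keeping a second copy in the caller only duplicates the same
condition.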
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 16/40] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (14 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 15/40] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
@ 2026-05-29 22:21 ` Sean Christopherson
2026-05-30 0:37 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 17/40] KVM: x86: Drop defunct vcpu_tsc_khz() declaration Sean Christopherson
` (24 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:21 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move single-use local APIC IRQ helpers out of asm/kvm_host.h so that they
are co-located with their user, and not exposed to the broader world.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/irq.c | 7 +++++++
arch/x86/kvm/lapic.h | 5 +++++
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 983bdc84f9f9..ee205f8ad5af 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1772,11 +1772,6 @@ struct kvm_lapic_irq {
bool msi_redir_hint;
};
-static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
-{
- return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
-}
-
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
@@ -2510,13 +2505,6 @@ void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
-static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
-{
- /* We can only post Fixed and LowPrio IRQs */
- return (irq->delivery_mode == APIC_DM_FIXED ||
- irq->delivery_mode == APIC_DM_LOWEST);
-}
-
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
kvm_x86_call(vcpu_blocking)(vcpu);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 9519fec09ee6..2b4e68e7cadb 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -423,6 +423,13 @@ void kvm_arch_irq_routing_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
}
+static bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
+{
+ /* We can only post Fixed and LowPrio IRQs */
+ return (irq->delivery_mode == APIC_DM_FIXED ||
+ irq->delivery_mode == APIC_DM_LOWEST);
+}
+
static int kvm_pi_update_irte(struct kvm_kernel_irqfd *irqfd,
struct kvm_kernel_irq_routing_entry *entry)
{
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 71970213dc1f..32f09b25884a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -237,6 +237,11 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
}
+static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
+{
+ return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
+}
+
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 16/40] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
2026-05-29 22:21 ` [PATCH v3 16/40] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
@ 2026-05-30 0:37 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:21:59PM -0700, Sean Christopherson wrote:
> Move single-use local APIC IRQ helpers out of asm/kvm_host.h so that they
> are co-located with their user, and not exposed to the broader world.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 17/40] KVM: x86: Drop defunct vcpu_tsc_khz() declaration
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (15 preceding siblings ...)
2026-05-29 22:21 ` [PATCH v3 16/40] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:45 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 18/40] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
` (23 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Remove a dead vcpu_tsc_khz() declaration. No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ee205f8ad5af..2e535027dd5c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2146,8 +2146,6 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
extern bool tdp_enabled;
-u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 17/40] KVM: x86: Drop defunct vcpu_tsc_khz() declaration
2026-05-29 22:22 ` [PATCH v3 17/40] KVM: x86: Drop defunct vcpu_tsc_khz() declaration Sean Christopherson
@ 2026-05-30 0:45 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:45 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:00PM -0700, Sean Christopherson wrote:
> Remove a dead vcpu_tsc_khz() declaration. No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 18/40] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (16 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 17/40] KVM: x86: Drop defunct vcpu_tsc_khz() declaration Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:46 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
` (22 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Relocate the kvm_caps and kvm_host_values struct definitions and their
associated global variable declarations to asm/kvm_host.h to allow for a
variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
maintainable rule that asm/kvm_host.h's role is to define common
structures (and declare any associated globals), plus anything needed by
arch-neutral KVM.
While it would be lovely to trim kvm_host.h down to the point where it
*only* holds things needed by arch-neutral and/or non-KVM code, multiple
attempts to do just that have failed miserably. Trying to "hide" code
from arch-neutral KVM is too restrictive (and ultimately pointless), and
KVM x86 itself also needs a place to define common structures and their
globals, e.g. to avoid inconsistent header include chains and/or misplaced
helpers.
E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
of sorts, includes regs.h, but not mmu.h. Literally the only reason that
x86.h doesn't include mmu.h is that mmu.h references kvm_host, whose type,
struct kvm_host_values, is currently defined in x86.h. As a result of odd
include ordering, the
very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h.
"Fix" the kvm_host dependency so that x86.h can be the "central" include
everyone expects it to be, and set KVM x86 on the path to having somewhat
sensible "rules" for what goes where:
- asm/kvm_host.h holds "common" structure definitions and associated key
global variables, and things that are referenced by arch-neutral KVM.
- <thing>.{c,h} holds relevant declarations and definitions.
- x86.{c,h} is the kitchen sink for everything else.
Cc: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 44 ++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 45 ---------------------------------
2 files changed, 44 insertions(+), 45 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2e535027dd5c..f7130eb98473 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,50 @@ enum x86_intercept_stage;
struct kvm_kernel_irqfd;
struct kvm_kernel_irq_routing_entry;
+struct kvm_caps {
+ /* control of guest tsc rate supported? */
+ bool has_tsc_control;
+ /* maximum supported tsc_khz for guests */
+ u32 max_guest_tsc_khz;
+ /* number of bits of the fractional part of the TSC scaling ratio */
+ u8 tsc_scaling_ratio_frac_bits;
+ /* maximum allowed value of TSC scaling ratio */
+ u64 max_tsc_scaling_ratio;
+ /* 1ull << kvm_caps.tsc_scaling_ratio_frac_bits */
+ u64 default_tsc_scaling_ratio;
+ /* bus lock detection supported? */
+ bool has_bus_lock_exit;
+ /* notify VM exit supported? */
+ bool has_notify_vmexit;
+ /* bit mask of VM types */
+ u32 supported_vm_types;
+
+ u64 supported_mce_cap;
+ u64 supported_xcr0;
+ u64 supported_xss;
+ u64 supported_perf_cap;
+
+ u64 supported_quirks;
+ u64 inapplicable_quirks;
+};
+extern struct kvm_caps kvm_caps;
+
+struct kvm_host_values {
+ /*
+ * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
+ * address bits irrespective of features that repurpose legal bits,
+ * e.g. MKTME.
+ */
+ u8 maxphyaddr;
+
+ u64 efer;
+ u64 xcr0;
+ u64 xss;
+ u64 s_cet;
+ u64 arch_capabilities;
+};
+extern struct kvm_host_values kvm_host;
+
/*
* kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
* also includes TDP pages) to determine whether or not a page can be used in
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 80ed36d5d62a..b7d3b54cde15 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -12,48 +12,6 @@
#define KVM_MAX_MCE_BANKS 32
-struct kvm_caps {
- /* control of guest tsc rate supported? */
- bool has_tsc_control;
- /* maximum supported tsc_khz for guests */
- u32 max_guest_tsc_khz;
- /* number of bits of the fractional part of the TSC scaling ratio */
- u8 tsc_scaling_ratio_frac_bits;
- /* maximum allowed value of TSC scaling ratio */
- u64 max_tsc_scaling_ratio;
- /* 1ull << kvm_caps.tsc_scaling_ratio_frac_bits */
- u64 default_tsc_scaling_ratio;
- /* bus lock detection supported? */
- bool has_bus_lock_exit;
- /* notify VM exit supported? */
- bool has_notify_vmexit;
- /* bit mask of VM types */
- u32 supported_vm_types;
-
- u64 supported_mce_cap;
- u64 supported_xcr0;
- u64 supported_xss;
- u64 supported_perf_cap;
-
- u64 supported_quirks;
- u64 inapplicable_quirks;
-};
-
-struct kvm_host_values {
- /*
- * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
- * address bits irrespective of features that repurpose legal bits,
- * e.g. MKTME.
- */
- u8 maxphyaddr;
-
- u64 efer;
- u64 xcr0;
- u64 xss;
- u64 s_cet;
- u64 arch_capabilities;
-};
-
void kvm_spurious_fault(void);
#define SIZE_OF_MEMSLOTS_HASHTABLE \
@@ -417,9 +375,6 @@ fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
-extern struct kvm_caps kvm_caps;
-extern struct kvm_host_values kvm_host;
-
void kvm_setup_xss_caps(void);
/*
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 18/40] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-05-29 22:22 ` [PATCH v3 18/40] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
@ 2026-05-30 0:46 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:01PM -0700, Sean Christopherson wrote:
> Relocate the kvm_caps and kvm_host_values struct definitions and their
> associated global variable declarations to asm/kvm_host.h to allow for a
> variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
> maintainable rule that asm/kvm_host.h's role is to define common
> structures (and declare any associated globals), and anything needed by
> arch-neutral KVM.
>
> While it would be lovely to trim kvm_host.h down to the point where it
> *only* holds things needed by arch-neutral and/or non-KVM code, multiple
> attempts to do just that have failed miserably. Trying to "hide" code
> from arch-neutral KVM is too restrictive (and ultimately pointless), and
> KVM x86 itself also needs a place to define common structures and their
> globals, e.g. to avoid inconsistent header include chains and/or misplaced
> helpers.
>
> E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
> of sorts, includes regs.h, but not mmu.h. Literally the only reason that
> x86.h doesn't include mmu.h is that mmu.h references struct kvm_host, which
> is currently defined in x86.h. As a result of odd include ordering, the
> very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h.
>
> "Fix" the kvm_host dependency so that x86.h can be the "central" include
> everyone expects it to be, and set KVM x86 on the path to having somewhat
> sensible "rules" for what goes where:
>
> - asm/kvm_host.h holds "common" structure definitions and associated key
> global variables, and things that are referenced by arch-neutral KVM.
> - <thing>.{c,h} holds relevant declarations and definitions.
> - x86.{c,h} is the kitchen sink for everything else.
>
> Cc: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (17 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 18/40] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:48 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 20/40] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h Sean Christopherson
` (21 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Invert the include ordering between x86.h and mmu.h, so that x86.h is the
"top-level" include for KVM x86.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu.h | 6 +++++-
arch/x86/kvm/x86.h | 6 +-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e1bb663ebbd5..28fca48dcf64 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -4,7 +4,6 @@
#include <linux/kvm_host.h>
#include "regs.h"
-#include "x86.h"
#include "cpuid.h"
extern bool __read_mostly enable_mmio_caching;
@@ -300,6 +299,11 @@ static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
atomic64_add(count, &kvm->stat.pages[level - 1]);
}
+static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
+}
+
static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu,
gpa_t gpa, u64 access,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b7d3b54cde15..a0e68eaf1f80 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -6,6 +6,7 @@
#include <asm/fpu/xstate.h>
#include <asm/mce.h>
#include <asm/pvclock.h>
+#include "mmu.h"
#include "regs.h"
#include "kvm_emulate.h"
#include "cpuid.h"
@@ -210,11 +211,6 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
return (1U << vector) & exception_has_error_code;
}
-static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
-}
-
static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
{
return kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 57 : 48;
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h
2026-05-29 22:22 ` [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
@ 2026-05-30 0:48 ` Yosry Ahmed
2026-06-01 14:55 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:02PM -0700, Sean Christopherson wrote:
> Invert the include ordering between x86.h and mmu.h, so that x86.h is the
> "top-level" include for KVM x86.
You're also silently moving mmu_is_nested().
Aside from that, I thought top-level include means that other headers
will include it, and it will include fewer headers. Seems like this is
doing the opposite?
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/mmu.h | 6 +++++-
> arch/x86/kvm/x86.h | 6 +-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e1bb663ebbd5..28fca48dcf64 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -4,7 +4,6 @@
>
> #include <linux/kvm_host.h>
> #include "regs.h"
> -#include "x86.h"
> #include "cpuid.h"
>
> extern bool __read_mostly enable_mmio_caching;
> @@ -300,6 +299,11 @@ static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
> atomic64_add(count, &kvm->stat.pages[level - 1]);
> }
>
> +static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
> +}
> +
> static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
> struct kvm_mmu *mmu,
> gpa_t gpa, u64 access,
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index b7d3b54cde15..a0e68eaf1f80 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -6,6 +6,7 @@
> #include <asm/fpu/xstate.h>
> #include <asm/mce.h>
> #include <asm/pvclock.h>
> +#include "mmu.h"
> #include "regs.h"
> #include "kvm_emulate.h"
> #include "cpuid.h"
> @@ -210,11 +211,6 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
> return (1U << vector) & exception_has_error_code;
> }
>
> -static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
> -{
> - return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
> -}
> -
> static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
> {
> return kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 57 : 48;
> --
> 2.54.0.823.g6e5bcc1fc9-goog
>
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h
2026-05-30 0:48 ` Yosry Ahmed
@ 2026-06-01 14:55 ` Sean Christopherson
2026-06-01 20:27 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 14:55 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:22:02PM -0700, Sean Christopherson wrote:
> > Invert the include ordering between x86.h and mmu.h, so that x86.h is the
> > "top-level" include for KVM x86.
>
> You're also silently moving mmu_is_nested().
I'll explicitly call that out.
> Aside from that, I thought top-level include means that other headers
> will include it, and it will include fewer headers. Seems like this is
> doing the opposite?
Yeah, I'm probably using confusing terminology. I couldn't quite figure out how
to concisely describe this. I like my pyramid visualization, so how about:
Invert the include ordering between x86.h and mmu.h, and move
mmu_is_nested() to mmu.h where it belongs (mmu_is_nested()'s placement in
x86.h was solely responsible for the existing ordering), so that x86.h is
the top of KVM x86's "include pyramid".
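
For readers unfamiliar with the helper being moved, the check it performs can
be sketched outside the kernel roughly as follows. This is only an
illustration: the struct layouts are hypothetical stand-ins reduced to the
fields the helper touches (root_mmu in particular is included just to give
walk_mmu something else to point at), and example_mmu_is_nested() is an
invented name. Only the body of mmu_is_nested() is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real KVM structures carry far more state. */
struct kvm_mmu {
	int placeholder;
};

struct kvm_vcpu_arch {
	struct kvm_mmu root_mmu;	/* assumed "other" MMU for the demo */
	struct kvm_mmu nested_mmu;
	struct kvm_mmu *walk_mmu;	/* MMU currently used for guest walks */
};

struct kvm_vcpu {
	struct kvm_vcpu_arch arch;
};

/* Same body as the helper in the patch. */
static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
}

/* Usage sketch: the result tracks whichever MMU walk_mmu points at. */
void example_mmu_is_nested(void)
{
	struct kvm_vcpu vcpu = { 0 };

	vcpu.arch.walk_mmu = &vcpu.arch.root_mmu;
	assert(!mmu_is_nested(&vcpu));

	vcpu.arch.walk_mmu = &vcpu.arch.nested_mmu;
	assert(mmu_is_nested(&vcpu));
}
```

The helper only needs the vCPU's MMU pointers, which is consistent with the
argument that it belongs in mmu.h rather than x86.h.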
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h
2026-06-01 14:55 ` Sean Christopherson
@ 2026-06-01 20:27 ` Yosry Ahmed
2026-06-01 21:19 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 20:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 07:55:28AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:22:02PM -0700, Sean Christopherson wrote:
> > > Invert the include ordering between x86.h and mmu.h, so that x86.h is the
> > > "top-level" include for KVM x86.
> >
> > You're also silently moving mmu_is_nested().
>
> I'll explicitly call that out.
>
> > Aside from that, I thought top-level include means that other headers
> > will include it, and it will include fewer headers. Seems like this is
> > doing the opposite?
>
> Yeah, I'm probably using confusing terminology. I couldn't quite figure out how
> to concisely describe this. I like my pyramid visualization, so how about:
>
> Invert the include ordering between x86.h and mmu.h, and move
> mmu_is_nested() to mmu.h where it belongs (mmu_is_nested()'s placement in
> x86.h was solely responsible for the existing ordering), so that x86.h is
> the top of KVM x86's "include pyramid".
Not to be pedantic, but I still can't quite figure out this
analogy/visualization.
How about just spelling it out:
so that x86.h is not included by most headers (but includes them).
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h
2026-06-01 20:27 ` Yosry Ahmed
@ 2026-06-01 21:19 ` Sean Christopherson
0 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 21:19 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026, Yosry Ahmed wrote:
> On Mon, Jun 01, 2026 at 07:55:28AM -0700, Sean Christopherson wrote:
> > On Sat, May 30, 2026, Yosry Ahmed wrote:
> > > On Fri, May 29, 2026 at 03:22:02PM -0700, Sean Christopherson wrote:
> > > > Invert the include ordering between x86.h and mmu.h, so that x86.h is the
> > > > "top-level" include for KVM x86.
> > >
> > > You're also silently moving mmu_is_nested().
> >
> > I'll explicitly call that out.
> >
> > > Aside from that, I thought top-level include means that other headers
> > > will include it, and it will include fewer headers. Seems like this is
> > > doing the opposite?
> >
> > Yeah, I'm probably using confusing terminology. I couldn't quite figure out how
> > to concisely describe this. I like my pyramid visualization, so how about:
> >
> > Invert the include ordering between x86.h and mmu.h, and move
> > mmu_is_nested() to mmu.h where it belongs (mmu_is_nested()'s placement in
> > x86.h was solely responsible for the existing ordering), so that x86.h is
> > the top of KVM x86's "include pyramid".
>
> Not to be pedantic, but I still can't quite figure out this
> analogy/visualization.
>
> How about just spelling it out:
>
> so that x86.h is not included by most headers (but includes them).
Works for me.
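
To make the agreed wording concrete, the property "x86.h is not included by
most headers (but includes them)" can be checked against a small model of the
include graph. The sketch below is an assumption-laden illustration, not
kernel code: the edge table lists only the local #include lines visible in the
diffs for patches 19 and 22 as they would look after applying them, and
struct include_edge, count_includers() and example_include_graph() are
invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct include_edge {
	const char *includer;
	const char *included;
};

/* Partial model: only the local includes that appear in the quoted diffs. */
static const struct include_edge edges[] = {
	{ "x86.h",    "mmu.h" },
	{ "x86.h",    "regs.h" },
	{ "x86.h",    "kvm_emulate.h" },
	{ "x86.h",    "cpuid.h" },
	{ "mmu.h",    "regs.h" },
	{ "mmu.h",    "cpuid.h" },
	{ "hyperv.h", "regs.h" },
};

/* Count how many headers in the model directly include @header. */
size_t count_includers(const char *header)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(edges) / sizeof(edges[0]); i++) {
		if (!strcmp(edges[i].included, header))
			n++;
	}
	return n;
}

/* Usage sketch: in this model nothing includes x86.h, while x86.h pulls in mmu.h. */
void example_include_graph(void)
{
	assert(count_includers("x86.h") == 0);
	assert(count_includers("mmu.h") == 1);
}
```

Before patch 19 the model would have contained a mmu.h -> x86.h edge instead
of x86.h -> mmu.h, which is the inversion the changelog describes.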
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 20/40] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (18 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 19/40] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:51 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 21/40] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
` (20 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Relocate the declaration of tdp_enabled into mmu.h, and opportunistically
hoist tdp_mmu_enabled up to the top so that the two are co-located.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/mmu.h | 12 ++++++------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7130eb98473..19091d89d3cc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2188,8 +2188,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
-extern bool tdp_enabled;
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 28fca48dcf64..0eaea2d4fac9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -6,6 +6,12 @@
#include "regs.h"
#include "cpuid.h"
+extern bool tdp_enabled;
+#ifdef CONFIG_X86_64
+extern bool tdp_mmu_enabled;
+#else
+#define tdp_mmu_enabled false
+#endif
extern bool __read_mostly enable_mmio_caching;
#define PT_WRITABLE_SHIFT 1
@@ -260,12 +266,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
return smp_load_acquire(&kvm->arch.shadow_root_allocated);
}
-#ifdef CONFIG_X86_64
-extern bool tdp_mmu_enabled;
-#else
-#define tdp_mmu_enabled false
-#endif
-
int kvm_tdp_mmu_map_private_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn);
static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 20/40] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h
2026-05-29 22:22 ` [PATCH v3 20/40] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h Sean Christopherson
@ 2026-05-30 0:51 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:03PM -0700, Sean Christopherson wrote:
> Relocate the declaration of tdp_enabled into mmu.h, and opportunistically
> hoist tdp_mmu_enabled up to the top so that the two are co-located.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
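
As a side note on the block being hoisted, the #ifdef around tdp_mmu_enabled
is a common way to let callers test a flag unconditionally while the check
folds to a constant on configurations that lack the feature. A minimal
user-space sketch of that pattern, under stated assumptions: CONFIG_X86_64 is
defined by hand here instead of by Kconfig, the initial values are arbitrary,
and enum mmu_kind, pick_mmu() and example_pick_mmu() are hypothetical names
invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig symbol; remove this line to model a 32-bit build. */
#define CONFIG_X86_64 1

bool tdp_enabled = true;		/* arbitrary value for the demo */

#ifdef CONFIG_X86_64
bool tdp_mmu_enabled = true;		/* a real variable on 64-bit builds */
#else
#define tdp_mmu_enabled false		/* compile-time constant otherwise */
#endif

/* Hypothetical classification, used only to exercise the two flags. */
enum mmu_kind { MMU_SHADOW, MMU_TDP_NO_TDP_MMU, MMU_TDP_MMU };

/* Callers need no #ifdef; without CONFIG_X86_64 the last test is constant. */
enum mmu_kind pick_mmu(void)
{
	if (!tdp_enabled)
		return MMU_SHADOW;

	return tdp_mmu_enabled ? MMU_TDP_MMU : MMU_TDP_NO_TDP_MMU;
}

/* Usage sketch with the demo's initial values. */
void example_pick_mmu(void)
{
	assert(pick_mmu() == MMU_TDP_MMU);
}
```

Keeping the two declarations adjacent in mmu.h, as the patch does, makes it
easier to see that both flags are meant to be read together.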
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 21/40] KVM: x86: Move eager_page_split to mmu.{c,h}
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (19 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 20/40] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:51 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 22/40] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h Sean Christopherson
` (19 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move KVM's eager_page_split module param to the MMU, as it is very much an
MMU knob.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/x86.c | 3 ---
arch/x86/kvm/x86.h | 2 --
4 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0eaea2d4fac9..d30676935fff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -13,6 +13,7 @@ extern bool tdp_mmu_enabled;
#define tdp_mmu_enabled false
#endif
extern bool __read_mostly enable_mmio_caching;
+extern bool eager_page_split;
#define PT_WRITABLE_SHIFT 1
#define PT_USER_SHIFT 2
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b8f2edf2cfeb..e4d971d42f0e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -114,6 +114,9 @@ module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(tdp_mmu_enabled);
#endif
+bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
static int max_huge_page_level __read_mostly;
static int tdp_root_level __read_mostly;
static int max_tdp_level __read_mostly;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 20eeff79b46d..be421f467563 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -177,9 +177,6 @@ module_param(force_emulation_prefix, int, 0644);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, 0644);
-bool __read_mostly eager_page_split = true;
-module_param(eager_page_split, bool, 0644);
-
/* Enable/disable SMT_RSB bug mitigation */
static bool __read_mostly mitigate_smt_rsb;
module_param(mitigate_smt_rsb, bool, 0444);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a0e68eaf1f80..635a21bfa681 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -415,8 +415,6 @@ extern int pi_inject_timer;
extern bool report_ignored_msrs;
-extern bool eager_page_split;
-
static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
if (report_ignored_msrs)
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 21/40] KVM: x86: Move eager_page_split to mmu.{c,h}
2026-05-29 22:22 ` [PATCH v3 21/40] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
@ 2026-05-30 0:51 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:04PM -0700, Sean Christopherson wrote:
> Move KVM's eager_page_split module param to the MMU, as it is very much an
> MMU knob.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 22/40] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (20 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 21/40] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-29 22:22 ` [PATCH v3 23/40] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
` (18 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Drop a mostly unused include of x86.h from hyperv.h, and instead pull in
regs.h, which is needed for at least is_guest_mode(). This eliminates the
last include of x86.h from a common x86 header, i.e. solidifies that x86.h
is the top of the pyramid.
Add a missing x86.h include in cpuid.c to avoid build breakage.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/hyperv.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd3b02575cd0..db8be9173bd0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,6 +28,7 @@
#include "trace.h"
#include "pmu.h"
#include "xen.h"
+#include "x86.h"
/*
* Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 65e89ed65349..1c8f7aaab063 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -22,7 +22,8 @@
#define __ARCH_X86_KVM_HYPERV_H__
#include <linux/kvm_host.h>
-#include "x86.h"
+
+#include "regs.h"
#ifdef CONFIG_KVM_HYPERV
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 23/40] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (21 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 22/40] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:52 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 24/40] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers Sean Christopherson
` (17 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
moving the bulk of KVM's register specific code out of x86.c.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/fpu.h | 26 ++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 24 ------------------------
2 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/fpu.h b/arch/x86/kvm/fpu.h
index f898781b6a06..6b7b628f530d 100644
--- a/arch/x86/kvm/fpu.h
+++ b/arch/x86/kvm/fpu.h
@@ -3,8 +3,34 @@
#ifndef __KVM_FPU_H_
#define __KVM_FPU_H_
+#include <linux/kvm_host.h>
+
+#include <trace/events/kvm.h>
+
#include <asm/fpu/api.h>
+/* Swap (qemu) user FPU context for the guest FPU context. */
+static inline void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
+{
+ if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+ return;
+
+ /* Exclude PKRU, it's restored separately immediately after VM-Exit. */
+ fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
+ trace_kvm_fpu(1);
+}
+
+/* When vcpu_run ends, restore user space FPU context. */
+static inline void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
+{
+ if (KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+ return;
+
+ fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
+ ++vcpu->stat.fpu_reload;
+ trace_kvm_fpu(0);
+}
+
typedef u32 __attribute__((vector_size(16))) sse128_t;
#define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
#define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; })
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be421f467563..56ccb6b77abb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -133,8 +133,6 @@ static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
static DEFINE_MUTEX(vendor_module_lock);
-static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
-static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
struct kvm_x86_ops kvm_x86_ops __read_mostly;
@@ -11432,28 +11430,6 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
return 0;
}
-/* Swap (qemu) user FPU context for the guest FPU context. */
-static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
-{
- if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
- return;
-
- /* Exclude PKRU, it's restored separately immediately after VM-Exit. */
- fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
- trace_kvm_fpu(1);
-}
-
-/* When vcpu_run ends, restore user space FPU context. */
-static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
-{
- if (KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
- return;
-
- fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
- ++vcpu->stat.fpu_reload;
- trace_kvm_fpu(0);
-}
-
static int kvm_x86_vcpu_pre_run(struct kvm_vcpu *vcpu)
{
/*
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 23/40] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
2026-05-29 22:22 ` [PATCH v3 23/40] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
@ 2026-05-30 0:52 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:52 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:06PM -0700, Sean Christopherson wrote:
> Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
> moving the bulk of KVM's register specific code out of x86.c.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> ---
> arch/x86/kvm/fpu.h | 26 ++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 24 ------------------------
> 2 files changed, 26 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kvm/fpu.h b/arch/x86/kvm/fpu.h
> index f898781b6a06..6b7b628f530d 100644
> --- a/arch/x86/kvm/fpu.h
> +++ b/arch/x86/kvm/fpu.h
> @@ -3,8 +3,34 @@
> #ifndef __KVM_FPU_H_
> #define __KVM_FPU_H_
>
> +#include <linux/kvm_host.h>
> +
> +#include <trace/events/kvm.h>
> +
> #include <asm/fpu/api.h>
>
> +/* Swap (qemu) user FPU context for the guest FPU context. */
I didn't know KVM was allowed to break the fourth wall like this?
> +static inline void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
> +{
> + if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
> + return;
> +
> + /* Exclude PKRU, it's restored separately immediately after VM-Exit. */
> + fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
> + trace_kvm_fpu(1);
> +}
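
For context on what the moved helpers guard against, the load/put pairing can
be modeled in user space roughly as below. This is a sketch under explicit
assumptions, not the kernel implementation: it assumes the FPU swap toggles
fpstate->in_use and that KVM_BUG_ON() flags the VM and evaluates to its
condition, and every name (struct vcpu_model, model_bug_on(), and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the two helpers consult. */
struct vcpu_model {
	bool in_use;			/* models guest_fpu.fpstate->in_use */
	bool vm_bugged;			/* models the effect of KVM_BUG_ON() */
	unsigned long fpu_reload;	/* models vcpu->stat.fpu_reload */
};

/* Assumed KVM_BUG_ON() behavior: record the bug, return the condition. */
static bool model_bug_on(struct vcpu_model *vcpu, bool cond)
{
	if (cond)
		vcpu->vm_bugged = true;
	return cond;
}

void model_load_guest_fpu(struct vcpu_model *vcpu)
{
	if (model_bug_on(vcpu, vcpu->in_use))
		return;

	vcpu->in_use = true;	/* stands in for fpu_swap_kvm_fpstate(..., true) */
}

void model_put_guest_fpu(struct vcpu_model *vcpu)
{
	if (model_bug_on(vcpu, !vcpu->in_use))
		return;

	vcpu->in_use = false;	/* stands in for fpu_swap_kvm_fpstate(..., false) */
	++vcpu->fpu_reload;
}

/* Usage sketch: a balanced pair is fine, an unbalanced put is flagged. */
void example_fpu_pairing(void)
{
	struct vcpu_model vcpu = { 0 };

	model_load_guest_fpu(&vcpu);
	model_put_guest_fpu(&vcpu);
	assert(!vcpu.vm_bugged && vcpu.fpu_reload == 1);

	model_put_guest_fpu(&vcpu);
	assert(vcpu.vm_bugged && vcpu.fpu_reload == 1);
}
```

The early returns mean an unbalanced call leaves the modeled state untouched
apart from the bug flag, mirroring the shape of the guards in the patch.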
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 24/40] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (22 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 23/40] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:55 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 25/40] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code Sean Christopherson
` (16 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Extract the code for getting/setting MSRs and MSR lists to dedicated
helpers in anticipation of moving the MSR code to a new msrs.c.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 135 ++++++++++++++++++++++++++-------------------
1 file changed, 78 insertions(+), 57 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56ccb6b77abb..64c3680d889b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4602,6 +4602,61 @@ static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
return __kvm_x86_dev_get_attr(attr, &val);
}
+static int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msrs_to_save,
+ num_msrs_to_save * sizeof(u32)))
+ return -EFAULT;
+
+ if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
+ &emulated_msrs, num_emulated_msrs * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msr_based_features;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msr_based_features,
+ num_msr_based_features * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
+{
+ return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
+}
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -4609,32 +4664,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
long r;
switch (ioctl) {
- case KVM_GET_MSR_INDEX_LIST: {
- struct kvm_msr_list __user *user_msr_list = argp;
- struct kvm_msr_list msr_list;
- unsigned n;
-
- r = -EFAULT;
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- goto out;
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- goto out;
- r = -E2BIG;
- if (n < msr_list.nmsrs)
- goto out;
- r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msrs_to_save,
- num_msrs_to_save * sizeof(u32)))
- goto out;
- if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
- &emulated_msrs,
- num_emulated_msrs * sizeof(u32)))
- goto out;
- r = 0;
+ case KVM_GET_MSR_INDEX_LIST:
+ r = kvm_get_msr_index_list(argp);
break;
- }
case KVM_GET_SUPPORTED_CPUID:
case KVM_GET_EMULATED_CPUID: {
struct kvm_cpuid2 __user *cpuid_arg = argp;
@@ -4662,30 +4694,11 @@ long kvm_arch_dev_ioctl(struct file *filp,
goto out;
r = 0;
break;
- case KVM_GET_MSR_FEATURE_INDEX_LIST: {
- struct kvm_msr_list __user *user_msr_list = argp;
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- r = -EFAULT;
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- goto out;
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msr_based_features;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- goto out;
- r = -E2BIG;
- if (n < msr_list.nmsrs)
- goto out;
- r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msr_based_features,
- num_msr_based_features * sizeof(u32)))
- goto out;
- r = 0;
+ case KVM_GET_MSR_FEATURE_INDEX_LIST:
+ r = kvm_get_feature_msr_index_list(argp);
break;
- }
case KVM_GET_MSRS:
- r = msr_io(NULL, argp, do_get_feature_msr, 1);
+ r = kvm_get_feature_msrs(argp);
break;
#ifdef CONFIG_KVM_HYPERV
case KVM_GET_SUPPORTED_HV_CPUID:
@@ -5719,6 +5732,20 @@ static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
return 0;
}
+static int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_get_msr, 1);
+}
+
+static int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_set_msr, 0);
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5823,18 +5850,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = 0;
break;
}
- case KVM_GET_MSRS: {
- int idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = msr_io(vcpu, argp, do_get_msr, 1);
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ case KVM_GET_MSRS:
+ r = kvm_get_msrs(vcpu, argp);
break;
- }
- case KVM_SET_MSRS: {
- int idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = msr_io(vcpu, argp, do_set_msr, 0);
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ case KVM_SET_MSRS:
+ r = kvm_set_msrs(vcpu, argp);
break;
- }
case KVM_GET_ONE_REG:
case KVM_SET_ONE_REG:
r = kvm_get_set_one_reg(vcpu, ioctl, argp);
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 24/40] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers
2026-05-29 22:22 ` [PATCH v3 24/40] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers Sean Christopherson
@ 2026-05-30 0:55 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:55 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:07PM -0700, Sean Christopherson wrote:
> Extract the code for getting/setting MSRs and MSR lists to dedicated
> helpers in anticipation of moving the MSR code to a new msrs.c.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
This is nice. Aside from the spring cleaning, it makes parsing the
return values easier for KVM_GET_MSR_INDEX_LIST and KVM_GET_MSRS.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
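
For anyone following along, the return-value flow that the new helpers make
easier to read can be sketched outside the kernel. What follows is a minimal
userspace-only model in plain C, not KVM code: the model_* names and the
fixed MODEL_MAX_MSRS capacity are hypothetical, and memcpy() stands in for
copy_to_user(), so the -EFAULT paths are not modeled. It mirrors only the
size negotiation: the required count is always written back, and -E2BIG is
returned when the caller's count is too small.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm_msr_list, with a fixed capacity. */
#define MODEL_MAX_MSRS 8

struct model_msr_list {
	uint32_t nmsrs;				/* in: capacity, out: required count */
	uint32_t indices[MODEL_MAX_MSRS];
};

/*
 * Model of the size negotiation: always report the required count, and
 * copy the indices only if the caller claimed enough room.  As with the
 * real ioctl, the caller is trusted to report its capacity honestly.
 */
static int model_get_msr_index_list(struct model_msr_list *list,
				    const uint32_t *msrs, uint32_t nr_msrs)
{
	uint32_t n = list->nmsrs;

	list->nmsrs = nr_msrs;
	if (n < nr_msrs)
		return -E2BIG;

	memcpy(list->indices, msrs, nr_msrs * sizeof(*msrs));
	return 0;
}
```

A caller would typically invoke this once with nmsrs = 0, read back the
required count from nmsrs after the -E2BIG, and then retry with a buffer of
that size.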
* [PATCH v3 25/40] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (23 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 24/40] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-29 22:22 ` [PATCH v3 26/40] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h} Sean Christopherson
` (15 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Begrudgingly move adjust_tsc_offset_{guest,host}() to x86.h as inlines,
and expose several other TSC helpers in anticipation of moving KVM's MSR
code to a dedicated msrs.c. Unfortunately for KVM, several MSRs that KVM
emulates can affect TSC state.
Opportunistically drop a superfluous local "tsc_offset" variable, whose
existence causes checkpatch to complain about lack of a blank line.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 22 +++-------------------
arch/x86/kvm/x86.h | 19 +++++++++++++++++++
2 files changed, 22 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64c3680d889b..bf15c122f837 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2225,7 +2225,7 @@ u64 kvm_scale_tsc(u64 tsc, u64 ratio)
return _tsc;
}
-static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
{
u64 tsc;
@@ -2266,7 +2266,7 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_multiplier);
-static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
+void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
{
if (vcpu->arch.guest_tsc_protected)
return;
@@ -2380,7 +2380,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
kvm_track_tsc_matching(vcpu, !matched);
}
-static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
+void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
{
u64 data = user_value ? *user_value : 0;
struct kvm *kvm = vcpu->kvm;
@@ -2448,22 +2448,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
}
-static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
- s64 adjustment)
-{
- u64 tsc_offset = vcpu->arch.l1_tsc_offset;
- kvm_vcpu_write_tsc_offset(vcpu, tsc_offset + adjustment);
-}
-
-static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
-{
- if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
- WARN_ON(adjustment < 0);
- adjustment = kvm_scale_tsc((u64) adjustment,
- vcpu->arch.l1_tsc_scaling_ratio);
- adjust_tsc_offset_guest(vcpu, adjustment);
-}
-
#ifdef CONFIG_X86_64
static u64 read_tsc(void)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 635a21bfa681..31e67b060148 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -345,6 +345,25 @@ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm);
bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
int kvm_guest_time_update(struct kvm_vcpu *v);
+void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value);
+u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc);
+void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset);
+
+static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
+ s64 adjustment)
+{
+ kvm_vcpu_write_tsc_offset(vcpu, vcpu->arch.l1_tsc_offset + adjustment);
+}
+
+static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
+{
+ if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
+ WARN_ON(adjustment < 0);
+ adjustment = kvm_scale_tsc((u64) adjustment,
+ vcpu->arch.l1_tsc_scaling_ratio);
+ adjust_tsc_offset_guest(vcpu, adjustment);
+}
+
int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 26/40] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h}
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (24 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 25/40] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-29 22:22 ` [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h Sean Christopherson
` (14 subsequent siblings)
40 siblings, 0 replies; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Introduce msrs.{c,h}, and move the vast majority of MSR specific code out
of x86.{c,h}. Use a plural "msrs" instead of just "msr" to be consistent
with regs.{c,h}, and to make it easier to differentiate KVM's code from the
other 5+ msr.c files in the kernel.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/msrs.c | 2732 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/msrs.h | 128 ++
arch/x86/kvm/mtrr.c | 1 +
arch/x86/kvm/x86.c | 2711 +---------------------------------------
arch/x86/kvm/x86.h | 87 +-
6 files changed, 2867 insertions(+), 2794 deletions(-)
create mode 100644 arch/x86/kvm/msrs.c
create mode 100644 arch/x86/kvm/msrs.h
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f39c311fd756..0474604ab8a1 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,7 +5,7 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o regs.o \
+kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o msrs.o pmu.o regs.o \
mtrr.o debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
new file mode 100644
index 000000000000..67ed0d36ed91
--- /dev/null
+++ b/arch/x86/kvm/msrs.c
@@ -0,0 +1,2732 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm_host.h>
+#include <asm/intel_pt.h>
+#include <asm/vmx.h>
+
+#include "hyperv.h"
+#include "lapic.h"
+#include "msrs.h"
+#include "pmu.h"
+#include "trace.h"
+#include "vmx/vmx.h"
+#include "xen.h"
+#include "x86.h"
+
+bool __read_mostly ignore_msrs = 0;
+module_param(ignore_msrs, bool, 0644);
+
+bool __read_mostly report_ignored_msrs = true;
+module_param(report_ignored_msrs, bool, 0644);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs);
+
+/* EFER defaults:
+ * - enable syscall per default because its emulated by KVM
+ * - enable LME and LMA per default on 64 bit KVM
+ */
+#ifdef CONFIG_X86_64
+static
+u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
+#else
+static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
+#endif
+
+#define MAX_IO_MSRS 256
+
+/*
+ * Restoring the host value for MSRs that are only consumed when running in
+ * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
+ * returns to userspace, i.e. the kernel can run with the guest's value.
+ */
+#define KVM_MAX_NR_USER_RETURN_MSRS 16
+
+struct kvm_user_return_msrs {
+ struct user_return_notifier urn;
+ bool registered;
+ struct kvm_user_return_msr_values {
+ u64 host;
+ u64 curr;
+ } values[KVM_MAX_NR_USER_RETURN_MSRS];
+};
+
+u32 __read_mostly kvm_nr_uret_msrs;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs);
+static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
+static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
+
+void kvm_destroy_user_return_msrs(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ WARN_ON_ONCE(per_cpu(user_return_msrs, cpu).registered);
+
+ kvm_nr_uret_msrs = 0;
+}
+
+static void kvm_on_user_return(struct user_return_notifier *urn)
+{
+ unsigned slot;
+ struct kvm_user_return_msrs *msrs
+ = container_of(urn, struct kvm_user_return_msrs, urn);
+ struct kvm_user_return_msr_values *values;
+
+ msrs->registered = false;
+ user_return_notifier_unregister(urn);
+
+ for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
+ values = &msrs->values[slot];
+ if (values->host != values->curr) {
+ wrmsrq(kvm_uret_msrs_list[slot], values->host);
+ values->curr = values->host;
+ }
+ }
+}
+
+static int kvm_probe_user_return_msr(u32 msr)
+{
+ u64 val;
+ int ret;
+
+ preempt_disable();
+ ret = rdmsrq_safe(msr, &val);
+ if (ret)
+ goto out;
+ ret = wrmsrq_safe(msr, val);
+out:
+ preempt_enable();
+ return ret;
+}
+
+int kvm_add_user_return_msr(u32 msr)
+{
+ BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
+
+ if (kvm_probe_user_return_msr(msr))
+ return -1;
+
+ kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
+ return kvm_nr_uret_msrs++;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr);
+
+int kvm_find_user_return_msr(u32 msr)
+{
+ int i;
+
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ if (kvm_uret_msrs_list[i] == msr)
+ return i;
+ }
+ return -1;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr);
+
+void kvm_user_return_msr_cpu_online(void)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+ u64 value;
+ int i;
+
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ rdmsrq_safe(kvm_uret_msrs_list[i], &value);
+ msrs->values[i].host = value;
+ msrs->values[i].curr = value;
+ }
+}
+
+static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
+{
+ if (!msrs->registered) {
+ msrs->urn.on_user_return = kvm_on_user_return;
+ user_return_notifier_register(&msrs->urn);
+ msrs->registered = true;
+ }
+}
+
+int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+ int err;
+
+ value = (value & mask) | (msrs->values[slot].host & ~mask);
+ if (value == msrs->values[slot].curr)
+ return 0;
+ err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
+ if (err)
+ return 1;
+
+ msrs->values[slot].curr = value;
+ kvm_user_return_register_notifier(msrs);
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
+
+u64 kvm_get_user_return_msr(unsigned int slot)
+{
+ return this_cpu_ptr(&user_return_msrs)->values[slot].curr;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr);
+
+void drop_user_return_notifiers(void)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+
+ if (msrs->registered)
+ kvm_on_user_return(&msrs->urn);
+}
+
+/*
+ * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
+ * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
+ * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
+ * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
+ * MSRs that KVM emulates without strictly requiring host support.
+ * msr_based_features holds MSRs that enumerate features, i.e. are effectively
+ * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
+ * msrs_to_save and emulated_msrs.
+ */
+
+static const u32 msrs_to_save_base[] = {
+ MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
+ MSR_STAR,
+#ifdef CONFIG_X86_64
+ MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
+#endif
+ MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+ MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
+
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+ MSR_IA32_U_CET, MSR_IA32_S_CET,
+ MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+ MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
+ MSR_IA32_DEBUGCTLMSR,
+ MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
+ MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
+};
+
+static const u32 msrs_to_save_pmu[] = {
+ MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
+ MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
+ MSR_CORE_PERF_GLOBAL_CTRL,
+ MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+
+ /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
+ MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
+ MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
+ MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
+ MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
+ MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
+
+ MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
+ MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
+
+ /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
+ MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
+ MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
+ MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
+ MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+ MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
+};
+
+static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
+ ARRAY_SIZE(msrs_to_save_pmu)];
+static unsigned num_msrs_to_save;
+
+static const u32 emulated_msrs_all[] = {
+ MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+ MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
+
+#ifdef CONFIG_KVM_HYPERV
+ HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+ HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
+ HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
+ HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
+ HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
+ HV_X64_MSR_RESET,
+ HV_X64_MSR_VP_INDEX,
+ HV_X64_MSR_VP_RUNTIME,
+ HV_X64_MSR_SCONTROL,
+ HV_X64_MSR_STIMER0_CONFIG,
+ HV_X64_MSR_VP_ASSIST_PAGE,
+ HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
+ HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
+ HV_X64_MSR_SYNDBG_OPTIONS,
+ HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
+ HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
+ HV_X64_MSR_SYNDBG_PENDING_BUFFER,
+#endif
+
+ MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
+
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_IA32_MISC_ENABLE,
+ MSR_IA32_MCG_STATUS,
+ MSR_IA32_MCG_CTL,
+ MSR_IA32_MCG_EXT_CTL,
+ MSR_IA32_SMBASE,
+ MSR_SMI_COUNT,
+ MSR_PLATFORM_INFO,
+ MSR_MISC_FEATURES_ENABLES,
+ MSR_AMD64_VIRT_SPEC_CTRL,
+ MSR_AMD64_TSC_RATIO,
+ MSR_IA32_POWER_CTL,
+ MSR_IA32_UCODE_REV,
+
+ /*
+ * KVM always supports the "true" VMX control MSRs, even if the host
+ * does not. The VMX MSRs as a whole are considered "emulated" as KVM
+ * doesn't strictly require them to exist in the host (ignoring that
+ * KVM would refuse to load in the first place if the core set of MSRs
+ * aren't supported).
+ */
+ MSR_IA32_VMX_BASIC,
+ MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+ MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+ MSR_IA32_VMX_TRUE_EXIT_CTLS,
+ MSR_IA32_VMX_TRUE_ENTRY_CTLS,
+ MSR_IA32_VMX_MISC,
+ MSR_IA32_VMX_CR0_FIXED0,
+ MSR_IA32_VMX_CR4_FIXED0,
+ MSR_IA32_VMX_VMCS_ENUM,
+ MSR_IA32_VMX_PROCBASED_CTLS2,
+ MSR_IA32_VMX_EPT_VPID_CAP,
+ MSR_IA32_VMX_VMFUNC,
+
+ MSR_K7_HWCR,
+ MSR_KVM_POLL_CONTROL,
+};
+
+static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
+static unsigned num_emulated_msrs;
+
+/*
+ * List of MSRs that control the existence of MSR-based features, i.e. MSRs
+ * that are effectively CPUID leafs. VMX MSRs are also included in the set of
+ * feature MSRs, but are handled separately to allow expedited lookups.
+ */
+static const u32 msr_based_features_all_except_vmx[] = {
+ MSR_AMD64_DE_CFG,
+ MSR_IA32_UCODE_REV,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_PLATFORM_INFO,
+};
+
+static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
+ (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
+static unsigned int num_msr_based_features;
+
+int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msrs_to_save,
+ num_msrs_to_save * sizeof(u32)))
+ return -EFAULT;
+
+ if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
+ &emulated_msrs, num_emulated_msrs * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msr_based_features;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msr_based_features,
+ num_msr_based_features * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/*
+ * All feature MSRs except uCode revID, which tracks the currently loaded uCode
+ * patch, are immutable once the vCPU model is defined.
+ */
+static bool kvm_is_immutable_feature_msr(u32 msr)
+{
+ int i;
+
+ if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
+ return true;
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
+ if (msr == msr_based_features_all_except_vmx[i])
+ return msr != MSR_IA32_UCODE_REV;
+ }
+
+ return false;
+}
+
+static bool kvm_is_advertised_msr(u32 msr_index)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_msrs_to_save; i++) {
+ if (msrs_to_save[i] == msr_index)
+ return true;
+ }
+
+ for (i = 0; i < num_emulated_msrs; i++) {
+ if (emulated_msrs[i] == msr_index)
+ return true;
+ }
+
+ return false;
+}
+
+
+/*
+ * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
+ * does not yet virtualize. These include:
+ * 10 - MISC_PACKAGE_CTRLS
+ * 11 - ENERGY_FILTERING_CTL
+ * 12 - DOITM
+ * 18 - FB_CLEAR_CTRL
+ * 21 - XAPIC_DISABLE_STATUS
+ * 23 - OVERCLOCKING_STATUS
+ */
+
+#define KVM_SUPPORTED_ARCH_CAP \
+ (ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL | ARCH_CAP_RSBA | \
+ ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
+ ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
+ ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
+ ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | ARCH_CAP_ITS_NO)
+
+u64 kvm_get_arch_capabilities(void)
+{
+ u64 data = kvm_host.arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
+
+ /*
+ * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
+ * the nested hypervisor runs with NX huge pages. If it is not,
+ * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
+ * L1 guests, so it need not worry about its own (L2) guests.
+ */
+ data |= ARCH_CAP_PSCHANGE_MC_NO;
+
+ /*
+ * If we're doing cache flushes (either "always" or "cond")
+ * we will do one whenever the guest does a vmlaunch/vmresume.
+ * If an outer hypervisor is doing the cache flush for us
+ * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
+ * capability to the guest too, and if EPT is disabled we're not
+ * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
+ * require a nested hypervisor to do a flush of its own.
+ */
+ if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
+ data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ data |= ARCH_CAP_RDCL_NO;
+ if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
+ data |= ARCH_CAP_SSB_NO;
+ if (!boot_cpu_has_bug(X86_BUG_MDS))
+ data |= ARCH_CAP_MDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ data |= ARCH_CAP_RFDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_ITS))
+ data |= ARCH_CAP_ITS_NO;
+
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
+ data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
+
+ if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
+ data |= ARCH_CAP_GDS_NO;
+
+ return data;
+}
+
+static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ WARN_ON_ONCE(!host_initiated);
+
+ switch (index) {
+ case MSR_IA32_ARCH_CAPABILITIES:
+ *data = kvm_get_arch_capabilities();
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ *data = kvm_caps.supported_perf_cap;
+ break;
+ case MSR_PLATFORM_INFO:
+ *data = MSR_PLATFORM_INFO_CPUID_FAULT;
+ break;
+ case MSR_IA32_UCODE_REV:
+ rdmsrq_safe(index, data);
+ break;
+ default:
+ return kvm_x86_call(get_feature_msr)(index, data);
+ }
+ return 0;
+}
+
+typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated);
+
+static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
+ u64 *data, bool host_initiated,
+ enum kvm_msr_access rw,
+ msr_access_t msr_access_fn)
+{
+ const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
+ int ret;
+
+ BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
+
+ /*
+ * Zero the data on read failures to avoid leaking stack data to the
+ * guest and/or userspace, e.g. if the failure is ignored below.
+ */
+ ret = msr_access_fn(vcpu, msr, data, host_initiated);
+ if (ret && rw == MSR_TYPE_R)
+ *data = 0;
+
+ if (ret != KVM_MSR_RET_UNSUPPORTED)
+ return ret;
+
+ /*
+ * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
+ * advertises to userspace, even if an MSR isn't fully supported.
+ * Simply check that @data is '0', which covers both the write '0' case
+ * and all reads (in which case @data is zeroed on failure; see above).
+ */
+ if (host_initiated && !*data && kvm_is_advertised_msr(msr))
+ return 0;
+
+ if (!ignore_msrs) {
+ kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
+ op, msr, *data);
+ return ret;
+ }
+
+ if (report_ignored_msrs)
+ kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
+
+ return 0;
+}
+
+static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
+ kvm_get_feature_msr);
+}
+
+static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+ if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
+ return false;
+
+ if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
+ return false;
+
+ if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
+ return false;
+
+ if (efer & (EFER_LME | EFER_LMA) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return false;
+
+ if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
+ return false;
+
+ return true;
+
+}
+bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+ if (efer & efer_reserved_bits)
+ return false;
+
+ return __kvm_valid_efer(vcpu, efer);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer);
+
+static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u64 old_efer = vcpu->arch.efer;
+ u64 efer = msr_info->data;
+ int r;
+
+ if (efer & efer_reserved_bits)
+ return 1;
+
+ if (!msr_info->host_initiated) {
+ if (!__kvm_valid_efer(vcpu, efer))
+ return 1;
+
+ if (is_paging(vcpu) &&
+ (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
+ return 1;
+ }
+
+ efer &= ~EFER_LMA;
+ efer |= vcpu->arch.efer & EFER_LMA;
+
+ r = kvm_x86_call(set_efer)(vcpu, efer);
+ if (r) {
+ WARN_ON(r > 0);
+ return r;
+ }
+
+ if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+
+ if (!static_cpu_has(X86_FEATURE_XSAVES) &&
+ (efer & EFER_SVME))
+ kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+
+ return 0;
+}
+
+void kvm_enable_efer_bits(u64 mask)
+{
+ efer_reserved_bits &= ~mask;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits);
+
+bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+ struct msr_bitmap_range *ranges;
+ struct kvm *kvm = vcpu->kvm;
+ bool allowed;
+ int idx;
+ u32 i;
+
+ /* x2APIC MSRs do not support filtering. */
+ if (index >= 0x800 && index <= 0x8ff)
+ return true;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
+ if (!msr_filter) {
+ allowed = true;
+ goto out;
+ }
+
+ allowed = msr_filter->default_allow;
+ ranges = msr_filter->ranges;
+
+ for (i = 0; i < msr_filter->count; i++) {
+ u32 start = ranges[i].base;
+ u32 end = start + ranges[i].nmsrs;
+ u32 flags = ranges[i].flags;
+ unsigned long *bitmap = ranges[i].bitmap;
+
+ if ((index >= start) && (index < end) && (flags & type)) {
+ allowed = test_bit(index - start, bitmap);
+ break;
+ }
+ }
+
+out:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return allowed;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed);
+
+/*
+ * Write @data into the MSR specified by @index. Select MSR specific fault
+ * checks are bypassed if @host_initiated is %true.
+ * Returns 0 on success, non-0 otherwise.
+ * Assumes vcpu_load() was already called.
+ */
+static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
+ bool host_initiated)
+{
+ struct msr_data msr;
+
+ switch (index) {
+ case MSR_FS_BASE:
+ case MSR_GS_BASE:
+ case MSR_KERNEL_GS_BASE:
+ case MSR_CSTAR:
+ case MSR_LSTAR:
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ break;
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_IA32_SYSENTER_ESP:
+ /*
+ * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
+ * non-canonical address is written on Intel but not on
+ * AMD (which ignores the top 32-bits, because it does
+ * not implement 64-bit SYSENTER).
+ *
+ * 64-bit code should hence be able to write a non-canonical
+ * value on AMD. Making the address canonical ensures that
+ * vmentry does not fail on Intel after writing a non-canonical
+ * value, and that something deterministic happens if the guest
+ * invokes 64-bit SYSENTER.
+ */
+ data = __canonical_address(data, max_host_virt_addr_bits());
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+
+ /*
+ * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
+ * incomplete and conflicting architectural behavior. Current
+ * AMD CPUs completely ignore bits 63:32, i.e. they aren't
+ * reserved and always read as zeros. Enforce Intel's reserved
+ * bits check if the guest CPU is Intel compatible, otherwise
+ * clear the bits. This ensures cross-vendor migration will
+ * provide consistent behavior for the guest.
+ */
+ if (guest_cpuid_is_intel_compatible(vcpu) && (data >> 32) != 0)
+ return 1;
+
+ data = (u32)data;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (!kvm_is_valid_u_s_cet(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ /*
+ * Note that the MSR emulation here is flawed when a vCPU
+ * doesn't support the Intel 64 architecture. The expected
+ * architectural behavior in this case is that the upper 32
+ * bits do not exist and should always read '0'. However,
+ * because the actual hardware on which the virtual CPU is
+ * running does support Intel 64, XRSTORS/XSAVES in the
+ * guest could observe behavior that violates the
+ * architecture. Intercepting XRSTORS/XSAVES for this
+ * special case isn't deemed worthwhile.
+ */
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ /*
+ * MSR_IA32_INT_SSP_TAB is not present on processors that do
+ * not support Intel 64 architecture.
+ */
+ if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+ if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+ return 1;
+ break;
+ }
+
+ msr.data = data;
+ msr.index = index;
+ msr.host_initiated = host_initiated;
+
+ return kvm_x86_call(set_msr)(vcpu, &msr);
+}
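Two of the value normalizations done by __kvm_set_msr() are easy to reason about in isolation. The sketch below is an illustration under assumptions, not the kernel helpers: `canonical()` sign-extends from the top implemented virtual-address bit (the effect the SYSENTER comment describes), and `tsc_aux_filter()` mirrors the TSC_AUX policy of faulting on bits 63:32 for Intel-compatible guests and silently dropping them otherwise. Both names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend @va from bit (@bits - 1), e.g. bits = 48 or 57.
 * Assumes 1 <= bits <= 63.
 */
static uint64_t canonical(uint64_t va, unsigned int bits)
{
	uint64_t sign = 1ull << (bits - 1);
	uint64_t low = va & ((sign << 1) - 1);	/* keep the low @bits bits */

	/* XOR-then-subtract propagates the sign bit into the upper bits. */
	return (low ^ sign) - sign;
}

/*
 * Returns 1 if the write should fault, 0 otherwise.  On success the
 * value is truncated to 32 bits, matching "data = (u32)data".
 */
static int tsc_aux_filter(bool intel_compatible, uint64_t *data)
{
	if (intel_compatible && (*data >> 32) != 0)
		return 1;

	*data = (uint32_t)*data;
	return 0;
}
```

Making the value canonical instead of rejecting it is what lets a guest write garbage upper bits without faulting while still giving the host a value that is safe to load.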
+
+static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ return __kvm_set_msr(vcpu, index, *data, host_initiated);
+}
+
+static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
+ u32 index, u64 data, bool host_initiated)
+{
+ return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
+ _kvm_set_msr);
+}
+
+/*
+ * Read the MSR specified by @index into @data. Select MSR specific fault
+ * checks are bypassed if @host_initiated is %true.
+ * Returns 0 on success, non-0 otherwise.
+ * Assumes vcpu_load() was already called.
+ */
+static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ struct msr_data msr;
+ int ret;
+
+ switch (index) {
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ }
+
+ msr.index = index;
+ msr.host_initiated = host_initiated;
+
+ ret = kvm_x86_call(get_msr)(vcpu, &msr);
+ if (!ret)
+ *data = msr.data;
+ return ret;
+}
+
+static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
+ u32 index, u64 *data, bool host_initiated)
+{
+ return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
+ __kvm_get_msr);
+}
+
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ return __kvm_set_msr(vcpu, index, data, true);
+}
+
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ return __kvm_get_msr(vcpu, index, data, true);
+}
+
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ return kvm_get_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read);
+
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ return kvm_set_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write);
+
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
+ return KVM_MSR_RET_FILTERED;
+
+ return __kvm_emulate_msr_read(vcpu, index, data);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read);
+
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
+ return KVM_MSR_RET_FILTERED;
+
+ return __kvm_emulate_msr_write(vcpu, index, data);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);
+
+static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
+ return EXIT_FASTPATH_NONE;
+
+ switch (msr) {
+ case APIC_BASE_MSR + (APIC_ICR >> 4):
+ if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
+ kvm_x2apic_icr_write_fast(vcpu->arch.apic, data))
+ return EXIT_FASTPATH_NONE;
+ break;
+ case MSR_IA32_TSC_DEADLINE:
+ kvm_set_lapic_tscdeadline_msr(vcpu, data);
+ break;
+ default:
+ return EXIT_FASTPATH_NONE;
+ }
+
+ trace_kvm_msr_write(msr, data);
+
+ if (!kvm_skip_emulated_instruction(vcpu))
+ return EXIT_FASTPATH_EXIT_USERSPACE;
+
+ return EXIT_FASTPATH_REENTER_GUEST;
+}
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
+{
+ return __handle_fastpath_wrmsr(vcpu, kvm_ecx_read(vcpu),
+ kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
+
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
+
+static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->run->msr.error) {
+ kvm_eax_write(vcpu, vcpu->run->msr.data);
+ kvm_edx_write(vcpu, vcpu->run->msr.data >> 32);
+ }
+}
+
+static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
+{
+ if (err) {
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
+ return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+ EMULTYPE_COMPLETE_USER_EXIT);
+}
+
+static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
+{
+ return complete_emulated_insn_gp(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+ complete_userspace_rdmsr(vcpu);
+ return complete_emulated_msr_access(vcpu);
+}
+
+static int complete_fast_msr_access(struct kvm_vcpu *vcpu)
+{
+ return kvm_x86_call(complete_emulated_msr)(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
+{
+ complete_userspace_rdmsr(vcpu);
+ return complete_fast_msr_access(vcpu);
+}
+
+static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->run->msr.error)
+ kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
+ vcpu->run->msr.data);
+
+ return complete_fast_msr_access(vcpu);
+}
+
+static u64 kvm_msr_reason(int r)
+{
+ switch (r) {
+ case KVM_MSR_RET_UNSUPPORTED:
+ return KVM_MSR_EXIT_REASON_UNKNOWN;
+ case KVM_MSR_RET_FILTERED:
+ return KVM_MSR_EXIT_REASON_FILTER;
+ default:
+ return KVM_MSR_EXIT_REASON_INVAL;
+ }
+}
+
+static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
+ u32 exit_reason, u64 data,
+ int (*completion)(struct kvm_vcpu *vcpu),
+ int r)
+{
+ u64 msr_reason = kvm_msr_reason(r);
+
+ /* Check if the user wanted to know about this MSR fault */
+ if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
+ return 0;
+
+ vcpu->run->exit_reason = exit_reason;
+ vcpu->run->msr.error = 0;
+ memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
+ vcpu->run->msr.reason = msr_reason;
+ vcpu->run->msr.index = index;
+ vcpu->run->msr.data = data;
+ vcpu->arch.complete_userspace_io = completion;
+
+ return 1;
+}
+
+static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
+ int (*complete_rdmsr)(struct kvm_vcpu *))
+{
+ u64 data;
+ int r;
+
+ r = kvm_emulate_msr_read(vcpu, msr, &data);
+
+ if (!r) {
+ trace_kvm_msr_read(msr, data);
+
+ if (reg < 0) {
+ kvm_eax_write(vcpu, data);
+ kvm_edx_write(vcpu, data >> 32);
+ } else {
+ kvm_register_write(vcpu, reg, data);
+ }
+ } else {
+ /* MSR read failed? See if we should ask user space */
+ if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
+ complete_rdmsr, r))
+ return 0;
+ trace_kvm_msr_read_ex(msr);
+ }
+
+ return kvm_x86_call(complete_emulated_msr)(vcpu, r);
+}
+
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
+{
+ return __kvm_emulate_rdmsr(vcpu, kvm_ecx_read(vcpu), -1,
+ complete_fast_rdmsr);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
+
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ vcpu->arch.cui_rdmsr_imm_reg = reg;
+
+ return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm);
+
+static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ int r;
+
+ r = kvm_emulate_msr_write(vcpu, msr, data);
+ if (!r) {
+ trace_kvm_msr_write(msr, data);
+ } else {
+ /* MSR write failed? See if we should ask user space */
+ if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data,
+ complete_fast_msr_access, r))
+ return 0;
+ /* Signal all other negative errors to userspace */
+ if (r < 0)
+ return r;
+ trace_kvm_msr_write_ex(msr, data);
+ }
+
+ return kvm_x86_call(complete_emulated_msr)(vcpu, r);
+}
+
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+ return __kvm_emulate_wrmsr(vcpu, kvm_ecx_read(vcpu),
+ kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
+
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);
+
+int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata)
+{
+ int r;
+
+ r = kvm_emulate_msr_read(vcpu, msr_index, pdata);
+ if (r < 0)
+ return X86EMUL_UNHANDLEABLE;
+
+ if (r) {
+ if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
+ complete_emulated_rdmsr, r))
+ return X86EMUL_IO_NEEDED;
+
+ trace_kvm_msr_read_ex(msr_index);
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+
+ trace_kvm_msr_read(msr_index, *pdata);
+ return X86EMUL_CONTINUE;
+}
+
+int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 data)
+{
+ int r;
+
+ r = kvm_emulate_msr_write(vcpu, msr_index, data);
+ if (r < 0)
+ return X86EMUL_UNHANDLEABLE;
+
+ if (r) {
+ if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
+ complete_emulated_msr_access, r))
+ return X86EMUL_IO_NEEDED;
+
+ trace_kvm_msr_write_ex(msr_index, data);
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+
+ trace_kvm_msr_write(msr_index, data);
+ return X86EMUL_CONTINUE;
+}
+
+int kvm_emulator_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+{
+ /*
+ * Treat emulator accesses to the current shadow stack pointer as host-
+ * initiated, as they aren't true MSR accesses (SSP is "just a reg"),
+ * and this API is used only for implicit accesses, i.e. not RDMSR, and
+ * so the index is fully KVM-controlled.
+ */
+ if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
+ return kvm_msr_read(vcpu, msr_index, pdata);
+
+ return __kvm_emulate_msr_read(vcpu, msr_index, pdata);
+}
+
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.
+ *
+ * Note, S_CET is _not_ saved/restored via XSAVES/XRSTORS.
+ */
+static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ if (!vcpu)
+ return false;
+
+ switch (msr) {
+ case MSR_IA32_U_CET:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+ default:
+ return false;
+ }
+}
+
+/*
+ * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
+ * MSR that is managed via XSTATE. Note, the caller is responsible for doing
+ * the initial FPU load, this helper only ensures that guest state is resident
+ * in hardware (the kernel can load its FPU state in IRQ context).
+ *
+ * Note, loading guest values for U_CET and PL[0-3]_SSP while executing in the
+ * kernel is safe, as U_CET is specific to userspace, and PL[0-3]_SSP are only
+ * consumed when transitioning to lower privilege levels, i.e. are effectively
+ * only consumed by userspace as well.
+ */
+static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info,
+ int access)
+{
+ BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
+
+ KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
+ KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+
+ kvm_fpu_get();
+ if (access == MSR_TYPE_R)
+ rdmsrq(msr_info->index, msr_info->data);
+ else
+ wrmsrq(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
+}
+
+static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
+}
+
+static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
+{
+ int version;
+ int r;
+ struct pvclock_wall_clock wc;
+ u32 wc_sec_hi;
+ u64 wall_nsec;
+
+ if (!wall_clock)
+ return;
+
+ r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+ if (r)
+ return;
+
+ if (version & 1)
+ ++version; /* first time write, random junk */
+
+ ++version;
+
+ if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
+ return;
+
+ wall_nsec = kvm_get_wall_clock_epoch(kvm);
+
+ wc.nsec = do_div(wall_nsec, NSEC_PER_SEC);
+ wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
+ wc.version = version;
+
+ kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
+
+ if (sec_hi_ofs) {
+ wc_sec_hi = wall_nsec >> 32;
+ kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
+ &wc_sec_hi, sizeof(wc_sec_hi));
+ }
+
+ version++;
+ kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+}
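The version handling in kvm_write_wall_clock() follows a seqcount-style protocol: the version is odd while an update is in flight and even once the payload is stable. The sketch below is a single-threaded illustration under assumptions; `struct wall_clock` and `wc_publish()` are hypothetical names, plain stores replace kvm_read_guest()/kvm_write_guest(), and no memory barriers are shown. The `/` and `%` pair stands in for do_div(), which divides its first argument in place and returns the remainder.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Hypothetical mirror of the guest-visible wall clock layout. */
struct wall_clock {
	uint32_t version;
	uint32_t sec;
	uint32_t nsec;
};

static void wc_publish(struct wall_clock *wc, uint64_t epoch_ns)
{
	uint32_t version = wc->version;

	/* An odd starting value is treated as junk: round up to even. */
	if (version & 1)
		++version;

	/* Odd version: tells a reader that an update is in progress. */
	++version;
	wc->version = version;

	wc->nsec = (uint32_t)(epoch_ns % NSEC_PER_SEC);
	wc->sec = (uint32_t)(epoch_ns / NSEC_PER_SEC);	/* truncates to 32 bits */

	/* Even version: the payload is complete and consistent. */
	wc->version = version + 1;
}
```

A reader following the same convention would sample the version, read the payload, and retry if the version was odd or changed in between.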
+
+static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
+ bool old_msr, bool host_initiated)
+{
+ struct kvm_arch *ka = &vcpu->kvm->arch;
+
+ if (vcpu->vcpu_id == 0 && !host_initiated) {
+ if (ka->boot_vcpu_runs_old_kvmclock != old_msr)
+ kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
+
+ ka->boot_vcpu_runs_old_kvmclock = old_msr;
+ }
+
+ vcpu->arch.time = system_time;
+ kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+
+ /* Activate the pv_time cache only if the enable bit (bit 0) is set. */
+ if (system_time & 1)
+ kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
+ sizeof(struct pvclock_vcpu_time_info));
+ else
+ kvm_gpc_deactivate(&vcpu->arch.pv_time);
+
+ return;
+}
+
+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+ return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+ return (msr & 3) == 1;
+}
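The `(msr & 3)` tests work because each machine-check bank occupies four consecutive MSRs starting at IA32_MC0_CTL (index 0x400), in the order CTL, STATUS, ADDR, MISC. The sketch below spells out that arithmetic; the helper names are hypothetical and, like the kernel helpers, the predicates are only meaningful for an index already known to lie in the bank range.

```c
#include <stdbool.h>
#include <stdint.h>

#define MC0_CTL 0x400u	/* architectural index of IA32_MC0_CTL */

/* Index of bank @bank's CTL register: four MSRs per bank. */
static uint32_t mcx_ctl(uint32_t bank)
{
	return MC0_CTL + 4 * bank;
}

/* Bank number that a bank MSR belongs to. */
static uint32_t mc_bank(uint32_t msr)
{
	return (msr - MC0_CTL) / 4;
}

/* Offset 0 within a bank is CTL, offset 1 is STATUS. */
static bool is_ctl(uint32_t msr)
{
	return (msr & 3) == 0;
}

static bool is_status(uint32_t msr)
{
	return (msr & 3) == 1;
}
```

Because 0x400 is a multiple of four, masking the absolute index with 3 gives the same result as masking the offset from MC0_CTL, which is why the helpers need no subtraction.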
+
+/*
+ * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
+ */
+static bool can_set_mci_status(struct kvm_vcpu *vcpu)
+{
+ /* McStatusWrEn enabled? */
+ if (guest_cpuid_is_amd_compatible(vcpu))
+ return !!(vcpu->arch.msr_hwcr & BIT_ULL(18));
+
+ return false;
+}
+
+static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u64 mcg_cap = vcpu->arch.mcg_cap;
+ unsigned bank_num = mcg_cap & 0xff;
+ u32 msr = msr_info->index;
+ u64 data = msr_info->data;
+ u32 offset, last_msr;
+
+ switch (msr) {
+ case MSR_IA32_MCG_STATUS:
+ vcpu->arch.mcg_status = data;
+ break;
+ case MSR_IA32_MCG_CTL:
+ if (!(mcg_cap & MCG_CTL_P) &&
+ (data || !msr_info->host_initiated))
+ return 1;
+ if (data != 0 && data != ~(u64)0)
+ return 1;
+ vcpu->arch.mcg_ctl = data;
+ break;
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
+ return 1;
+ /* An attempt to write a 1 to a reserved bit raises #GP */
+ if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
+ return 1;
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+ last_msr + 1 - MSR_IA32_MC0_CTL2);
+ vcpu->arch.mci_ctl2_banks[offset] = data;
+ break;
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ /*
+ * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+ * values are architecturally undefined. But, some Linux
+ * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+ * issue on AMD K8s, allow bit 10 to be clear when setting all
+ * other bits in order to avoid an uncaught #GP in the guest.
+ *
+ * UNIXWARE clears bit 0 of MC1_CTL to ignore correctable,
+ * single-bit ECC data errors.
+ */
+ if (is_mci_control_msr(msr) &&
+ data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
+ return 1;
+
+ /*
+ * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+ * AMD-based CPUs allow non-zero values, but if and only if
+ * HWCR[McStatusWrEn] is set.
+ */
+ if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
+ data != 0 && !can_set_mci_status(vcpu))
+ return 1;
+
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+ vcpu->arch.mce_banks[offset] = data;
+ break;
+ default:
+ return 1;
+ }
+ return 0;
+}
+
+static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
+{
+ gpa_t gpa = data & ~0x3f;
+
+ /* Bits 4:5 are reserved and should be zero. */
+ if (data & 0x30)
+ return 1;
+
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_VMEXIT) &&
+ (data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT))
+ return 1;
+
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT) &&
+ (data & KVM_ASYNC_PF_DELIVERY_AS_INT))
+ return 1;
+
+ if (!lapic_in_kernel(vcpu))
+ return data ? 1 : 0;
+
+ if (__kvm_pv_async_pf_enabled(data) &&
+ kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
+ sizeof(u64)))
+ return 1;
+
+ vcpu->arch.apf.msr_en_val = data;
+
+ if (__kvm_pv_async_pf_enabled(data)) {
+ kvm_async_pf_wakeup_all(vcpu);
+ } else {
+ kvm_clear_async_pf_completion_queue(vcpu);
+ kvm_async_pf_hash_reset(vcpu);
+ }
+ return 0;
+}
+
+static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)
+{
+ /* Bits 8-63 are reserved */
+ if (data >> 8)
+ return 1;
+
+ if (!lapic_in_kernel(vcpu))
+ return 1;
+
+ vcpu->arch.apf.msr_int_val = data;
+
+ vcpu->arch.apf.vec = data & KVM_ASYNC_PF_VEC_MASK;
+
+ return 0;
+}
+
+#ifdef CONFIG_X86_64
+static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC;
+}
+#endif
+
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u32 msr = msr_info->index;
+ u64 data = msr_info->data;
+
+ /*
+ * Do not allow host-initiated writes to trigger the Xen hypercall
+ * page setup; it could incur locking paths which are not expected
+ * if userspace sets the MSR in an unusual location.
+ */
+ if (kvm_xen_is_hypercall_page_msr(vcpu->kvm, msr) &&
+ !msr_info->host_initiated)
+ return kvm_xen_write_hypercall_page(vcpu, data);
+
+ switch (msr) {
+ case MSR_AMD64_NB_CFG:
+ case MSR_IA32_UCODE_WRITE:
+ case MSR_VM_HSAVE_PA:
+ case MSR_AMD64_PATCH_LOADER:
+ case MSR_AMD64_BU_CFG2:
+ case MSR_AMD64_DC_CFG:
+ case MSR_AMD64_TW_CFG:
+ case MSR_F15H_EX_CFG:
+ break;
+
+ case MSR_IA32_UCODE_REV:
+ if (msr_info->host_initiated)
+ vcpu->arch.microcode_version = data;
+ break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated ||
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return KVM_MSR_RET_UNSUPPORTED;
+ vcpu->arch.arch_capabilities = data;
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (!msr_info->host_initiated ||
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (data & ~kvm_caps.supported_perf_cap)
+ return 1;
+
+ /*
+ * Note, this is not just a performance optimization! KVM
+ * disallows changing feature MSRs after the vCPU has run; PMU
+ * refresh will bug the VM if called after the vCPU has run.
+ */
+ if (vcpu->arch.perf_capabilities == data)
+ break;
+
+ vcpu->arch.perf_capabilities = data;
+ kvm_pmu_refresh(vcpu);
+ kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
+ break;
+ case MSR_IA32_PRED_CMD: {
+ u64 reserved_bits = ~(PRED_CMD_IBPB | PRED_CMD_SBPB);
+
+ if (!msr_info->host_initiated) {
+ if ((!guest_has_pred_cmd_msr(vcpu)))
+ return 1;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
+ reserved_bits |= PRED_CMD_IBPB;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
+ reserved_bits |= PRED_CMD_SBPB;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_IBPB))
+ reserved_bits |= PRED_CMD_IBPB;
+
+ if (!boot_cpu_has(X86_FEATURE_SBPB))
+ reserved_bits |= PRED_CMD_SBPB;
+
+ if (data & reserved_bits)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrq(MSR_IA32_PRED_CMD, data);
+ break;
+ }
+ case MSR_IA32_FLUSH_CMD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
+ return 1;
+
+ if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ break;
+ case MSR_EFER:
+ return set_efer(vcpu, msr_info);
+ case MSR_K7_HWCR: {
+ /*
+ * Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2
+ * through at least v6.6 whine if TscFreqSel is clear,
+ * depending on F/M/S.)
+ */
+ u64 valid = BIT_ULL(18) | BIT_ULL(24);
+
+ data &= ~(u64)0x40; /* ignore flush filter disable */
+ data &= ~(u64)0x100; /* ignore ignne emulation enable */
+ data &= ~(u64)0x8; /* ignore TLB cache disable */
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_GP_ON_USER_CPUID))
+ valid |= MSR_K7_HWCR_CPUID_USER_DIS;
+
+ if (data & ~valid) {
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ return 1;
+ }
+ vcpu->arch.msr_hwcr = data;
+ break;
+ }
+ case MSR_FAM10H_MMIO_CONF_BASE:
+ if (data != 0) {
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ return 1;
+ }
+ break;
+ case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
+ vcpu->arch.pat = data;
+ break;
+ case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+ case MSR_MTRRdefType:
+ return kvm_mtrr_set_msr(vcpu, msr, data);
+ case MSR_IA32_APICBASE:
+ return kvm_apic_set_base(vcpu, data, msr_info->host_initiated);
+ case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+ return kvm_x2apic_msr_write(vcpu, msr, data);
+ case MSR_IA32_TSC_DEADLINE:
+ kvm_set_lapic_tscdeadline_msr(vcpu, data);
+ break;
+ case MSR_IA32_TSC_ADJUST:
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
+ if (!msr_info->host_initiated) {
+ s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
+ adjust_tsc_offset_guest(vcpu, adj);
+ /* Before returning to the guest, tsc_timestamp must be adjusted as
+ * well, otherwise the guest's percpu pvclock time could jump.
+ */
+ kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+ }
+ vcpu->arch.ia32_tsc_adjust_msr = data;
+ }
+ break;
+ case MSR_IA32_MISC_ENABLE: {
+ u64 old_val = vcpu->arch.ia32_misc_enable_msr;
+
+ if (!msr_info->host_initiated) {
+ /* RO bits */
+ if ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PMU_RO_MASK)
+ return 1;
+
+ /* R bits, i.e. writes are ignored, but don't fault. */
+ data = data & ~MSR_IA32_MISC_ENABLE_EMON;
+ data |= old_val & MSR_IA32_MISC_ENABLE_EMON;
+ }
+
+ if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
+ ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
+ return 1;
+ vcpu->arch.ia32_misc_enable_msr = data;
+ vcpu->arch.cpuid_dynamic_bits_dirty = true;
+ } else {
+ vcpu->arch.ia32_misc_enable_msr = data;
+ }
+ break;
+ }
+ case MSR_IA32_SMBASE:
+ if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
+ return 1;
+ vcpu->arch.smbase = data;
+ break;
+ case MSR_IA32_POWER_CTL:
+ vcpu->arch.msr_ia32_power_ctl = data;
+ break;
+ case MSR_IA32_TSC:
+ if (msr_info->host_initiated) {
+ kvm_synchronize_tsc(vcpu, &data);
+ } else if (!vcpu->arch.guest_tsc_protected) {
+ u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+ adjust_tsc_offset_guest(vcpu, adj);
+ vcpu->arch.ia32_tsc_adjust_msr += adj;
+ }
+ break;
+ case MSR_IA32_XSS:
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (data & ~vcpu->arch.guest_supported_xss)
+ return 1;
+ if (vcpu->arch.ia32_xss == data)
+ break;
+ vcpu->arch.ia32_xss = data;
+ vcpu->arch.cpuid_dynamic_bits_dirty = true;
+ break;
+ case MSR_SMI_COUNT:
+ if (!msr_info->host_initiated)
+ return 1;
+ vcpu->arch.smi_count = data;
+ break;
+ case MSR_KVM_WALL_CLOCK_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ vcpu->kvm->arch.wall_clock = data;
+ kvm_write_wall_clock(vcpu->kvm, data, 0);
+ break;
+ case MSR_KVM_WALL_CLOCK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ vcpu->kvm->arch.wall_clock = data;
+ kvm_write_wall_clock(vcpu->kvm, data, 0);
+ break;
+ case MSR_KVM_SYSTEM_TIME_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ kvm_write_system_time(vcpu, data, false, msr_info->host_initiated);
+ break;
+ case MSR_KVM_SYSTEM_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ kvm_write_system_time(vcpu, data, true, msr_info->host_initiated);
+ break;
+ case MSR_KVM_ASYNC_PF_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_pv_enable_async_pf(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_ASYNC_PF_INT:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_pv_enable_async_pf_int(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (data & 0x1) {
+ /*
+ * Pairs with the smp_mb__after_atomic() in
+ * kvm_arch_async_page_present_queued().
+ */
+ smp_store_mb(vcpu->arch.apf.pageready_pending, false);
+
+ kvm_check_async_pf_completion(vcpu);
+ }
+ break;
+ case MSR_KVM_STEAL_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (unlikely(!sched_info_on()))
+ return 1;
+
+ if (data & KVM_STEAL_RESERVED_MASK)
+ return 1;
+
+ vcpu->arch.st.msr_val = data;
+
+ if (!(data & KVM_MSR_ENABLED))
+ break;
+
+ kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
+
+ break;
+ case MSR_KVM_PV_EOI_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_lapic_set_pv_eoi(vcpu, data, sizeof(u8)))
+ return 1;
+ break;
+
+ case MSR_KVM_POLL_CONTROL:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ /* Only the enable bit (bit 0) is supported. */
+ if (data & (-1ULL << 1))
+ return 1;
+
+ vcpu->arch.msr_kvm_poll_control = data;
+ break;
+
+ case MSR_IA32_MCG_CTL:
+ case MSR_IA32_MCG_STATUS:
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ return set_msr_mce(vcpu, msr_info);
+
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+ case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+ case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
+ return kvm_pmu_set_msr(vcpu, msr_info);
+
+ if (data)
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ break;
+ case MSR_K7_CLK_CTL:
+ /*
+ * Ignore all writes to this no longer documented MSR.
+ * Writes are only relevant for old K7 processors,
+ * all pre-dating SVM, but a recommended workaround from
+ * AMD for these chips. It is possible to specify the
+ * affected processor models on the command line, hence
+ * the need to ignore the workaround.
+ */
+ break;
+#ifdef CONFIG_KVM_HYPERV
+ case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
+ case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
+ case HV_X64_MSR_SYNDBG_OPTIONS:
+ case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+ case HV_X64_MSR_CRASH_CTL:
+ case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
+ case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_STATUS:
+ case HV_X64_MSR_TSC_INVARIANT_CONTROL:
+ return kvm_hv_set_msr_common(vcpu, msr, data,
+ msr_info->host_initiated);
+#endif
+ case MSR_IA32_BBL_CR_CTL3:
+ /* Drop writes to this legacy MSR -- see rdmsr
+ * counterpart for further detail.
+ */
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ break;
+ case MSR_AMD64_OSVW_ID_LENGTH:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ vcpu->arch.osvw.length = data;
+ break;
+ case MSR_AMD64_OSVW_STATUS:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ vcpu->arch.osvw.status = data;
+ break;
+ case MSR_PLATFORM_INFO:
+ if (!msr_info->host_initiated)
+ return 1;
+ vcpu->arch.msr_platform_info = data;
+ break;
+ case MSR_MISC_FEATURES_ENABLES:
+ if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
+ (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
+ !(vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT)))
+ return 1;
+ vcpu->arch.msr_misc_features_enables = data;
+ break;
+#ifdef CONFIG_X86_64
+ case MSR_IA32_XFD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ if (data & ~kvm_guest_supported_xfd(vcpu))
+ return 1;
+
+ fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
+ break;
+ case MSR_IA32_XFD_ERR:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ if (data & ~kvm_guest_supported_xfd(vcpu))
+ return 1;
+
+ vcpu->arch.guest_fpu.xfd_err = data;
+ break;
+#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_set_xstate_msr(vcpu, msr_info);
+ break;
+ default:
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
+ return kvm_pmu_set_msr(vcpu, msr_info);
+
+ return KVM_MSR_RET_UNSUPPORTED;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common);
+
+static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
+{
+ u64 data;
+ u64 mcg_cap = vcpu->arch.mcg_cap;
+ unsigned bank_num = mcg_cap & 0xff;
+ u32 offset, last_msr;
+
+ switch (msr) {
+ case MSR_IA32_P5_MC_ADDR:
+ case MSR_IA32_P5_MC_TYPE:
+ data = 0;
+ break;
+ case MSR_IA32_MCG_CAP:
+ data = vcpu->arch.mcg_cap;
+ break;
+ case MSR_IA32_MCG_CTL:
+ if (!(mcg_cap & MCG_CTL_P) && !host)
+ return 1;
+ data = vcpu->arch.mcg_ctl;
+ break;
+ case MSR_IA32_MCG_STATUS:
+ data = vcpu->arch.mcg_status;
+ break;
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ if (!(mcg_cap & MCG_CMCI_P) && !host)
+ return 1;
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+ last_msr + 1 - MSR_IA32_MC0_CTL2);
+ data = vcpu->arch.mci_ctl2_banks[offset];
+ break;
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+ data = vcpu->arch.mce_banks[offset];
+ break;
+ default:
+ return 1;
+ }
+ *pdata = data;
+ return 0;
+}
+
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ switch (msr_info->index) {
+ case MSR_IA32_PLATFORM_ID:
+ case MSR_IA32_EBL_CR_POWERON:
+ case MSR_IA32_LASTBRANCHFROMIP:
+ case MSR_IA32_LASTBRANCHTOIP:
+ case MSR_IA32_LASTINTFROMIP:
+ case MSR_IA32_LASTINTTOIP:
+ case MSR_AMD64_SYSCFG:
+ case MSR_K8_TSEG_ADDR:
+ case MSR_K8_TSEG_MASK:
+ case MSR_VM_HSAVE_PA:
+ case MSR_K8_INT_PENDING_MSG:
+ case MSR_AMD64_NB_CFG:
+ case MSR_FAM10H_MMIO_CONF_BASE:
+ case MSR_AMD64_BU_CFG2:
+ case MSR_IA32_PERF_CTL:
+ case MSR_AMD64_DC_CFG:
+ case MSR_AMD64_TW_CFG:
+ case MSR_F15H_EX_CFG:
+ /*
+ * Intel Sandy Bridge CPUs must support the RAPL (running average power
+ * limit) MSRs. Just return 0, as we do not want to expose the host
+ * data here. Do not conditionalize this on CPUID, as KVM does not do
+ * so for existing CPU-specific MSRs.
+ */
+ case MSR_RAPL_POWER_UNIT:
+ case MSR_PP0_ENERGY_STATUS: /* Power plane 0 (core) */
+ case MSR_PP1_ENERGY_STATUS: /* Power plane 1 (graphics uncore) */
+ case MSR_PKG_ENERGY_STATUS: /* Total package */
+ case MSR_DRAM_ENERGY_STATUS: /* DRAM controller */
+ msr_info->data = 0;
+ break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+ case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+ case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
+ if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info);
+ msr_info->data = 0;
+ break;
+ case MSR_IA32_UCODE_REV:
+ msr_info->data = vcpu->arch.microcode_version;
+ break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return KVM_MSR_RET_UNSUPPORTED;
+ msr_info->data = vcpu->arch.arch_capabilities;
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ msr_info->data = vcpu->arch.perf_capabilities;
+ break;
+ case MSR_IA32_POWER_CTL:
+ msr_info->data = vcpu->arch.msr_ia32_power_ctl;
+ break;
+ case MSR_IA32_TSC: {
+ /*
+ * Intel SDM states that MSR_IA32_TSC read adds the TSC offset
+ * even when not intercepted. AMD manual doesn't explicitly
+ * state this but appears to behave the same.
+ *
+ * On userspace reads and writes, however, we unconditionally
+ * return L1's TSC value to ensure backwards-compatible
+ * behavior for migration.
+ */
+ u64 offset, ratio;
+
+ if (msr_info->host_initiated) {
+ offset = vcpu->arch.l1_tsc_offset;
+ ratio = vcpu->arch.l1_tsc_scaling_ratio;
+ } else {
+ offset = vcpu->arch.tsc_offset;
+ ratio = vcpu->arch.tsc_scaling_ratio;
+ }
+
+ msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
+ break;
+ }
+ case MSR_IA32_CR_PAT:
+ msr_info->data = vcpu->arch.pat;
+ break;
+ case MSR_MTRRcap:
+ case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+ case MSR_MTRRdefType:
+ return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+ case 0xcd: /* fsb frequency */
+ msr_info->data = 3;
+ break;
+ /*
+ * MSR_EBC_FREQUENCY_ID
+ * Conservative value valid for even the basic CPU models.
+ * Models 0 and 1: 000 in bits 23:21 indicates a bus speed
+ * of 100MHz. Model 2: 000 in bits 18:16 indicates 100MHz,
+ * and 266MHz for model 3 or 4. Set the Core Clock
+ * Frequency to System Bus Frequency Ratio to 1 (bits
+ * 31:24) even though these bits are only valid for CPU
+ * models > 2, as guests may otherwise end up dividing
+ * or multiplying by zero.
+ */
+ case MSR_EBC_FREQUENCY_ID:
+ msr_info->data = 1 << 24;
+ break;
+ case MSR_IA32_APICBASE:
+ msr_info->data = vcpu->arch.apic_base;
+ break;
+ case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+ return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+ case MSR_IA32_TSC_DEADLINE:
+ msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
+ break;
+ case MSR_IA32_TSC_ADJUST:
+ msr_info->data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
+ break;
+ case MSR_IA32_MISC_ENABLE:
+ msr_info->data = vcpu->arch.ia32_misc_enable_msr;
+ break;
+ case MSR_IA32_SMBASE:
+ if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
+ return 1;
+ msr_info->data = vcpu->arch.smbase;
+ break;
+ case MSR_SMI_COUNT:
+ msr_info->data = vcpu->arch.smi_count;
+ break;
+ case MSR_IA32_PERF_STATUS:
+ /* TSC increment by tick */
+ msr_info->data = 1000ULL;
+ /* CPU multiplier */
+ msr_info->data |= (((uint64_t)4ULL) << 40);
+ break;
+ case MSR_EFER:
+ msr_info->data = vcpu->arch.efer;
+ break;
+ case MSR_KVM_WALL_CLOCK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->kvm->arch.wall_clock;
+ break;
+ case MSR_KVM_WALL_CLOCK_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->kvm->arch.wall_clock;
+ break;
+ case MSR_KVM_SYSTEM_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.time;
+ break;
+ case MSR_KVM_SYSTEM_TIME_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.time;
+ break;
+ case MSR_KVM_ASYNC_PF_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.apf.msr_en_val;
+ break;
+ case MSR_KVM_ASYNC_PF_INT:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.apf.msr_int_val;
+ break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = 0;
+ break;
+ case MSR_KVM_STEAL_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.st.msr_val;
+ break;
+ case MSR_KVM_PV_EOI_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.pv_eoi.msr_val;
+ break;
+ case MSR_KVM_POLL_CONTROL:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.msr_kvm_poll_control;
+ break;
+ case MSR_IA32_P5_MC_ADDR:
+ case MSR_IA32_P5_MC_TYPE:
+ case MSR_IA32_MCG_CAP:
+ case MSR_IA32_MCG_CTL:
+ case MSR_IA32_MCG_STATUS:
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
+ case MSR_IA32_XSS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return 1;
+ msr_info->data = vcpu->arch.ia32_xss;
+ break;
+ case MSR_K7_CLK_CTL:
+ /*
+ * Provide the expected ramp-up count for K7. All other
+ * fields are set to zero, indicating minimum divisors
+ * for every field.
+ *
+ * This prevents guest kernels on an AMD host with CPU
+ * type 6, model 8 and higher from exploding due to
+ * the RDMSR failing.
+ */
+ msr_info->data = 0x20000000;
+ break;
+#ifdef CONFIG_KVM_HYPERV
+ case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
+ case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
+ case HV_X64_MSR_SYNDBG_OPTIONS:
+ case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+ case HV_X64_MSR_CRASH_CTL:
+ case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
+ case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_STATUS:
+ case HV_X64_MSR_TSC_INVARIANT_CONTROL:
+ return kvm_hv_get_msr_common(vcpu,
+ msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
+#endif
+ case MSR_IA32_BBL_CR_CTL3:
+ /* This legacy MSR exists but isn't fully documented in current
+ * silicon. It is however accessed by winxp in very narrow
+ * scenarios where it sets bit #19, itself documented as
+ * a "reserved" bit. Best effort attempt to source coherent
+ * read data here should the balance of the register be
+ * interpreted by the guest:
+ *
+ * L2 cache control register 3: 64GB range, 256KB size,
+ * enabled, latency 0x1, configured
+ */
+ msr_info->data = 0xbe702111;
+ break;
+ case MSR_AMD64_OSVW_ID_LENGTH:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ msr_info->data = vcpu->arch.osvw.length;
+ break;
+ case MSR_AMD64_OSVW_STATUS:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ msr_info->data = vcpu->arch.osvw.status;
+ break;
+ case MSR_PLATFORM_INFO:
+ if (!msr_info->host_initiated &&
+ !vcpu->kvm->arch.guest_can_read_msr_platform_info)
+ return 1;
+ msr_info->data = vcpu->arch.msr_platform_info;
+ break;
+ case MSR_MISC_FEATURES_ENABLES:
+ msr_info->data = vcpu->arch.msr_misc_features_enables;
+ break;
+ case MSR_K7_HWCR:
+ msr_info->data = vcpu->arch.msr_hwcr;
+ break;
+#ifdef CONFIG_X86_64
+ case MSR_IA32_XFD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
+ break;
+ case MSR_IA32_XFD_ERR:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ msr_info->data = vcpu->arch.guest_fpu.xfd_err;
+ break;
+#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_get_xstate_msr(vcpu, msr_info);
+ break;
+ default:
+ if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info);
+
+ return KVM_MSR_RET_UNSUPPORTED;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common);
+
+static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ return kvm_get_msr_ignored_check(vcpu, index, data, true);
+}
+
+static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ u64 val;
+
+ /*
+ * Reject writes to immutable feature MSRs if the vCPU model is frozen,
+ * as KVM doesn't support modifying the guest vCPU model on the fly,
+ * e.g. changing the VMX capabilities MSRs while L2 is active is
+ * nonsensical. Allow writes of the same value, e.g. so that userspace
+ * can blindly stuff all MSRs when emulating RESET.
+ */
+ if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
+ kvm_is_immutable_feature_msr(index) &&
+ (do_get_msr(vcpu, index, &val) || *data != val))
+ return -EINVAL;
+
+ return kvm_set_msr_ignored_check(vcpu, index, *data, true);
+}
+
+/*
+ * Read or write a bunch of msrs. All parameters are kernel addresses.
+ *
+ * Returns the number of MSRs processed (read or written) successfully.
+ */
+static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
+ struct kvm_msr_entry *entries,
+ int (*do_msr)(struct kvm_vcpu *vcpu,
+ unsigned index, u64 *data))
+{
+ bool fpu_loaded = false;
+ int i;
+
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ /*
+ * If userspace is accessing one or more XSTATE-managed MSRs,
+ * temporarily load the guest's FPU state so that the guest's
+ * MSR value(s) is resident in hardware and thus can be accessed
+ * via RDMSR/WRMSR.
+ */
+ if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
+ kvm_load_guest_fpu(vcpu);
+ fpu_loaded = true;
+ }
+ if (do_msr(vcpu, entries[i].index, &entries[i].data))
+ break;
+ }
+ if (fpu_loaded)
+ kvm_put_guest_fpu(vcpu);
+
+ return i;
+}
+
+/*
+ * Read or write a bunch of msrs. Parameters are user addresses.
+ *
+ * Returns the number of MSRs processed successfully, or a negative errno.
+ */
+static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
+ int (*do_msr)(struct kvm_vcpu *vcpu,
+ unsigned index, u64 *data),
+ int writeback)
+{
+ struct kvm_msrs msrs;
+ struct kvm_msr_entry *entries;
+ unsigned size;
+ int r;
+
+ r = -EFAULT;
+ if (copy_from_user(&msrs, user_msrs, sizeof(msrs)))
+ goto out;
+
+ r = -E2BIG;
+ if (msrs.nmsrs >= MAX_IO_MSRS)
+ goto out;
+
+ size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
+ entries = memdup_user(user_msrs->entries, size);
+ if (IS_ERR(entries)) {
+ r = PTR_ERR(entries);
+ goto out;
+ }
+
+ r = __msr_io(vcpu, &msrs, entries, do_msr);
+
+ if (writeback && copy_to_user(user_msrs->entries, entries, size))
+ r = -EFAULT;
+
+ kfree(entries);
+out:
+ return r;
+}
+
+int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
+{
+ return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
+}
+
+int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_get_msr, 1);
+}
+
+int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_set_msr, 0);
+}
+
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (do_get_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ if (put_user(val, user_val))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (get_user(val, user_val))
+ return -EFAULT;
+
+ if (do_set_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ return 0;
+}
+
+struct kvm_x86_reg_id {
+ __u32 index;
+ __u8 type;
+ __u8 rsvd1;
+ __u8 rsvd2:4;
+ __u8 size:4;
+ __u8 x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
+ struct kvm_x86_reg_id *reg)
+{
+ switch (reg->index) {
+ case KVM_REG_GUEST_SSP:
+ /*
+ * FIXME: If host-initiated accesses are ever exempted from
+ * ignore_msrs (in kvm_do_msr_access()), drop this manual check
+ * and rely on KVM's standard checks to reject accesses to regs
+ * that don't exist.
+ */
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return -EINVAL;
+
+ reg->type = KVM_X86_REG_TYPE_MSR;
+ reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+ void __user *argp)
+{
+ struct kvm_one_reg one_reg;
+ struct kvm_x86_reg_id *reg;
+ u64 __user *user_val;
+ bool load_fpu;
+ int r;
+
+ if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
+ return -EFAULT;
+
+ if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+ return -EINVAL;
+
+ reg = (struct kvm_x86_reg_id *)&one_reg.id;
+ if (reg->rsvd1 || reg->rsvd2)
+ return -EINVAL;
+
+ if (reg->type == KVM_X86_REG_TYPE_KVM) {
+ r = kvm_translate_kvm_reg(vcpu, reg);
+ if (r)
+ return r;
+ }
+
+ if (reg->type != KVM_X86_REG_TYPE_MSR)
+ return -EINVAL;
+
+ if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+ return -EINVAL;
+
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ load_fpu = is_xstate_managed_msr(vcpu, reg->index);
+ if (load_fpu)
+ kvm_load_guest_fpu(vcpu);
+
+ user_val = u64_to_user_ptr(one_reg.addr);
+ if (ioctl == KVM_GET_ONE_REG)
+ r = kvm_get_one_msr(vcpu, reg->index, user_val);
+ else
+ r = kvm_set_one_msr(vcpu, reg->index, user_val);
+
+ if (load_fpu)
+ kvm_put_guest_fpu(vcpu);
+ return r;
+}
+
+int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+ struct kvm_reg_list __user *user_list)
+{
+ u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
+ u64 user_nr_regs;
+
+ if (get_user(user_nr_regs, &user_list->n))
+ return -EFAULT;
+
+ if (put_user(nr_regs, &user_list->n))
+ return -EFAULT;
+
+ if (user_nr_regs < nr_regs)
+ return -E2BIG;
+
+ if (nr_regs &&
+ put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
+ return -EFAULT;
+
+ return 0;
+}
+
+static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+
+ msr_filter = kzalloc_obj(*msr_filter, GFP_KERNEL_ACCOUNT);
+ if (!msr_filter)
+ return NULL;
+
+ msr_filter->default_allow = default_allow;
+ return msr_filter;
+}
+
+void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
+{
+ u32 i;
+
+ if (!msr_filter)
+ return;
+
+ for (i = 0; i < msr_filter->count; i++)
+ kfree(msr_filter->ranges[i].bitmap);
+
+ kfree(msr_filter);
+}
+
+static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
+ struct kvm_msr_filter_range *user_range)
+{
+ unsigned long *bitmap;
+ size_t bitmap_size;
+
+ if (!user_range->nmsrs)
+ return 0;
+
+ if (user_range->flags & ~KVM_MSR_FILTER_RANGE_VALID_MASK)
+ return -EINVAL;
+
+ if (!user_range->flags)
+ return -EINVAL;
+
+ bitmap_size = BITS_TO_LONGS(user_range->nmsrs) * sizeof(long);
+ if (!bitmap_size || bitmap_size > KVM_MSR_FILTER_MAX_BITMAP_SIZE)
+ return -EINVAL;
+
+ bitmap = memdup_user((__user u8*)user_range->bitmap, bitmap_size);
+ if (IS_ERR(bitmap))
+ return PTR_ERR(bitmap);
+
+ msr_filter->ranges[msr_filter->count] = (struct msr_bitmap_range) {
+ .flags = user_range->flags,
+ .base = user_range->base,
+ .nmsrs = user_range->nmsrs,
+ .bitmap = bitmap,
+ };
+
+ msr_filter->count++;
+ return 0;
+}
+
+int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, struct kvm_msr_filter *filter)
+{
+ struct kvm_x86_msr_filter *new_filter, *old_filter;
+ bool default_allow;
+ bool empty = true;
+ int r;
+ u32 i;
+
+ if (filter->flags & ~KVM_MSR_FILTER_VALID_MASK)
+ return -EINVAL;
+
+ for (i = 0; i < ARRAY_SIZE(filter->ranges); i++)
+ empty &= !filter->ranges[i].nmsrs;
+
+ default_allow = !(filter->flags & KVM_MSR_FILTER_DEFAULT_DENY);
+ if (empty && !default_allow)
+ return -EINVAL;
+
+ new_filter = kvm_alloc_msr_filter(default_allow);
+ if (!new_filter)
+ return -ENOMEM;
+
+ for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) {
+ r = kvm_add_msr_filter(new_filter, &filter->ranges[i]);
+ if (r) {
+ kvm_free_msr_filter(new_filter);
+ return r;
+ }
+ }
+
+ mutex_lock(&kvm->lock);
+ old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
+ mutex_is_locked(&kvm->lock));
+ mutex_unlock(&kvm->lock);
+ synchronize_srcu(&kvm->srcu);
+
+ kvm_free_msr_filter(old_filter);
+
+ /*
+ * Recalc MSR intercepts as userspace may want to intercept accesses to
+ * MSRs that KVM would otherwise pass through to the guest.
+ */
+ kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
+
+ return 0;
+}
+
+
+static void kvm_probe_feature_msr(u32 msr_index)
+{
+ u64 data;
+
+ if (kvm_get_feature_msr(NULL, msr_index, &data, true))
+ return;
+
+ msr_based_features[num_msr_based_features++] = msr_index;
+}
+
+static void kvm_probe_msr_to_save(u32 msr_index)
+{
+ u32 dummy[2];
+
+ if (rdmsr_safe(msr_index, &dummy[0], &dummy[1]))
+ return;
+
+ /*
+ * Even MSRs that are valid in the host may not be exposed to guests in
+ * some cases.
+ */
+ switch (msr_index) {
+ case MSR_IA32_BNDCFGS:
+ if (!kvm_mpx_supported())
+ return;
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
+ !kvm_cpu_cap_has(X86_FEATURE_RDPID))
+ return;
+ break;
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
+ return;
+ break;
+ case MSR_IA32_RTIT_CTL:
+ case MSR_IA32_RTIT_STATUS:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT))
+ return;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ !intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
+ return;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ (!intel_pt_validate_hw_cap(PT_CAP_topa_output) &&
+ !intel_pt_validate_hw_cap(PT_CAP_single_range_output)))
+ return;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ (msr_index - MSR_IA32_RTIT_ADDR0_A >=
+ intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
+ return;
+ break;
+ case MSR_ARCH_PERFMON_PERFCTR0 ...
+ MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
+ kvm_pmu_cap.num_counters_gp)
+ return;
+ break;
+ case MSR_ARCH_PERFMON_EVENTSEL0 ...
+ MSR_ARCH_PERFMON_EVENTSEL0 + KVM_MAX_NR_GP_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_EVENTSEL0 >=
+ kvm_pmu_cap.num_counters_gp)
+ return;
+ break;
+ case MSR_ARCH_PERFMON_FIXED_CTR0 ...
+ MSR_ARCH_PERFMON_FIXED_CTR0 + KVM_MAX_NR_FIXED_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_FIXED_CTR0 >=
+ kvm_pmu_cap.num_counters_fixed)
+ return;
+ break;
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+ return;
+ break;
+ case MSR_IA32_XFD:
+ case MSR_IA32_XFD_ERR:
+ if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
+ return;
+ break;
+ case MSR_IA32_TSX_CTRL:
+ if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+ return;
+ break;
+ case MSR_IA32_XSS:
+ if (!kvm_caps.supported_xss)
+ return;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+ return;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return;
+ break;
+ default:
+ break;
+ }
+
+ msrs_to_save[num_msrs_to_save++] = msr_index;
+}
+
+void kvm_init_msr_lists(void)
+{
+ unsigned i;
+
+ BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
+ "Please update the fixed PMCs in msrs_to_save_pmu[]");
+
+ num_msrs_to_save = 0;
+ num_emulated_msrs = 0;
+ num_msr_based_features = 0;
+
+ for (i = 0; i < ARRAY_SIZE(msrs_to_save_base); i++)
+ kvm_probe_msr_to_save(msrs_to_save_base[i]);
+
+ if (enable_pmu) {
+ for (i = 0; i < ARRAY_SIZE(msrs_to_save_pmu); i++)
+ kvm_probe_msr_to_save(msrs_to_save_pmu[i]);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
+ if (!kvm_x86_call(has_emulated_msr)(NULL,
+ emulated_msrs_all[i]))
+ continue;
+
+ emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
+ }
+
+ for (i = KVM_FIRST_EMULATED_VMX_MSR; i <= KVM_LAST_EMULATED_VMX_MSR; i++)
+ kvm_probe_feature_msr(i);
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++)
+ kvm_probe_feature_msr(msr_based_features_all_except_vmx[i]);
+}
+
+int kvm_spec_ctrl_test_value(u64 value)
+{
+ /*
+ * Test that setting IA32_SPEC_CTRL to the given value
+ * is allowed by the host processor.
+ */
+
+ u64 saved_value;
+ unsigned long flags;
+ int ret = 0;
+
+ local_irq_save(flags);
+
+ if (rdmsrq_safe(MSR_IA32_SPEC_CTRL, &saved_value))
+ ret = 1;
+ else if (wrmsrq_safe(MSR_IA32_SPEC_CTRL, value))
+ ret = 1;
+ else
+ wrmsrq(MSR_IA32_SPEC_CTRL, saved_value);
+
+ local_irq_restore(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value);
diff --git a/arch/x86/kvm/msrs.h b/arch/x86/kvm/msrs.h
new file mode 100644
index 000000000000..c34f0411ced6
--- /dev/null
+++ b/arch/x86/kvm/msrs.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_KVM_MSR_H
+#define ARCH_X86_KVM_MSR_H
+
+#include <linux/kvm_host.h>
+#include <linux/user-return-notifier.h>
+
+#include "cpuid.h"
+#include "regs.h"
+
+extern bool report_ignored_msrs;
+extern bool ignore_msrs;
+
+static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ if (report_ignored_msrs)
+ vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
+}
+
+static inline void kvm_pr_unimpl_rdmsr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ if (report_ignored_msrs)
+ vcpu_unimpl(vcpu, "Unhandled RDMSR(0x%x)\n", msr);
+}
+
+/*
+ * The first...last VMX feature MSRs that are emulated by KVM. This may or may
+ * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
+ * associated feature that KVM supports for nested virtualization.
+ */
+#define KVM_FIRST_EMULATED_VMX_MSR MSR_IA32_VMX_BASIC
+#define KVM_LAST_EMULATED_VMX_MSR MSR_IA32_VMX_VMFUNC
+
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
+
+#define MSR_IA32_CR_PAT_DEFAULT \
+ PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC)
+
+void kvm_init_msr_lists(void);
+int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list);
+int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list);
+int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs);
+
+int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs);
+int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs);
+
+int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+ void __user *argp);
+int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+ struct kvm_reg_list __user *user_list);
+
+void kvm_user_return_msr_cpu_online(void);
+void drop_user_return_notifiers(void);
+void kvm_destroy_user_return_msrs(void);
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata);
+int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 data);
+int kvm_emulator_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+
+bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
+
+enum kvm_msr_access {
+ MSR_TYPE_R = BIT(0),
+ MSR_TYPE_W = BIT(1),
+ MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
+};
+
+/*
+ * Internal error codes that are used to indicate that MSR emulation encountered
+ * an error that should result in #GP in the guest, unless userspace handles it.
+ * Note, '1', '0', and negative numbers are off limits, as they are used by KVM
+ * as part of KVM's lightly documented internal KVM_RUN return codes.
+ *
+ * UNSUPPORTED - The MSR isn't supported, either because it is completely
+ * unknown to KVM, or because the MSR should not exist according
+ * to the vCPU model.
+ *
+ * FILTERED - Access to the MSR is denied by a userspace MSR filter.
+ */
+#define KVM_MSR_RET_UNSUPPORTED 2
+#define KVM_MSR_RET_FILTERED 3
+
+int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, struct kvm_msr_filter *filter);
+void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter);
+
+int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+
+u64 kvm_get_arch_capabilities(void);
+int kvm_spec_ctrl_test_value(u64 value);
+
+#define CET_US_RESERVED_BITS GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+ if (data & CET_US_RESERVED_BITS)
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_US_SHSTK_MASK_BITS))
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_US_IBT_MASK_BITS))
+ return false;
+ if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+ return false;
+ /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+ if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+ return false;
+
+ return true;
+}
+
+#endif
\ No newline at end of file
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 6f74e2b27c1e..4f3b7b0f6565 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -19,6 +19,7 @@
#include <asm/mtrr.h>
#include "cpuid.h"
+#include "msrs.h"
#include "x86.h"
static u64 *find_mtrr(struct kvm_vcpu *vcpu, unsigned int msr)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf15c122f837..c648fac802f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -80,7 +80,6 @@
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
#include <asm/tlbflush.h>
-#include <asm/intel_pt.h>
#include <asm/emulate_prefix.h>
#include <asm/sgx.h>
#include <asm/virt.h>
@@ -90,8 +89,6 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
-#define MAX_IO_MSRS 256
-
/*
* Note, kvm_caps fields should *never* have default values, all fields must be
* recomputed from scratch during vendor module load, e.g. to account for a
@@ -108,17 +105,6 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
#define emul_to_vcpu(ctxt) \
((struct kvm_vcpu *)(ctxt)->vcpu)
-/* EFER defaults:
- * - enable syscall per default because its emulated by KVM
- * - enable LME and LMA per default on 64 bit KVM
- */
-#ifdef CONFIG_X86_64
-static
-u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
-#else
-static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
-#endif
-
#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -133,7 +119,6 @@ static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
static DEFINE_MUTEX(vendor_module_lock);
-
struct kvm_x86_ops kvm_x86_ops __read_mostly;
#define KVM_X86_OP(func) \
@@ -146,13 +131,6 @@ EXPORT_STATIC_CALL_GPL(kvm_x86_get_cs_db_l_bits);
EXPORT_STATIC_CALL_GPL(kvm_x86_cache_reg);
EXPORT_STATIC_CALL_GPL(kvm_x86_get_cpl);
-static bool __read_mostly ignore_msrs = 0;
-module_param(ignore_msrs, bool, 0644);
-
-bool __read_mostly report_ignored_msrs = true;
-module_param(report_ignored_msrs, bool, 0644);
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs);
-
unsigned int min_timer_period_us = 200;
module_param(min_timer_period_us, uint, 0644);
@@ -179,27 +157,6 @@ module_param(pi_inject_timer, bint, 0644);
static bool __read_mostly mitigate_smt_rsb;
module_param(mitigate_smt_rsb, bool, 0444);
-/*
- * Restoring the host value for MSRs that are only consumed when running in
- * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
- * returns to userspace, i.e. the kernel can run with the guest's value.
- */
-#define KVM_MAX_NR_USER_RETURN_MSRS 16
-
-struct kvm_user_return_msrs {
- struct user_return_notifier urn;
- bool registered;
- struct kvm_user_return_msr_values {
- u64 host;
- u64 curr;
- } values[KVM_MAX_NR_USER_RETURN_MSRS];
-};
-
-u32 __read_mostly kvm_nr_uret_msrs;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs);
-static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
-static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
-
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
@@ -301,249 +258,6 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
static struct kmem_cache *x86_emulator_cache;
-/*
- * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
- * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
- * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
- * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
- * MSRs that KVM emulates without strictly requiring host support.
- * msr_based_features holds MSRs that enumerate features, i.e. are effectively
- * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
- * msrs_to_save and emulated_msrs.
- */
-
-static const u32 msrs_to_save_base[] = {
- MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
- MSR_STAR,
-#ifdef CONFIG_X86_64
- MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
-#endif
- MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
- MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
- MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
- MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
- MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
- MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
- MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
- MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
- MSR_IA32_UMWAIT_CONTROL,
-
- MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
-
- MSR_IA32_U_CET, MSR_IA32_S_CET,
- MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
- MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
- MSR_IA32_DEBUGCTLMSR,
- MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
- MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
-};
-
-static const u32 msrs_to_save_pmu[] = {
- MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
- MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL,
- MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
-
- /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
- MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
- MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
- MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
- MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
- MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
- MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
- MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
-
- MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
- MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
-
- /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
- MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
- MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
- MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
- MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
-
- MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
-};
-
-static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
- ARRAY_SIZE(msrs_to_save_pmu)];
-static unsigned num_msrs_to_save;
-
-static const u32 emulated_msrs_all[] = {
- MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
- MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-
-#ifdef CONFIG_KVM_HYPERV
- HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
- HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
- HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
- HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
- HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
- HV_X64_MSR_RESET,
- HV_X64_MSR_VP_INDEX,
- HV_X64_MSR_VP_RUNTIME,
- HV_X64_MSR_SCONTROL,
- HV_X64_MSR_STIMER0_CONFIG,
- HV_X64_MSR_VP_ASSIST_PAGE,
- HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
- HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
- HV_X64_MSR_SYNDBG_OPTIONS,
- HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
- HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
- HV_X64_MSR_SYNDBG_PENDING_BUFFER,
-#endif
-
- MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
-
- MSR_IA32_TSC_ADJUST,
- MSR_IA32_TSC_DEADLINE,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_IA32_MISC_ENABLE,
- MSR_IA32_MCG_STATUS,
- MSR_IA32_MCG_CTL,
- MSR_IA32_MCG_EXT_CTL,
- MSR_IA32_SMBASE,
- MSR_SMI_COUNT,
- MSR_PLATFORM_INFO,
- MSR_MISC_FEATURES_ENABLES,
- MSR_AMD64_VIRT_SPEC_CTRL,
- MSR_AMD64_TSC_RATIO,
- MSR_IA32_POWER_CTL,
- MSR_IA32_UCODE_REV,
-
- /*
- * KVM always supports the "true" VMX control MSRs, even if the host
- * does not. The VMX MSRs as a whole are considered "emulated" as KVM
- * doesn't strictly require them to exist in the host (ignoring that
- * KVM would refuse to load in the first place if the core set of MSRs
- * aren't supported).
- */
- MSR_IA32_VMX_BASIC,
- MSR_IA32_VMX_TRUE_PINBASED_CTLS,
- MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
- MSR_IA32_VMX_TRUE_EXIT_CTLS,
- MSR_IA32_VMX_TRUE_ENTRY_CTLS,
- MSR_IA32_VMX_MISC,
- MSR_IA32_VMX_CR0_FIXED0,
- MSR_IA32_VMX_CR4_FIXED0,
- MSR_IA32_VMX_VMCS_ENUM,
- MSR_IA32_VMX_PROCBASED_CTLS2,
- MSR_IA32_VMX_EPT_VPID_CAP,
- MSR_IA32_VMX_VMFUNC,
-
- MSR_K7_HWCR,
- MSR_KVM_POLL_CONTROL,
-};
-
-static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
-static unsigned num_emulated_msrs;
-
-/*
- * List of MSRs that control the existence of MSR-based features, i.e. MSRs
- * that are effectively CPUID leafs. VMX MSRs are also included in the set of
- * feature MSRs, but are handled separately to allow expedited lookups.
- */
-static const u32 msr_based_features_all_except_vmx[] = {
- MSR_AMD64_DE_CFG,
- MSR_IA32_UCODE_REV,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_PLATFORM_INFO,
-};
-
-static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
- (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
-static unsigned int num_msr_based_features;
-
-/*
- * All feature MSRs except uCode revID, which tracks the currently loaded uCode
- * patch, are immutable once the vCPU model is defined.
- */
-static bool kvm_is_immutable_feature_msr(u32 msr)
-{
- int i;
-
- if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
- return true;
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
- if (msr == msr_based_features_all_except_vmx[i])
- return msr != MSR_IA32_UCODE_REV;
- }
-
- return false;
-}
-
-static bool kvm_is_advertised_msr(u32 msr_index)
-{
- unsigned int i;
-
- for (i = 0; i < num_msrs_to_save; i++) {
- if (msrs_to_save[i] == msr_index)
- return true;
- }
-
- for (i = 0; i < num_emulated_msrs; i++) {
- if (emulated_msrs[i] == msr_index)
- return true;
- }
-
- return false;
-}
-
-typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated);
-
-static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
- u64 *data, bool host_initiated,
- enum kvm_msr_access rw,
- msr_access_t msr_access_fn)
-{
- const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
- int ret;
-
- BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
-
- /*
- * Zero the data on read failures to avoid leaking stack data to the
- * guest and/or userspace, e.g. if the failure is ignored below.
- */
- ret = msr_access_fn(vcpu, msr, data, host_initiated);
- if (ret && rw == MSR_TYPE_R)
- *data = 0;
-
- if (ret != KVM_MSR_RET_UNSUPPORTED)
- return ret;
-
- /*
- * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
- * advertises to userspace, even if an MSR isn't fully supported.
- * Simply check that @data is '0', which covers both the write '0' case
- * and all reads (in which case @data is zeroed on failure; see above).
- */
- if (host_initiated && !*data && kvm_is_advertised_msr(msr))
- return 0;
-
- if (!ignore_msrs) {
- kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
- op, msr, *data);
- return ret;
- }
-
- if (report_ignored_msrs)
- kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
-
- return 0;
-}
-
static struct kmem_cache *kvm_alloc_emulator_cache(void)
{
unsigned int useroffset = offsetof(struct x86_emulate_ctxt, src);
@@ -557,128 +271,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
-static void kvm_destroy_user_return_msrs(void)
-{
- int cpu;
-
- for_each_possible_cpu(cpu)
- WARN_ON_ONCE(per_cpu(user_return_msrs, cpu).registered);
-
- kvm_nr_uret_msrs = 0;
-}
-
-static void kvm_on_user_return(struct user_return_notifier *urn)
-{
- unsigned slot;
- struct kvm_user_return_msrs *msrs
- = container_of(urn, struct kvm_user_return_msrs, urn);
- struct kvm_user_return_msr_values *values;
-
- msrs->registered = false;
- user_return_notifier_unregister(urn);
-
- for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
- values = &msrs->values[slot];
- if (values->host != values->curr) {
- wrmsrq(kvm_uret_msrs_list[slot], values->host);
- values->curr = values->host;
- }
- }
-}
-
-static int kvm_probe_user_return_msr(u32 msr)
-{
- u64 val;
- int ret;
-
- preempt_disable();
- ret = rdmsrq_safe(msr, &val);
- if (ret)
- goto out;
- ret = wrmsrq_safe(msr, val);
-out:
- preempt_enable();
- return ret;
-}
-
-int kvm_add_user_return_msr(u32 msr)
-{
- BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
-
- if (kvm_probe_user_return_msr(msr))
- return -1;
-
- kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
- return kvm_nr_uret_msrs++;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr);
-
-int kvm_find_user_return_msr(u32 msr)
-{
- int i;
-
- for (i = 0; i < kvm_nr_uret_msrs; ++i) {
- if (kvm_uret_msrs_list[i] == msr)
- return i;
- }
- return -1;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr);
-
-static void kvm_user_return_msr_cpu_online(void)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
- u64 value;
- int i;
-
- for (i = 0; i < kvm_nr_uret_msrs; ++i) {
- rdmsrq_safe(kvm_uret_msrs_list[i], &value);
- msrs->values[i].host = value;
- msrs->values[i].curr = value;
- }
-}
-
-static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
-{
- if (!msrs->registered) {
- msrs->urn.on_user_return = kvm_on_user_return;
- user_return_notifier_register(&msrs->urn);
- msrs->registered = true;
- }
-}
-
-int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
- int err;
-
- value = (value & mask) | (msrs->values[slot].host & ~mask);
- if (value == msrs->values[slot].curr)
- return 0;
- err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
- if (err)
- return 1;
-
- msrs->values[slot].curr = value;
- kvm_user_return_register_notifier(msrs);
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
-
-u64 kvm_get_user_return_msr(unsigned int slot)
-{
- return this_cpu_ptr(&user_return_msrs)->values[slot].curr;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr);
-
-static void drop_user_return_notifiers(void)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
-
- if (msrs->registered)
- kvm_on_user_return(&msrs->urn);
-}
-
/*
* Handle a fault on a hardware virtualization (VMX or SVM) instruction.
*
@@ -933,17 +525,6 @@ int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_complete_insn_gp);
-static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
-{
- if (err) {
- kvm_inject_gp(vcpu, 0);
- return 1;
- }
-
- return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
- EMULTYPE_COMPLETE_USER_EXIT);
-}
-
void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
bool from_hardware)
{
@@ -1050,13 +631,6 @@ static void kvm_load_host_pkru(struct kvm_vcpu *vcpu)
}
}
-#ifdef CONFIG_X86_64
-static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC;
-}
-#endif
-
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
u64 xcr0 = xcr;
@@ -1175,595 +749,6 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdpmc);
-/*
- * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
- * does not yet virtualize. These include:
- * 10 - MISC_PACKAGE_CTRLS
- * 11 - ENERGY_FILTERING_CTL
- * 12 - DOITM
- * 18 - FB_CLEAR_CTRL
- * 21 - XAPIC_DISABLE_STATUS
- * 23 - OVERCLOCKING_STATUS
- */
-
-#define KVM_SUPPORTED_ARCH_CAP \
- (ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL | ARCH_CAP_RSBA | \
- ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
- ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
- ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
- ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
- ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | ARCH_CAP_ITS_NO)
-
-static u64 kvm_get_arch_capabilities(void)
-{
- u64 data = kvm_host.arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
-
- /*
- * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
- * the nested hypervisor runs with NX huge pages. If it is not,
- * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
- * L1 guests, so it need not worry about its own (L2) guests.
- */
- data |= ARCH_CAP_PSCHANGE_MC_NO;
-
- /*
- * If we're doing cache flushes (either "always" or "cond")
- * we will do one whenever the guest does a vmlaunch/vmresume.
- * If an outer hypervisor is doing the cache flush for us
- * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
- * capability to the guest too, and if EPT is disabled we're not
- * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
- * require a nested hypervisor to do a flush of its own.
- */
- if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
- data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
-
- if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
- data |= ARCH_CAP_RDCL_NO;
- if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
- data |= ARCH_CAP_SSB_NO;
- if (!boot_cpu_has_bug(X86_BUG_MDS))
- data |= ARCH_CAP_MDS_NO;
- if (!boot_cpu_has_bug(X86_BUG_RFDS))
- data |= ARCH_CAP_RFDS_NO;
- if (!boot_cpu_has_bug(X86_BUG_ITS))
- data |= ARCH_CAP_ITS_NO;
-
- if (!boot_cpu_has(X86_FEATURE_RTM)) {
- /*
- * If RTM=0 because the kernel has disabled TSX, the host might
- * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
- * and therefore knows that there cannot be TAA) but keep
- * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
- * and we want to allow migrating those guests to tsx=off hosts.
- */
- data &= ~ARCH_CAP_TAA_NO;
- } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
- data |= ARCH_CAP_TAA_NO;
- } else {
- /*
- * Nothing to do here; we emulate TSX_CTRL if present on the
- * host so the guest can choose between disabling TSX or
- * using VERW to clear CPU buffers.
- */
- }
-
- if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
- data |= ARCH_CAP_GDS_NO;
-
- return data;
-}
-
-static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- WARN_ON_ONCE(!host_initiated);
-
- switch (index) {
- case MSR_IA32_ARCH_CAPABILITIES:
- *data = kvm_get_arch_capabilities();
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- *data = kvm_caps.supported_perf_cap;
- break;
- case MSR_PLATFORM_INFO:
- *data = MSR_PLATFORM_INFO_CPUID_FAULT;
- break;
- case MSR_IA32_UCODE_REV:
- rdmsrq_safe(index, data);
- break;
- default:
- return kvm_x86_call(get_feature_msr)(index, data);
- }
- return 0;
-}
-
-static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
- kvm_get_feature_msr);
-}
-
-static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
-{
- if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
- return false;
-
- if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
- return false;
-
- if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
- return false;
-
- if (efer & (EFER_LME | EFER_LMA) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
- return false;
-
- if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
- return false;
-
- return true;
-
-}
-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
-{
- if (efer & efer_reserved_bits)
- return false;
-
- return __kvm_valid_efer(vcpu, efer);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer);
-
-static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u64 old_efer = vcpu->arch.efer;
- u64 efer = msr_info->data;
- int r;
-
- if (efer & efer_reserved_bits)
- return 1;
-
- if (!msr_info->host_initiated) {
- if (!__kvm_valid_efer(vcpu, efer))
- return 1;
-
- if (is_paging(vcpu) &&
- (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
- return 1;
- }
-
- efer &= ~EFER_LMA;
- efer |= vcpu->arch.efer & EFER_LMA;
-
- r = kvm_x86_call(set_efer)(vcpu, efer);
- if (r) {
- WARN_ON(r > 0);
- return r;
- }
-
- if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-
- if (!static_cpu_has(X86_FEATURE_XSAVES) &&
- (efer & EFER_SVME))
- kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
-
- return 0;
-}
-
-void kvm_enable_efer_bits(u64 mask)
-{
- efer_reserved_bits &= ~mask;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits);
-
-bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
-{
- struct kvm_x86_msr_filter *msr_filter;
- struct msr_bitmap_range *ranges;
- struct kvm *kvm = vcpu->kvm;
- bool allowed;
- int idx;
- u32 i;
-
- /* x2APIC MSRs do not support filtering. */
- if (index >= 0x800 && index <= 0x8ff)
- return true;
-
- idx = srcu_read_lock(&kvm->srcu);
-
- msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
- if (!msr_filter) {
- allowed = true;
- goto out;
- }
-
- allowed = msr_filter->default_allow;
- ranges = msr_filter->ranges;
-
- for (i = 0; i < msr_filter->count; i++) {
- u32 start = ranges[i].base;
- u32 end = start + ranges[i].nmsrs;
- u32 flags = ranges[i].flags;
- unsigned long *bitmap = ranges[i].bitmap;
-
- if ((index >= start) && (index < end) && (flags & type)) {
- allowed = test_bit(index - start, bitmap);
- break;
- }
- }
-
-out:
- srcu_read_unlock(&kvm->srcu, idx);
-
- return allowed;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed);
-
-/*
- * Write @data into the MSR specified by @index. Select MSR specific fault
- * checks are bypassed if @host_initiated is %true.
- * Returns 0 on success, non-0 otherwise.
- * Assumes vcpu_load() was already called.
- */
-static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
- bool host_initiated)
-{
- struct msr_data msr;
-
- switch (index) {
- case MSR_FS_BASE:
- case MSR_GS_BASE:
- case MSR_KERNEL_GS_BASE:
- case MSR_CSTAR:
- case MSR_LSTAR:
- if (is_noncanonical_msr_address(data, vcpu))
- return 1;
- break;
- case MSR_IA32_SYSENTER_EIP:
- case MSR_IA32_SYSENTER_ESP:
- /*
- * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
- * non-canonical address is written on Intel but not on
- * AMD (which ignores the top 32-bits, because it does
- * not implement 64-bit SYSENTER).
- *
- * 64-bit code should hence be able to write a non-canonical
- * value on AMD. Making the address canonical ensures that
- * vmentry does not fail on Intel after writing a non-canonical
- * value, and that something deterministic happens if the guest
- * invokes 64-bit SYSENTER.
- */
- data = __canonical_address(data, max_host_virt_addr_bits());
- break;
- case MSR_TSC_AUX:
- if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
- return 1;
-
- if (!host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
- return 1;
-
- /*
- * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
- * incomplete and conflicting architectural behavior. Current
- * AMD CPUs completely ignore bits 63:32, i.e. they aren't
- * reserved and always read as zeros. Enforce Intel's reserved
- * bits check if the guest CPU is Intel compatible, otherwise
- * clear the bits. This ensures cross-vendor migration will
- * provide consistent behavior for the guest.
- */
- if (guest_cpuid_is_intel_compatible(vcpu) && (data >> 32) != 0)
- return 1;
-
- data = (u32)data;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
- return KVM_MSR_RET_UNSUPPORTED;
- if (!kvm_is_valid_u_s_cet(vcpu, data))
- return 1;
- break;
- case MSR_KVM_INTERNAL_GUEST_SSP:
- if (!host_initiated)
- return 1;
- fallthrough;
- /*
- * Note that the MSR emulation here is flawed when a vCPU
- * doesn't support the Intel 64 architecture. The expected
- * architectural behavior in this case is that the upper 32
- * bits do not exist and should always read '0'. However,
- * because the actual hardware on which the virtual CPU is
- * running does support Intel 64, XRSTORS/XSAVES in the
- * guest could observe behavior that violates the
- * architecture. Intercepting XRSTORS/XSAVES for this
- * special case isn't deemed worthwhile.
- */
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return KVM_MSR_RET_UNSUPPORTED;
- /*
- * MSR_IA32_INT_SSP_TAB is not present on processors that do
- * not support Intel 64 architecture.
- */
- if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
- return KVM_MSR_RET_UNSUPPORTED;
- if (is_noncanonical_msr_address(data, vcpu))
- return 1;
- /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
- if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
- return 1;
- break;
- }
-
- msr.data = data;
- msr.index = index;
- msr.host_initiated = host_initiated;
-
- return kvm_x86_call(set_msr)(vcpu, &msr);
-}
-
-static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- return __kvm_set_msr(vcpu, index, *data, host_initiated);
-}
-
-static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
- u32 index, u64 data, bool host_initiated)
-{
- return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
- _kvm_set_msr);
-}
-
-/*
- * Read the MSR specified by @index into @data. Select MSR specific fault
- * checks are bypassed if @host_initiated is %true.
- * Returns 0 on success, non-0 otherwise.
- * Assumes vcpu_load() was already called.
- */
-static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- struct msr_data msr;
- int ret;
-
- switch (index) {
- case MSR_TSC_AUX:
- if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
- return 1;
-
- if (!host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
- return 1;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
- return KVM_MSR_RET_UNSUPPORTED;
- break;
- case MSR_KVM_INTERNAL_GUEST_SSP:
- if (!host_initiated)
- return 1;
- fallthrough;
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return KVM_MSR_RET_UNSUPPORTED;
- break;
- }
-
- msr.index = index;
- msr.host_initiated = host_initiated;
-
- ret = kvm_x86_call(get_msr)(vcpu, &msr);
- if (!ret)
- *data = msr.data;
- return ret;
-}
-
-int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- return __kvm_set_msr(vcpu, index, data, true);
-}
-
-int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- return __kvm_get_msr(vcpu, index, data, true);
-}
-
-static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
- u32 index, u64 *data, bool host_initiated)
-{
- return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
- __kvm_get_msr);
-}
-
-int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- return kvm_get_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read);
-
-int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- return kvm_set_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write);
-
-int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
- return KVM_MSR_RET_FILTERED;
-
- return __kvm_emulate_msr_read(vcpu, index, data);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read);
-
-int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
- return KVM_MSR_RET_FILTERED;
-
- return __kvm_emulate_msr_write(vcpu, index, data);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);
-
-
-static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
-{
- if (!vcpu->run->msr.error) {
- kvm_eax_write(vcpu, vcpu->run->msr.data);
- kvm_edx_write(vcpu, vcpu->run->msr.data >> 32);
- }
-}
-
-static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
-{
- return complete_emulated_insn_gp(vcpu, vcpu->run->msr.error);
-}
-
-static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
-{
- complete_userspace_rdmsr(vcpu);
- return complete_emulated_msr_access(vcpu);
-}
-
-static int complete_fast_msr_access(struct kvm_vcpu *vcpu)
-{
- return kvm_x86_call(complete_emulated_msr)(vcpu, vcpu->run->msr.error);
-}
-
-static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
-{
- complete_userspace_rdmsr(vcpu);
- return complete_fast_msr_access(vcpu);
-}
-
-static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
-{
- if (!vcpu->run->msr.error)
- kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
- vcpu->run->msr.data);
-
- return complete_fast_msr_access(vcpu);
-}
-
-static u64 kvm_msr_reason(int r)
-{
- switch (r) {
- case KVM_MSR_RET_UNSUPPORTED:
- return KVM_MSR_EXIT_REASON_UNKNOWN;
- case KVM_MSR_RET_FILTERED:
- return KVM_MSR_EXIT_REASON_FILTER;
- default:
- return KVM_MSR_EXIT_REASON_INVAL;
- }
-}
-
-static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
- u32 exit_reason, u64 data,
- int (*completion)(struct kvm_vcpu *vcpu),
- int r)
-{
- u64 msr_reason = kvm_msr_reason(r);
-
- /* Check if the user wanted to know about this MSR fault */
- if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
- return 0;
-
- vcpu->run->exit_reason = exit_reason;
- vcpu->run->msr.error = 0;
- memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
- vcpu->run->msr.reason = msr_reason;
- vcpu->run->msr.index = index;
- vcpu->run->msr.data = data;
- vcpu->arch.complete_userspace_io = completion;
-
- return 1;
-}
-
-static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
- int (*complete_rdmsr)(struct kvm_vcpu *))
-{
- u64 data;
- int r;
-
- r = kvm_emulate_msr_read(vcpu, msr, &data);
-
- if (!r) {
- trace_kvm_msr_read(msr, data);
-
- if (reg < 0) {
- kvm_eax_write(vcpu, data);
- kvm_edx_write(vcpu, data >> 32);
- } else {
- kvm_register_write(vcpu, reg, data);
- }
- } else {
- /* MSR read failed? See if we should ask user space */
- if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
- complete_rdmsr, r))
- return 0;
- trace_kvm_msr_read_ex(msr);
- }
-
- return kvm_x86_call(complete_emulated_msr)(vcpu, r);
-}
-
-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
-{
- return __kvm_emulate_rdmsr(vcpu, kvm_ecx_read(vcpu), -1,
- complete_fast_rdmsr);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
-
-int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- vcpu->arch.cui_rdmsr_imm_reg = reg;
-
- return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm);
-
-static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- int r;
-
- r = kvm_emulate_msr_write(vcpu, msr, data);
- if (!r) {
- trace_kvm_msr_write(msr, data);
- } else {
- /* MSR write failed? See if we should ask user space */
- if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data,
- complete_fast_msr_access, r))
- return 0;
- /* Signal all other negative errors to userspace */
- if (r < 0)
- return r;
- trace_kvm_msr_write_ex(msr, data);
- }
-
- return kvm_x86_call(complete_emulated_msr)(vcpu, r);
-}
-
-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
-{
- return __kvm_emulate_wrmsr(vcpu, kvm_ecx_read(vcpu),
- kvm_read_edx_eax(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
-
-int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);
-
int kvm_emulate_as_nop(struct kvm_vcpu *vcpu)
{
return kvm_skip_emulated_instruction(vcpu);
@@ -1835,72 +820,6 @@ static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
kvm_request_pending(vcpu) || xfer_to_guest_mode_work_pending();
}
-static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
- return EXIT_FASTPATH_NONE;
-
- switch (msr) {
- case APIC_BASE_MSR + (APIC_ICR >> 4):
- if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
- kvm_x2apic_icr_write_fast(vcpu->arch.apic, data))
- return EXIT_FASTPATH_NONE;
- break;
- case MSR_IA32_TSC_DEADLINE:
- kvm_set_lapic_tscdeadline_msr(vcpu, data);
- break;
- default:
- return EXIT_FASTPATH_NONE;
- }
-
- trace_kvm_msr_write(msr, data);
-
- if (!kvm_skip_emulated_instruction(vcpu))
- return EXIT_FASTPATH_EXIT_USERSPACE;
-
- return EXIT_FASTPATH_REENTER_GUEST;
-}
-
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
-{
- return __handle_fastpath_wrmsr(vcpu, kvm_ecx_read(vcpu),
- kvm_read_edx_eax(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
-
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
-
-/*
- * Adapt set_msr() to msr_io()'s calling convention
- */
-static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- return kvm_get_msr_ignored_check(vcpu, index, data, true);
-}
-
-static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- u64 val;
-
- /*
- * Reject writes to immutable feature MSRs if the vCPU model is frozen,
- * as KVM doesn't support modifying the guest vCPU model on the fly,
- * e.g. changing the VMX capabilities MSRs while L2 is active is
- * nonsensical. Allow writes of the same value, e.g. so that userspace
- * can blindly stuff all MSRs when emulating RESET.
- */
- if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
- kvm_is_immutable_feature_msr(index) &&
- (do_get_msr(vcpu, index, &val) || *data != val))
- return -EINVAL;
-
- return kvm_set_msr_ignored_check(vcpu, index, *data, true);
-}
-
#ifdef CONFIG_X86_64
struct pvclock_clock {
int vclock_mode;
@@ -1967,72 +886,6 @@ static s64 get_kvmclock_base_ns(void)
}
#endif
-static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
-{
- int version;
- int r;
- struct pvclock_wall_clock wc;
- u32 wc_sec_hi;
- u64 wall_nsec;
-
- if (!wall_clock)
- return;
-
- r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
- if (r)
- return;
-
- if (version & 1)
- ++version; /* first time write, random junk */
-
- ++version;
-
- if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
- return;
-
- wall_nsec = kvm_get_wall_clock_epoch(kvm);
-
- wc.nsec = do_div(wall_nsec, NSEC_PER_SEC);
- wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
- wc.version = version;
-
- kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
-
- if (sec_hi_ofs) {
- wc_sec_hi = wall_nsec >> 32;
- kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
- &wc_sec_hi, sizeof(wc_sec_hi));
- }
-
- version++;
- kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
-}
-
-static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
- bool old_msr, bool host_initiated)
-{
- struct kvm_arch *ka = &vcpu->kvm->arch;
-
- if (vcpu->vcpu_id == 0 && !host_initiated) {
- if (ka->boot_vcpu_runs_old_kvmclock != old_msr)
- kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-
- ka->boot_vcpu_runs_old_kvmclock = old_msr;
- }
-
- vcpu->arch.time = system_time;
- kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
-
- /* we verify if the enable bit is set... */
- if (system_time & 1)
- kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
- sizeof(struct pvclock_vcpu_time_info));
- else
- kvm_gpc_deactivate(&vcpu->arch.pv_time);
-
- return;
-}
-
static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
{
do_shl32_div32(dividend, divisor);
@@ -3077,151 +1930,6 @@ static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
}
}
-/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
-static bool is_mci_control_msr(u32 msr)
-{
- return (msr & 3) == 0;
-}
-static bool is_mci_status_msr(u32 msr)
-{
- return (msr & 3) == 1;
-}
-
-/*
- * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
- */
-static bool can_set_mci_status(struct kvm_vcpu *vcpu)
-{
- /* McStatusWrEn enabled? */
- if (guest_cpuid_is_amd_compatible(vcpu))
- return !!(vcpu->arch.msr_hwcr & BIT_ULL(18));
-
- return false;
-}
-
-static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u64 mcg_cap = vcpu->arch.mcg_cap;
- unsigned bank_num = mcg_cap & 0xff;
- u32 msr = msr_info->index;
- u64 data = msr_info->data;
- u32 offset, last_msr;
-
- switch (msr) {
- case MSR_IA32_MCG_STATUS:
- vcpu->arch.mcg_status = data;
- break;
- case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P) &&
- (data || !msr_info->host_initiated))
- return 1;
- if (data != 0 && data != ~(u64)0)
- return 1;
- vcpu->arch.mcg_ctl = data;
- break;
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
- return 1;
- /* An attempt to write a 1 to a reserved bit raises #GP */
- if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
- return 1;
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
- last_msr + 1 - MSR_IA32_MC0_CTL2);
- vcpu->arch.mci_ctl2_banks[offset] = data;
- break;
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- /*
- * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
- * values are architecturally undefined. But, some Linux
- * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
- * issue on AMD K8s, allow bit 10 to be clear when setting all
- * other bits in order to avoid an uncaught #GP in the guest.
- *
- * UNIXWARE clears bit 0 of MC1_CTL to ignore correctable,
- * single-bit ECC data errors.
- */
- if (is_mci_control_msr(msr) &&
- data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
- return 1;
-
- /*
- * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
- * AMD-based CPUs allow non-zero values, but if and only if
- * HWCR[McStatusWrEn] is set.
- */
- if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
- data != 0 && !can_set_mci_status(vcpu))
- return 1;
-
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
- last_msr + 1 - MSR_IA32_MC0_CTL);
- vcpu->arch.mce_banks[offset] = data;
- break;
- default:
- return 1;
- }
- return 0;
-}
-
-static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
-{
- gpa_t gpa = data & ~0x3f;
-
- /* Bits 4:5 are reserved, Should be zero */
- if (data & 0x30)
- return 1;
-
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_VMEXIT) &&
- (data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT))
- return 1;
-
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT) &&
- (data & KVM_ASYNC_PF_DELIVERY_AS_INT))
- return 1;
-
- if (!lapic_in_kernel(vcpu))
- return data ? 1 : 0;
-
- if (__kvm_pv_async_pf_enabled(data) &&
- kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
- sizeof(u64)))
- return 1;
-
- vcpu->arch.apf.msr_en_val = data;
-
- if (__kvm_pv_async_pf_enabled(data)) {
- kvm_async_pf_wakeup_all(vcpu);
- } else {
- kvm_clear_async_pf_completion_queue(vcpu);
- kvm_async_pf_hash_reset(vcpu);
- }
- return 0;
-}
-
-static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)
-{
- /* Bits 8-63 are reserved */
- if (data >> 8)
- return 1;
-
- if (!lapic_in_kernel(vcpu))
- return 1;
-
- vcpu->arch.apf.msr_int_val = data;
-
- vcpu->arch.apf.vec = data & KVM_ASYNC_PF_VEC_MASK;
-
- return 0;
-}
-
static void kvmclock_reset(struct kvm_vcpu *vcpu)
{
kvm_gpc_deactivate(&vcpu->arch.pv_time);
@@ -3382,899 +2090,6 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
-/*
- * Returns true if the MSR in question is managed via XSTATE, i.e. is context
- * switched with the rest of guest FPU state.
- *
- * Note, S_CET is _not_ saved/restored via XSAVES/XRSTORS.
- */
-static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
-{
- if (!vcpu)
- return false;
-
- switch (msr) {
- case MSR_IA32_U_CET:
- return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
- guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
- default:
- return false;
- }
-}
-
-/*
- * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
- * MSR that is managed via XSTATE. Note, the caller is responsible for doing
- * the initial FPU load, this helper only ensures that guest state is resident
- * in hardware (the kernel can load its FPU state in IRQ context).
- *
- * Note, loading guest values for U_CET and PL[0-3]_SSP while executing in the
- * kernel is safe, as U_CET is specific to userspace, and PL[0-3]_SSP are only
- * consumed when transitioning to lower privilege levels, i.e. are effectively
- * only consumed by userspace as well.
- */
-static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
- struct msr_data *msr_info,
- int access)
-{
- BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
-
- KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
- KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
-
- kvm_fpu_get();
- if (access == MSR_TYPE_R)
- rdmsrq(msr_info->index, msr_info->data);
- else
- wrmsrq(msr_info->index, msr_info->data);
- kvm_fpu_put();
-}
-
-static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
-}
-
-static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
-}
-
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u32 msr = msr_info->index;
- u64 data = msr_info->data;
-
- /*
- * Do not allow host-initiated writes to trigger the Xen hypercall
- * page setup; it could incur locking paths which are not expected
- * if userspace sets the MSR in an unusual location.
- */
- if (kvm_xen_is_hypercall_page_msr(vcpu->kvm, msr) &&
- !msr_info->host_initiated)
- return kvm_xen_write_hypercall_page(vcpu, data);
-
- switch (msr) {
- case MSR_AMD64_NB_CFG:
- case MSR_IA32_UCODE_WRITE:
- case MSR_VM_HSAVE_PA:
- case MSR_AMD64_PATCH_LOADER:
- case MSR_AMD64_BU_CFG2:
- case MSR_AMD64_DC_CFG:
- case MSR_AMD64_TW_CFG:
- case MSR_F15H_EX_CFG:
- break;
-
- case MSR_IA32_UCODE_REV:
- if (msr_info->host_initiated)
- vcpu->arch.microcode_version = data;
- break;
- case MSR_IA32_ARCH_CAPABILITIES:
- if (!msr_info->host_initiated ||
- !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
- return KVM_MSR_RET_UNSUPPORTED;
- vcpu->arch.arch_capabilities = data;
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- if (!msr_info->host_initiated ||
- !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (data & ~kvm_caps.supported_perf_cap)
- return 1;
-
- /*
- * Note, this is not just a performance optimization! KVM
- * disallows changing feature MSRs after the vCPU has run; PMU
- * refresh will bug the VM if called after the vCPU has run.
- */
- if (vcpu->arch.perf_capabilities == data)
- break;
-
- vcpu->arch.perf_capabilities = data;
- kvm_pmu_refresh(vcpu);
- kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
- break;
- case MSR_IA32_PRED_CMD: {
- u64 reserved_bits = ~(PRED_CMD_IBPB | PRED_CMD_SBPB);
-
- if (!msr_info->host_initiated) {
- if ((!guest_has_pred_cmd_msr(vcpu)))
- return 1;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
- reserved_bits |= PRED_CMD_IBPB;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
- reserved_bits |= PRED_CMD_SBPB;
- }
-
- if (!boot_cpu_has(X86_FEATURE_IBPB))
- reserved_bits |= PRED_CMD_IBPB;
-
- if (!boot_cpu_has(X86_FEATURE_SBPB))
- reserved_bits |= PRED_CMD_SBPB;
-
- if (data & reserved_bits)
- return 1;
-
- if (!data)
- break;
-
- wrmsrq(MSR_IA32_PRED_CMD, data);
- break;
- }
- case MSR_IA32_FLUSH_CMD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
- return 1;
-
- if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
- return 1;
- if (!data)
- break;
-
- wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- break;
- case MSR_EFER:
- return set_efer(vcpu, msr_info);
- case MSR_K7_HWCR: {
- /*
- * Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2
- * through at least v6.6 whine if TscFreqSel is clear,
- * depending on F/M/S.
- */
- u64 valid = BIT_ULL(18) | BIT_ULL(24);
-
- data &= ~(u64)0x40; /* ignore flush filter disable */
- data &= ~(u64)0x100; /* ignore ignne emulation enable */
- data &= ~(u64)0x8; /* ignore TLB cache disable */
-
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_GP_ON_USER_CPUID))
- valid |= MSR_K7_HWCR_CPUID_USER_DIS;
-
- if (data & ~valid) {
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- return 1;
- }
- vcpu->arch.msr_hwcr = data;
- break;
- }
- case MSR_FAM10H_MMIO_CONF_BASE:
- if (data != 0) {
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- return 1;
- }
- break;
- case MSR_IA32_CR_PAT:
- if (!kvm_pat_valid(data))
- return 1;
-
- vcpu->arch.pat = data;
- break;
- case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
- case MSR_MTRRdefType:
- return kvm_mtrr_set_msr(vcpu, msr, data);
- case MSR_IA32_APICBASE:
- return kvm_apic_set_base(vcpu, data, msr_info->host_initiated);
- case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_write(vcpu, msr, data);
- case MSR_IA32_TSC_DEADLINE:
- kvm_set_lapic_tscdeadline_msr(vcpu, data);
- break;
- case MSR_IA32_TSC_ADJUST:
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
- if (!msr_info->host_initiated) {
- s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
- adjust_tsc_offset_guest(vcpu, adj);
- /* Before back to guest, tsc_timestamp must be adjusted
- * as well, otherwise guest's percpu pvclock time could jump.
- */
- kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- }
- vcpu->arch.ia32_tsc_adjust_msr = data;
- }
- break;
- case MSR_IA32_MISC_ENABLE: {
- u64 old_val = vcpu->arch.ia32_misc_enable_msr;
-
- if (!msr_info->host_initiated) {
- /* RO bits */
- if ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PMU_RO_MASK)
- return 1;
-
- /* R bits, i.e. writes are ignored, but don't fault. */
- data = data & ~MSR_IA32_MISC_ENABLE_EMON;
- data |= old_val & MSR_IA32_MISC_ENABLE_EMON;
- }
-
- if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
- ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
- return 1;
- vcpu->arch.ia32_misc_enable_msr = data;
- vcpu->arch.cpuid_dynamic_bits_dirty = true;
- } else {
- vcpu->arch.ia32_misc_enable_msr = data;
- }
- break;
- }
- case MSR_IA32_SMBASE:
- if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
- return 1;
- vcpu->arch.smbase = data;
- break;
- case MSR_IA32_POWER_CTL:
- vcpu->arch.msr_ia32_power_ctl = data;
- break;
- case MSR_IA32_TSC:
- if (msr_info->host_initiated) {
- kvm_synchronize_tsc(vcpu, &data);
- } else if (!vcpu->arch.guest_tsc_protected) {
- u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
- adjust_tsc_offset_guest(vcpu, adj);
- vcpu->arch.ia32_tsc_adjust_msr += adj;
- }
- break;
- case MSR_IA32_XSS:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (data & ~vcpu->arch.guest_supported_xss)
- return 1;
- if (vcpu->arch.ia32_xss == data)
- break;
- vcpu->arch.ia32_xss = data;
- vcpu->arch.cpuid_dynamic_bits_dirty = true;
- break;
- case MSR_SMI_COUNT:
- if (!msr_info->host_initiated)
- return 1;
- vcpu->arch.smi_count = data;
- break;
- case MSR_KVM_WALL_CLOCK_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
- break;
- case MSR_KVM_WALL_CLOCK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
- break;
- case MSR_KVM_SYSTEM_TIME_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- kvm_write_system_time(vcpu, data, false, msr_info->host_initiated);
- break;
- case MSR_KVM_SYSTEM_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- kvm_write_system_time(vcpu, data, true, msr_info->host_initiated);
- break;
- case MSR_KVM_ASYNC_PF_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_pv_enable_async_pf(vcpu, data))
- return 1;
- break;
- case MSR_KVM_ASYNC_PF_INT:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_pv_enable_async_pf_int(vcpu, data))
- return 1;
- break;
- case MSR_KVM_ASYNC_PF_ACK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
- if (data & 0x1) {
- /*
- * Pairs with the smp_mb__after_atomic() in
- * kvm_arch_async_page_present_queued().
- */
- smp_store_mb(vcpu->arch.apf.pageready_pending, false);
-
- kvm_check_async_pf_completion(vcpu);
- }
- break;
- case MSR_KVM_STEAL_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (unlikely(!sched_info_on()))
- return 1;
-
- if (data & KVM_STEAL_RESERVED_MASK)
- return 1;
-
- vcpu->arch.st.msr_val = data;
-
- if (!(data & KVM_MSR_ENABLED))
- break;
-
- kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
-
- break;
- case MSR_KVM_PV_EOI_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_lapic_set_pv_eoi(vcpu, data, sizeof(u8)))
- return 1;
- break;
-
- case MSR_KVM_POLL_CONTROL:
- if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
- return KVM_MSR_RET_UNSUPPORTED;
-
- /* only enable bit supported */
- if (data & (-1ULL << 1))
- return 1;
-
- vcpu->arch.msr_kvm_poll_control = data;
- break;
-
- case MSR_IA32_MCG_CTL:
- case MSR_IA32_MCG_STATUS:
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return set_msr_mce(vcpu, msr_info);
-
- case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
- case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
- case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
- case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr))
- return kvm_pmu_set_msr(vcpu, msr_info);
-
- if (data)
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- break;
- case MSR_K7_CLK_CTL:
- /*
- * Ignore all writes to this no longer documented MSR.
- * Writes are only relevant for old K7 processors,
- * all pre-dating SVM, but a recommended workaround from
- * AMD for these chips. It is possible to specify the
- * affected processor models on the command line, hence
- * the need to ignore the workaround.
- */
- break;
-#ifdef CONFIG_KVM_HYPERV
- case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
- case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
- case HV_X64_MSR_SYNDBG_OPTIONS:
- case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
- case HV_X64_MSR_CRASH_CTL:
- case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
- case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_STATUS:
- case HV_X64_MSR_TSC_INVARIANT_CONTROL:
- return kvm_hv_set_msr_common(vcpu, msr, data,
- msr_info->host_initiated);
-#endif
- case MSR_IA32_BBL_CR_CTL3:
- /* Drop writes to this legacy MSR -- see rdmsr
- * counterpart for further detail.
- */
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- break;
- case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- vcpu->arch.osvw.length = data;
- break;
- case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- vcpu->arch.osvw.status = data;
- break;
- case MSR_PLATFORM_INFO:
- if (!msr_info->host_initiated)
- return 1;
- vcpu->arch.msr_platform_info = data;
- break;
- case MSR_MISC_FEATURES_ENABLES:
- if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
- (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
- !(vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT)))
- return 1;
- vcpu->arch.msr_misc_features_enables = data;
- break;
-#ifdef CONFIG_X86_64
- case MSR_IA32_XFD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- if (data & ~kvm_guest_supported_xfd(vcpu))
- return 1;
-
- fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
- break;
- case MSR_IA32_XFD_ERR:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- if (data & ~kvm_guest_supported_xfd(vcpu))
- return 1;
-
- vcpu->arch.guest_fpu.xfd_err = data;
- break;
-#endif
- case MSR_IA32_U_CET:
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- kvm_set_xstate_msr(vcpu, msr_info);
- break;
- default:
- if (kvm_pmu_is_valid_msr(vcpu, msr))
- return kvm_pmu_set_msr(vcpu, msr_info);
-
- return KVM_MSR_RET_UNSUPPORTED;
- }
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common);
-
-static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
-{
- u64 data;
- u64 mcg_cap = vcpu->arch.mcg_cap;
- unsigned bank_num = mcg_cap & 0xff;
- u32 offset, last_msr;
-
- switch (msr) {
- case MSR_IA32_P5_MC_ADDR:
- case MSR_IA32_P5_MC_TYPE:
- data = 0;
- break;
- case MSR_IA32_MCG_CAP:
- data = vcpu->arch.mcg_cap;
- break;
- case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P) && !host)
- return 1;
- data = vcpu->arch.mcg_ctl;
- break;
- case MSR_IA32_MCG_STATUS:
- data = vcpu->arch.mcg_status;
- break;
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- if (!(mcg_cap & MCG_CMCI_P) && !host)
- return 1;
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
- last_msr + 1 - MSR_IA32_MC0_CTL2);
- data = vcpu->arch.mci_ctl2_banks[offset];
- break;
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
- last_msr + 1 - MSR_IA32_MC0_CTL);
- data = vcpu->arch.mce_banks[offset];
- break;
- default:
- return 1;
- }
- *pdata = data;
- return 0;
-}
-
-int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- switch (msr_info->index) {
- case MSR_IA32_PLATFORM_ID:
- case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_LASTBRANCHFROMIP:
- case MSR_IA32_LASTBRANCHTOIP:
- case MSR_IA32_LASTINTFROMIP:
- case MSR_IA32_LASTINTTOIP:
- case MSR_AMD64_SYSCFG:
- case MSR_K8_TSEG_ADDR:
- case MSR_K8_TSEG_MASK:
- case MSR_VM_HSAVE_PA:
- case MSR_K8_INT_PENDING_MSG:
- case MSR_AMD64_NB_CFG:
- case MSR_FAM10H_MMIO_CONF_BASE:
- case MSR_AMD64_BU_CFG2:
- case MSR_IA32_PERF_CTL:
- case MSR_AMD64_DC_CFG:
- case MSR_AMD64_TW_CFG:
- case MSR_F15H_EX_CFG:
- /*
- * Intel Sandy Bridge CPUs must support the RAPL (running average power
- * limit) MSRs. Just return 0, as we do not want to expose the host
- * data here. Do not conditionalize this on CPUID, as KVM does not do
- * so for existing CPU-specific MSRs.
- */
- case MSR_RAPL_POWER_UNIT:
- case MSR_PP0_ENERGY_STATUS: /* Power plane 0 (core) */
- case MSR_PP1_ENERGY_STATUS: /* Power plane 1 (graphics uncore) */
- case MSR_PKG_ENERGY_STATUS: /* Total package */
- case MSR_DRAM_ENERGY_STATUS: /* DRAM controller */
- msr_info->data = 0;
- break;
- case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
- case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
- case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
- case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info);
- msr_info->data = 0;
- break;
- case MSR_IA32_UCODE_REV:
- msr_info->data = vcpu->arch.microcode_version;
- break;
- case MSR_IA32_ARCH_CAPABILITIES:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
- return KVM_MSR_RET_UNSUPPORTED;
- msr_info->data = vcpu->arch.arch_capabilities;
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
- return KVM_MSR_RET_UNSUPPORTED;
- msr_info->data = vcpu->arch.perf_capabilities;
- break;
- case MSR_IA32_POWER_CTL:
- msr_info->data = vcpu->arch.msr_ia32_power_ctl;
- break;
- case MSR_IA32_TSC: {
- /*
- * Intel SDM states that MSR_IA32_TSC read adds the TSC offset
- * even when not intercepted. AMD manual doesn't explicitly
- * state this but appears to behave the same.
- *
- * On userspace reads and writes, however, we unconditionally
- * return L1's TSC value to ensure backwards-compatible
- * behavior for migration.
- */
- u64 offset, ratio;
-
- if (msr_info->host_initiated) {
- offset = vcpu->arch.l1_tsc_offset;
- ratio = vcpu->arch.l1_tsc_scaling_ratio;
- } else {
- offset = vcpu->arch.tsc_offset;
- ratio = vcpu->arch.tsc_scaling_ratio;
- }
-
- msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
- break;
- }
- case MSR_IA32_CR_PAT:
- msr_info->data = vcpu->arch.pat;
- break;
- case MSR_MTRRcap:
- case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
- case MSR_MTRRdefType:
- return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
- case 0xcd: /* fsb frequency */
- msr_info->data = 3;
- break;
- /*
- * MSR_EBC_FREQUENCY_ID
- * Conservative value valid for even the basic CPU models.
- * Models 0,1: 000 in bits 23:21 indicating a bus speed of
- * 100MHz, model 2 000 in bits 18:16 indicating 100MHz,
- * and 266MHz for model 3, or 4. Set Core Clock
- * Frequency to System Bus Frequency Ratio to 1 (bits
- * 31:24) even though these are only valid for CPU
- * models > 2, however guests may end up dividing or
- * multiplying by zero otherwise.
- */
- case MSR_EBC_FREQUENCY_ID:
- msr_info->data = 1 << 24;
- break;
- case MSR_IA32_APICBASE:
- msr_info->data = vcpu->arch.apic_base;
- break;
- case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
- case MSR_IA32_TSC_DEADLINE:
- msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
- break;
- case MSR_IA32_TSC_ADJUST:
- msr_info->data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
- break;
- case MSR_IA32_MISC_ENABLE:
- msr_info->data = vcpu->arch.ia32_misc_enable_msr;
- break;
- case MSR_IA32_SMBASE:
- if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
- return 1;
- msr_info->data = vcpu->arch.smbase;
- break;
- case MSR_SMI_COUNT:
- msr_info->data = vcpu->arch.smi_count;
- break;
- case MSR_IA32_PERF_STATUS:
- /* TSC increment by tick */
- msr_info->data = 1000ULL;
- /* CPU multiplier */
- msr_info->data |= (((uint64_t)4ULL) << 40);
- break;
- case MSR_EFER:
- msr_info->data = vcpu->arch.efer;
- break;
- case MSR_KVM_WALL_CLOCK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->kvm->arch.wall_clock;
- break;
- case MSR_KVM_WALL_CLOCK_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->kvm->arch.wall_clock;
- break;
- case MSR_KVM_SYSTEM_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.time;
- break;
- case MSR_KVM_SYSTEM_TIME_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.time;
- break;
- case MSR_KVM_ASYNC_PF_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.apf.msr_en_val;
- break;
- case MSR_KVM_ASYNC_PF_INT:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.apf.msr_int_val;
- break;
- case MSR_KVM_ASYNC_PF_ACK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = 0;
- break;
- case MSR_KVM_STEAL_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.st.msr_val;
- break;
- case MSR_KVM_PV_EOI_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.pv_eoi.msr_val;
- break;
- case MSR_KVM_POLL_CONTROL:
- if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.msr_kvm_poll_control;
- break;
- case MSR_IA32_P5_MC_ADDR:
- case MSR_IA32_P5_MC_TYPE:
- case MSR_IA32_MCG_CAP:
- case MSR_IA32_MCG_CTL:
- case MSR_IA32_MCG_STATUS:
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
- msr_info->host_initiated);
- case MSR_IA32_XSS:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return 1;
- msr_info->data = vcpu->arch.ia32_xss;
- break;
- case MSR_K7_CLK_CTL:
- /*
- * Provide expected ramp-up count for K7. All other
- * are set to zero, indicating minimum divisors for
- * every field.
- *
- * This prevents guest kernels on AMD host with CPU
- * type 6, model 8 and higher from exploding due to
- * the rdmsr failing.
- */
- msr_info->data = 0x20000000;
- break;
-#ifdef CONFIG_KVM_HYPERV
- case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
- case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
- case HV_X64_MSR_SYNDBG_OPTIONS:
- case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
- case HV_X64_MSR_CRASH_CTL:
- case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
- case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_STATUS:
- case HV_X64_MSR_TSC_INVARIANT_CONTROL:
- return kvm_hv_get_msr_common(vcpu,
- msr_info->index, &msr_info->data,
- msr_info->host_initiated);
-#endif
- case MSR_IA32_BBL_CR_CTL3:
- /* This legacy MSR exists but isn't fully documented in current
- * silicon. It is however accessed by winxp in very narrow
- * scenarios where it sets bit #19, itself documented as
- * a "reserved" bit. Best effort attempt to source coherent
- * read data here should the balance of the register be
- * interpreted by the guest:
- *
- * L2 cache control register 3: 64GB range, 256KB size,
- * enabled, latency 0x1, configured
- */
- msr_info->data = 0xbe702111;
- break;
- case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- msr_info->data = vcpu->arch.osvw.length;
- break;
- case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- msr_info->data = vcpu->arch.osvw.status;
- break;
- case MSR_PLATFORM_INFO:
- if (!msr_info->host_initiated &&
- !vcpu->kvm->arch.guest_can_read_msr_platform_info)
- return 1;
- msr_info->data = vcpu->arch.msr_platform_info;
- break;
- case MSR_MISC_FEATURES_ENABLES:
- msr_info->data = vcpu->arch.msr_misc_features_enables;
- break;
- case MSR_K7_HWCR:
- msr_info->data = vcpu->arch.msr_hwcr;
- break;
-#ifdef CONFIG_X86_64
- case MSR_IA32_XFD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
- break;
- case MSR_IA32_XFD_ERR:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- msr_info->data = vcpu->arch.guest_fpu.xfd_err;
- break;
-#endif
- case MSR_IA32_U_CET:
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- kvm_get_xstate_msr(vcpu, msr_info);
- break;
- default:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info);
-
- return KVM_MSR_RET_UNSUPPORTED;
- }
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common);
-
-/*
- * Read or write a bunch of msrs. All parameters are kernel addresses.
- *
- * @return number of msrs set successfully.
- */
-static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
- struct kvm_msr_entry *entries,
- int (*do_msr)(struct kvm_vcpu *vcpu,
- unsigned index, u64 *data))
-{
- bool fpu_loaded = false;
- int i;
-
- for (i = 0; i < msrs->nmsrs; ++i) {
- /*
- * If userspace is accessing one or more XSTATE-managed MSRs,
- * temporarily load the guest's FPU state so that the guest's
- * MSR value(s) is resident in hardware and thus can be accessed
- * via RDMSR/WRMSR.
- */
- if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
- kvm_load_guest_fpu(vcpu);
- fpu_loaded = true;
- }
- if (do_msr(vcpu, entries[i].index, &entries[i].data))
- break;
- }
- if (fpu_loaded)
- kvm_put_guest_fpu(vcpu);
-
- return i;
-}
-
-/*
- * Read or write a bunch of msrs. Parameters are user addresses.
- *
- * @return number of msrs set successfully.
- */
-static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
- int (*do_msr)(struct kvm_vcpu *vcpu,
- unsigned index, u64 *data),
- int writeback)
-{
- struct kvm_msrs msrs;
- struct kvm_msr_entry *entries;
- unsigned size;
- int r;
-
- r = -EFAULT;
- if (copy_from_user(&msrs, user_msrs, sizeof(msrs)))
- goto out;
-
- r = -E2BIG;
- if (msrs.nmsrs >= MAX_IO_MSRS)
- goto out;
-
- size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
- entries = memdup_user(user_msrs->entries, size);
- if (IS_ERR(entries)) {
- r = PTR_ERR(entries);
- goto out;
- }
-
- r = __msr_io(vcpu, &msrs, entries, do_msr);
-
- if (writeback && copy_to_user(user_msrs->entries, entries, size))
- r = -EFAULT;
-
- kfree(entries);
-out:
- return r;
-}
-
static inline bool kvm_can_mwait_in_guest(void)
{
return boot_cpu_has(X86_FEATURE_MWAIT) &&
@@ -4586,61 +2401,6 @@ static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
return __kvm_x86_dev_get_attr(attr, &val);
}
-static int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
-{
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- if (n < msr_list.nmsrs)
- return -E2BIG;
-
- if (copy_to_user(user_msr_list->indices, &msrs_to_save,
- num_msrs_to_save * sizeof(u32)))
- return -EFAULT;
-
- if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
- &emulated_msrs, num_emulated_msrs * sizeof(u32)))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
-{
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msr_based_features;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- if (n < msr_list.nmsrs)
- return -E2BIG;
-
- if (copy_to_user(user_msr_list->indices, &msr_based_features,
- num_msr_based_features * sizeof(u32)))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
-{
- return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
-}
-
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5588,148 +3348,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}
-struct kvm_x86_reg_id {
- __u32 index;
- __u8 type;
- __u8 rsvd1;
- __u8 rsvd2:4;
- __u8 size:4;
- __u8 x86;
-};
-
-static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
- struct kvm_x86_reg_id *reg)
-{
- switch (reg->index) {
- case KVM_REG_GUEST_SSP:
- /*
- * FIXME: If host-initiated accesses are ever exempted from
- * ignore_msrs (in kvm_do_msr_access()), drop this manual check
- * and rely on KVM's standard checks to reject accesses to regs
- * that don't exist.
- */
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return -EINVAL;
-
- reg->type = KVM_X86_REG_TYPE_MSR;
- reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
- break;
- default:
- return -EINVAL;
- }
- return 0;
-}
-
-static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
-{
- u64 val;
-
- if (do_get_msr(vcpu, msr, &val))
- return -EINVAL;
-
- if (put_user(val, user_val))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
-{
- u64 val;
-
- if (get_user(val, user_val))
- return -EFAULT;
-
- if (do_set_msr(vcpu, msr, &val))
- return -EINVAL;
-
- return 0;
-}
-
-static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
- void __user *argp)
-{
- struct kvm_one_reg one_reg;
- struct kvm_x86_reg_id *reg;
- u64 __user *user_val;
- bool load_fpu;
- int r;
-
- if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
- return -EFAULT;
-
- if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
- return -EINVAL;
-
- reg = (struct kvm_x86_reg_id *)&one_reg.id;
- if (reg->rsvd1 || reg->rsvd2)
- return -EINVAL;
-
- if (reg->type == KVM_X86_REG_TYPE_KVM) {
- r = kvm_translate_kvm_reg(vcpu, reg);
- if (r)
- return r;
- }
-
- if (reg->type != KVM_X86_REG_TYPE_MSR)
- return -EINVAL;
-
- if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
- return -EINVAL;
-
- guard(srcu)(&vcpu->kvm->srcu);
-
- load_fpu = is_xstate_managed_msr(vcpu, reg->index);
- if (load_fpu)
- kvm_load_guest_fpu(vcpu);
-
- user_val = u64_to_user_ptr(one_reg.addr);
- if (ioctl == KVM_GET_ONE_REG)
- r = kvm_get_one_msr(vcpu, reg->index, user_val);
- else
- r = kvm_set_one_msr(vcpu, reg->index, user_val);
-
- if (load_fpu)
- kvm_put_guest_fpu(vcpu);
- return r;
-}
-
-static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
- struct kvm_reg_list __user *user_list)
-{
- u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
- u64 user_nr_regs;
-
- if (get_user(user_nr_regs, &user_list->n))
- return -EFAULT;
-
- if (put_user(nr_regs, &user_list->n))
- return -EFAULT;
-
- if (user_nr_regs < nr_regs)
- return -E2BIG;
-
- if (nr_regs &&
- put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
-{
- guard(srcu)(&vcpu->kvm->srcu);
-
- return msr_io(vcpu, user_msrs, do_get_msr, 1);
-}
-
-static int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
-{
- guard(srcu)(&vcpu->kvm->srcu);
-
- return msr_io(vcpu, user_msrs, do_set_msr, 0);
-}
-
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -6532,113 +4150,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
}
-static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
-{
- struct kvm_x86_msr_filter *msr_filter;
-
- msr_filter = kzalloc_obj(*msr_filter, GFP_KERNEL_ACCOUNT);
- if (!msr_filter)
- return NULL;
-
- msr_filter->default_allow = default_allow;
- return msr_filter;
-}
-
-static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
-{
- u32 i;
-
- if (!msr_filter)
- return;
-
- for (i = 0; i < msr_filter->count; i++)
- kfree(msr_filter->ranges[i].bitmap);
-
- kfree(msr_filter);
-}
-
-static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
- struct kvm_msr_filter_range *user_range)
-{
- unsigned long *bitmap;
- size_t bitmap_size;
-
- if (!user_range->nmsrs)
- return 0;
-
- if (user_range->flags & ~KVM_MSR_FILTER_RANGE_VALID_MASK)
- return -EINVAL;
-
- if (!user_range->flags)
- return -EINVAL;
-
- bitmap_size = BITS_TO_LONGS(user_range->nmsrs) * sizeof(long);
- if (!bitmap_size || bitmap_size > KVM_MSR_FILTER_MAX_BITMAP_SIZE)
- return -EINVAL;
-
- bitmap = memdup_user((__user u8*)user_range->bitmap, bitmap_size);
- if (IS_ERR(bitmap))
- return PTR_ERR(bitmap);
-
- msr_filter->ranges[msr_filter->count] = (struct msr_bitmap_range) {
- .flags = user_range->flags,
- .base = user_range->base,
- .nmsrs = user_range->nmsrs,
- .bitmap = bitmap,
- };
-
- msr_filter->count++;
- return 0;
-}
-
-static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm,
- struct kvm_msr_filter *filter)
-{
- struct kvm_x86_msr_filter *new_filter, *old_filter;
- bool default_allow;
- bool empty = true;
- int r;
- u32 i;
-
- if (filter->flags & ~KVM_MSR_FILTER_VALID_MASK)
- return -EINVAL;
-
- for (i = 0; i < ARRAY_SIZE(filter->ranges); i++)
- empty &= !filter->ranges[i].nmsrs;
-
- default_allow = !(filter->flags & KVM_MSR_FILTER_DEFAULT_DENY);
- if (empty && !default_allow)
- return -EINVAL;
-
- new_filter = kvm_alloc_msr_filter(default_allow);
- if (!new_filter)
- return -ENOMEM;
-
- for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) {
- r = kvm_add_msr_filter(new_filter, &filter->ranges[i]);
- if (r) {
- kvm_free_msr_filter(new_filter);
- return r;
- }
- }
-
- mutex_lock(&kvm->lock);
- old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
- mutex_is_locked(&kvm->lock));
- mutex_unlock(&kvm->lock);
- synchronize_srcu(&kvm->srcu);
-
- kvm_free_msr_filter(old_filter);
-
- /*
- * Recalc MSR intercepts as userspace may want to intercept accesses to
- * MSRs that KVM would otherwise pass through to the guest.
- */
- kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
-
- return 0;
-}
-
#ifdef CONFIG_KVM_COMPAT
/* for KVM_X86_SET_MSR_FILTER */
struct kvm_msr_filter_range_compat {
@@ -7159,157 +4670,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
return r;
}
-static void kvm_probe_feature_msr(u32 msr_index)
-{
- u64 data;
-
- if (kvm_get_feature_msr(NULL, msr_index, &data, true))
- return;
-
- msr_based_features[num_msr_based_features++] = msr_index;
-}
-
-static void kvm_probe_msr_to_save(u32 msr_index)
-{
- u32 dummy[2];
-
- if (rdmsr_safe(msr_index, &dummy[0], &dummy[1]))
- return;
-
- /*
- * Even MSRs that are valid in the host may not be exposed to guests in
- * some cases.
- */
- switch (msr_index) {
- case MSR_IA32_BNDCFGS:
- if (!kvm_mpx_supported())
- return;
- break;
- case MSR_TSC_AUX:
- if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
- !kvm_cpu_cap_has(X86_FEATURE_RDPID))
- return;
- break;
- case MSR_IA32_UMWAIT_CONTROL:
- if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
- return;
- break;
- case MSR_IA32_RTIT_CTL:
- case MSR_IA32_RTIT_STATUS:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT))
- return;
- break;
- case MSR_IA32_RTIT_CR3_MATCH:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- !intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
- return;
- break;
- case MSR_IA32_RTIT_OUTPUT_BASE:
- case MSR_IA32_RTIT_OUTPUT_MASK:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- (!intel_pt_validate_hw_cap(PT_CAP_topa_output) &&
- !intel_pt_validate_hw_cap(PT_CAP_single_range_output)))
- return;
- break;
- case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- (msr_index - MSR_IA32_RTIT_ADDR0_A >=
- intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
- return;
- break;
- case MSR_ARCH_PERFMON_PERFCTR0 ...
- MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
- kvm_pmu_cap.num_counters_gp)
- return;
- break;
- case MSR_ARCH_PERFMON_EVENTSEL0 ...
- MSR_ARCH_PERFMON_EVENTSEL0 + KVM_MAX_NR_GP_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_EVENTSEL0 >=
- kvm_pmu_cap.num_counters_gp)
- return;
- break;
- case MSR_ARCH_PERFMON_FIXED_CTR0 ...
- MSR_ARCH_PERFMON_FIXED_CTR0 + KVM_MAX_NR_FIXED_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_FIXED_CTR0 >=
- kvm_pmu_cap.num_counters_fixed)
- return;
- break;
- case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
- if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
- return;
- break;
- case MSR_IA32_XFD:
- case MSR_IA32_XFD_ERR:
- if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
- return;
- break;
- case MSR_IA32_TSX_CTRL:
- if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
- return;
- break;
- case MSR_IA32_XSS:
- if (!kvm_caps.supported_xss)
- return;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
- !kvm_cpu_cap_has(X86_FEATURE_IBT))
- return;
- break;
- case MSR_IA32_INT_SSP_TAB:
- if (!kvm_cpu_cap_has(X86_FEATURE_LM))
- return;
- fallthrough;
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
- return;
- break;
- default:
- break;
- }
-
- msrs_to_save[num_msrs_to_save++] = msr_index;
-}
-
-static void kvm_init_msr_lists(void)
-{
- unsigned i;
-
- BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
- "Please update the fixed PMCs in msrs_to_save_pmu[]");
-
- num_msrs_to_save = 0;
- num_emulated_msrs = 0;
- num_msr_based_features = 0;
-
- for (i = 0; i < ARRAY_SIZE(msrs_to_save_base); i++)
- kvm_probe_msr_to_save(msrs_to_save_base[i]);
-
- if (enable_pmu) {
- for (i = 0; i < ARRAY_SIZE(msrs_to_save_pmu); i++)
- kvm_probe_msr_to_save(msrs_to_save_pmu[i]);
- }
-
- for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
- if (!kvm_x86_call(has_emulated_msr)(NULL,
- emulated_msrs_all[i]))
- continue;
-
- emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
- }
-
- for (i = KVM_FIRST_EMULATED_VMX_MSR; i <= KVM_LAST_EMULATED_VMX_MSR; i++)
- kvm_probe_feature_msr(i);
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++)
- kvm_probe_feature_msr(msr_based_features_all_except_vmx[i]);
-}
-
static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
void *__v)
{
@@ -8247,61 +5607,22 @@ static int emulator_get_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- int r;
- r = kvm_emulate_msr_read(vcpu, msr_index, pdata);
- if (r < 0)
- return X86EMUL_UNHANDLEABLE;
-
- if (r) {
- if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
- complete_emulated_rdmsr, r))
- return X86EMUL_IO_NEEDED;
-
- trace_kvm_msr_read_ex(msr_index);
- return X86EMUL_PROPAGATE_FAULT;
- }
-
- trace_kvm_msr_read(msr_index, *pdata);
- return X86EMUL_CONTINUE;
+ return kvm_emulator_get_msr_with_filter(vcpu, msr_index, pdata);
}
static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 data)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- int r;
- r = kvm_emulate_msr_write(vcpu, msr_index, data);
- if (r < 0)
- return X86EMUL_UNHANDLEABLE;
-
- if (r) {
- if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
- complete_emulated_msr_access, r))
- return X86EMUL_IO_NEEDED;
-
- trace_kvm_msr_write_ex(msr_index, data);
- return X86EMUL_PROPAGATE_FAULT;
- }
-
- trace_kvm_msr_write(msr_index, data);
- return X86EMUL_CONTINUE;
+ return kvm_emulator_set_msr_with_filter(vcpu, msr_index, data);
}
static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
- /*
- * Treat emulator accesses to the current shadow stack pointer as host-
- * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
- * and this API is used only for implicit accesses, i.e. not RDMSR, and
- * so the index is fully KVM-controlled.
- */
- if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
- return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
-
- return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
+ return kvm_emulator_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
}
static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc)
@@ -13250,32 +10571,6 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
#endif
#endif
-int kvm_spec_ctrl_test_value(u64 value)
-{
- /*
- * test that setting IA32_SPEC_CTRL to given value
- * is allowed by the host processor
- */
-
- u64 saved_value;
- unsigned long flags;
- int ret = 0;
-
- local_irq_save(flags);
-
- if (rdmsrq_safe(MSR_IA32_SPEC_CTRL, &saved_value))
- ret = 1;
- else if (wrmsrq_safe(MSR_IA32_SPEC_CTRL, value))
- ret = 1;
- else
- wrmsrq(MSR_IA32_SPEC_CTRL, saved_value);
-
- local_irq_restore(flags);
-
- return ret;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value);
-
void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code)
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 31e67b060148..fd3d0a196526 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -6,6 +6,7 @@
#include <asm/fpu/xstate.h>
#include <asm/mce.h>
#include <asm/pvclock.h>
+#include "msrs.h"
#include "mmu.h"
#include "regs.h"
#include "kvm_emulate.h"
@@ -45,14 +46,6 @@ do { \
failed; \
})
-/*
- * The first...last VMX feature MSRs that are emulated by KVM. This may or may
- * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
- * associated feature that KVM supports for nested virtualization.
- */
-#define KVM_FIRST_EMULATED_VMX_MSR MSR_IA32_VMX_BASIC
-#define KVM_LAST_EMULATED_VMX_MSR MSR_IA32_VMX_VMFUNC
-
#define KVM_DEFAULT_PLE_GAP 128
#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
#define KVM_DEFAULT_PLE_WINDOW_GROW 2
@@ -61,16 +54,6 @@ do { \
#define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
#define KVM_SVM_DEFAULT_PLE_WINDOW 3000
-/*
- * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
- * are arbitrary and have no meaning, the only requirement is that they don't
- * conflict with "real" MSRs that KVM supports. Use values at the upper end
- * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
- * will be usable until KVM exhausts its supply of paravirtual MSR indices.
- */
-
-#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
-
static inline unsigned int __grow_ple_window(unsigned int val,
unsigned int base, unsigned int modifier, unsigned int max)
{
@@ -101,9 +84,6 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
return max(val, min);
}
-#define MSR_IA32_CR_PAT_DEFAULT \
- PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC)
-
void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
int kvm_check_nested_events(struct kvm_vcpu *vcpu);
@@ -378,15 +358,12 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
struct kvm_queued_exception *ex);
void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu);
-int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
-int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
void *insn, int insn_len);
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
@@ -432,20 +409,6 @@ extern bool enable_vmware_backdoor;
extern int pi_inject_timer;
-extern bool report_ignored_msrs;
-
-static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- if (report_ignored_msrs)
- vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
-}
-
-static inline void kvm_pr_unimpl_rdmsr(struct kvm_vcpu *vcpu, u32 msr)
-{
- if (report_ignored_msrs)
- vcpu_unimpl(vcpu, "Unhandled RDMSR(0x%x)\n", msr);
-}
-
static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
{
return pvclock_scale_delta(nsec, vcpu->arch.virtual_tsc_mult,
@@ -563,33 +526,10 @@ static inline void kvm_machine_check(void)
#endif
}
-int kvm_spec_ctrl_test_value(u64 value);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
-bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
-
-enum kvm_msr_access {
- MSR_TYPE_R = BIT(0),
- MSR_TYPE_W = BIT(1),
- MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
-};
-
-/*
- * Internal error codes that are used to indicate that MSR emulation encountered
- * an error that should result in #GP in the guest, unless userspace handles it.
- * Note, '1', '0', and negative numbers are off limits, as they are used by KVM
- * as part of KVM's lightly documented internal KVM_RUN return codes.
- *
- * UNSUPPORTED - The MSR isn't supported, either because it is completely
- * unknown to KVM, or because the MSR should not exist according
- * to the vCPU model.
- *
- * FILTERED - Access to the MSR is denied by a userspace MSR filter.
- */
-#define KVM_MSR_RET_UNSUPPORTED 2
-#define KVM_MSR_RET_FILTERED 3
int kvm_sev_es_mmio(struct kvm_vcpu *vcpu, bool is_write, gpa_t gpa,
unsigned int bytes, void *data);
@@ -649,27 +589,4 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
-#define CET_US_RESERVED_BITS GENMASK(9, 6)
-#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
-#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
-#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
-
-static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
-{
- if (data & CET_US_RESERVED_BITS)
- return false;
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- (data & CET_US_SHSTK_MASK_BITS))
- return false;
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
- (data & CET_US_IBT_MASK_BITS))
- return false;
- if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
- return false;
- /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
- if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
- return false;
-
- return true;
-}
#endif
--
2.54.0.823.g6e5bcc1fc9-goog
* [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (25 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 26/40] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h} Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:56 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 28/40] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h Sean Christopherson
` (13 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Relocate declarations of Control/Debug Register, EFLAGS and RIP helpers
from x86's kvm_host.h to regs.h, to continue trimming down kvm_host.h.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 17 -----------------
arch/x86/kvm/regs.h | 17 +++++++++++++++++
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19091d89d3cc..0f4b16b26a27 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2186,8 +2186,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
@@ -2313,24 +2311,12 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);
-void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
-void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
-int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
-int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
-unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
-unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
-void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
@@ -2520,9 +2506,6 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
-unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
-bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
-
void kvm_make_scan_ioapic_request(struct kvm *kvm);
void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
unsigned long *vcpu_bitmap);
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index c224874bbdde..30ef08d60a74 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -16,6 +16,18 @@
static_assert(!(KVM_POSSIBLE_CR0_GUEST_BITS & X86_CR0_PDPTR_BITS));
+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
+int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
+int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
+unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
+unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
+void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
+
static inline bool is_long_mode(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_X86_64
@@ -425,7 +437,12 @@ static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_call(get_segment_base)(vcpu, seg);
}
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
+bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
+
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
struct kvm_sregs2 *sregs2);
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h
2026-05-29 22:22 ` [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h Sean Christopherson
@ 2026-05-30 0:56 ` Yosry Ahmed
2026-06-01 14:24 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:56 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:10PM -0700, Sean Christopherson wrote:
> Relocate declarations of Control/Debug Register, EFLAGS and RIP helpers
> from x86's kvm_host.h to regs.h, to continue trimming down kvm_host.h.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 17 -----------------
> arch/x86/kvm/regs.h | 17 +++++++++++++++++
> 2 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 19091d89d3cc..0f4b16b26a27 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2186,8 +2186,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
> void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
>
> -int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
Why are pdptrs helpers bundled with register helpers?
* Re: [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h
2026-05-30 0:56 ` Yosry Ahmed
@ 2026-06-01 14:24 ` Sean Christopherson
2026-06-01 23:36 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 14:24 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:22:10PM -0700, Sean Christopherson wrote:
> > Relocate declarations of Control/Debug Register, EFLAGS and RIP helpers
> > from x86's kvm_host.h to regs.h, to continue trimming down kvm_host.h.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 17 -----------------
> > arch/x86/kvm/regs.h | 17 +++++++++++++++++
> > 2 files changed, 17 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 19091d89d3cc..0f4b16b26a27 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2186,8 +2186,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
> > void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
> > void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> >
> > -int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
>
> Why are pdptrs helpers bundled with register helpers?
Page Directory Pointer Table Registers
^^^^^^^^^
Or as the SDM initially describes them, "5.4.1 PDPTE Registers". When using
PAE paging (not 64-bit paging), they really are CPU registers, and their lifecycle
is tied to CR{0,3,4}. I.e. I was thinking of them as an extension of Control
Registers. I can explicitly call that out in the changelog?
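For readers less familiar with PAE paging, a minimal sketch of how the PDPTEs hang off CR3 may help. The helper names below (pae_pdpt_base(), pae_pdpte_index(), pae_pdpte_present()) are hypothetical and are not KVM functions; only the bit positions are taken from the SDM's description of PAE paging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in KVM. */

/* With PAE paging, CR3[31:5] holds the 32-byte aligned address of the PDPT. */
static inline uint32_t pae_pdpt_base(uint32_t cr3)
{
	return cr3 & 0xffffffe0u;
}

/* Linear address bits 31:30 select one of the four PDPTEs. */
static inline unsigned int pae_pdpte_index(uint32_t linear_addr)
{
	return linear_addr >> 30;
}

/* Bit 0 of a PDPTE is the Present flag. */
static inline bool pae_pdpte_present(uint64_t pdpte)
{
	return pdpte & 1;
}
```

Since there are only four entries and the CPU caches them in internal registers rather than walking the table on every access, treating them as an extension of the control registers is a reasonable way to group the helpers.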
* Re: [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h
2026-06-01 14:24 ` Sean Christopherson
@ 2026-06-01 23:36 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 23:36 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 07:24:30AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:22:10PM -0700, Sean Christopherson wrote:
> > > Relocate declarations of Control/Debug Register, EFLAGS and RIP helpers
> > > from x86's kvm_host.h to regs.h, to continue trimming down kvm_host.h.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 17 -----------------
> > > arch/x86/kvm/regs.h | 17 +++++++++++++++++
> > > 2 files changed, 17 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 19091d89d3cc..0f4b16b26a27 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -2186,8 +2186,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
> > > void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
> > > void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> > >
> > > -int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
> >
> > Why are pdptrs helpers bundled with register helpers?
>
> Page Directory Pointer Table Registers
> ^^^^^^^^^
LOL I assumed PDPTR is just Page Directory Pointer (i.e. PTR is
pointer). Never mind then.
>
> Or as the SDM initially describes them, "5.4.1 PDPTE Registers". When using
> PAE paging (not 64-bit paging), they really are CPU registers, and their lifecycle
> is tied to CR{0,3,4}. I.e. I was thinking of them as an extension of Control
> Registers. I can explicitly call that out in the changelog?
I don't think that's needed, it was just me being uneducated.
* [PATCH v3 28/40] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (26 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 27/40] KVM: x86: Move register helper declarations from kvm_host.h => regs.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:57 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 29/40] KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration Sean Christopherson
` (12 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Define kvm_{g,s}et_segment() as inline functions in regs.h, as they are
literally one-line wrappers to invoke vendor code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/regs.h | 12 ++++++++++++
arch/x86/kvm/x86.c | 12 ------------
3 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0f4b16b26a27..ca2e69c80a8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2303,8 +2303,6 @@ int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
-void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
-void kvm_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 30ef08d60a74..7a823422d78e 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -437,6 +437,18 @@ static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_call(get_segment_base)(vcpu, seg);
}
+static inline void kvm_set_segment(struct kvm_vcpu *vcpu,
+ struct kvm_segment *var, int seg)
+{
+ kvm_x86_call(set_segment)(vcpu, var, seg);
+}
+
+static inline void kvm_get_segment(struct kvm_vcpu *vcpu,
+ struct kvm_segment *var, int seg)
+{
+ kvm_x86_call(get_segment)(vcpu, var, seg);
+}
+
unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c648fac802f6..660ba27c76ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4719,18 +4719,6 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
return handled;
}
-void kvm_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
-{
- kvm_x86_call(set_segment)(vcpu, var, seg);
-}
-
-void kvm_get_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
-{
- kvm_x86_call(get_segment)(vcpu, var, seg);
-}
-
gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception)
{
--
2.54.0.823.g6e5bcc1fc9-goog
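As a rough illustration of the pattern this patch relies on, the sketch below shows one-line static inline wrappers that forward to vendor callbacks. Everything prefixed with demo_ is a hypothetical stand-in: the real code uses struct kvm_vcpu, struct kvm_segment and kvm_x86_call(), whose dispatch mechanism is not reproduced here (a plain function-pointer table is assumed instead).

```c
/* Simplified stand-ins for illustration; the real KVM types differ. */
struct demo_segment {
	unsigned long base;
	unsigned int limit;
};

struct demo_vcpu {
	struct demo_segment segs[6];
};

/* Assumed vendor hook table, standing in for the VMX/SVM callbacks. */
struct demo_vendor_ops {
	void (*get_segment)(struct demo_vcpu *vcpu, struct demo_segment *var, int seg);
	void (*set_segment)(struct demo_vcpu *vcpu, struct demo_segment *var, int seg);
};

static void demo_vendor_get_segment(struct demo_vcpu *vcpu,
				    struct demo_segment *var, int seg)
{
	*var = vcpu->segs[seg];
}

static void demo_vendor_set_segment(struct demo_vcpu *vcpu,
				    struct demo_segment *var, int seg)
{
	vcpu->segs[seg] = *var;
}

static const struct demo_vendor_ops demo_ops = {
	.get_segment = demo_vendor_get_segment,
	.set_segment = demo_vendor_set_segment,
};

/*
 * One-line wrappers.  Defining them as static inline in a header lets each
 * caller invoke the vendor hook directly, with no extra out-of-line call.
 */
static inline void demo_set_segment(struct demo_vcpu *vcpu,
				    struct demo_segment *var, int seg)
{
	demo_ops.set_segment(vcpu, var, seg);
}

static inline void demo_get_segment(struct demo_vcpu *vcpu,
				    struct demo_segment *var, int seg)
{
	demo_ops.get_segment(vcpu, var, seg);
}
```

The trade-off is the usual one for header inlines: every includer sees the definition, which is acceptable here because the bodies are a single forwarding call.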
* Re: [PATCH v3 28/40] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h
2026-05-29 22:22 ` [PATCH v3 28/40] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h Sean Christopherson
@ 2026-05-30 0:57 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:11PM -0700, Sean Christopherson wrote:
> Define kvm_{g,s}et_segment() as inline functions in regs.h, as they are
> literally one-line wrappers to invoke vendor code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 29/40] KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration.
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (27 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 28/40] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:57 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h Sean Christopherson
` (11 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Remove a dead kvm_load_segment_descriptor() declaration, no functional
change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ca2e69c80a8d..a861c0d70be0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2303,7 +2303,6 @@ int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 29/40] KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration.
2026-05-29 22:22 ` [PATCH v3 29/40] KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration Sean Christopherson
@ 2026-05-30 0:57 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:57 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:12PM -0700, Sean Christopherson wrote:
> Remove a dead kvm_load_segment_descriptor() declaration, no functional
> change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (28 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 29/40] KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:59 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 31/40] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h Sean Christopherson
` (10 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Relocate declarations of MSR helpers (and kvm_nr_uret_msrs) from x86's
kvm_host.h to msrs.h, to continue trimming down kvm_host.h.
Deliberately leave the funky read_msr() where it is, as it will hopefully
be removed entirely as part of a broader kernel-API cleanup.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 26 -------------------------
arch/x86/kvm/msrs.h | 34 ++++++++++++++++++++++++++++++---
2 files changed, 31 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a861c0d70be0..1143140592df 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2094,7 +2094,6 @@ struct kvm_arch_async_pf {
u64 error_code;
};
-extern u32 __read_mostly kvm_nr_uret_msrs;
extern bool __read_mostly allow_smaller_maxphyaddr;
extern bool __read_mostly enable_apicv;
extern bool __read_mostly enable_ipiv;
@@ -2278,18 +2277,6 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
-void kvm_enable_efer_bits(u64);
-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
-int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
-int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
int kvm_emulate_invd(struct kvm_vcpu *vcpu);
int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
@@ -2311,9 +2298,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
-int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
@@ -2488,16 +2472,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long ipi_bitmap_high, u32 min,
unsigned long icr, int op_64_bit);
-int kvm_add_user_return_msr(u32 msr);
-int kvm_find_user_return_msr(u32 msr);
-int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-u64 kvm_get_user_return_msr(unsigned int slot);
-
-static inline bool kvm_is_supported_user_return_msr(u32 msr)
-{
- return kvm_find_user_return_msr(msr) >= 0;
-}
-
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
diff --git a/arch/x86/kvm/msrs.h b/arch/x86/kvm/msrs.h
index c34f0411ced6..5c8362a8fd97 100644
--- a/arch/x86/kvm/msrs.h
+++ b/arch/x86/kvm/msrs.h
@@ -11,6 +11,8 @@
extern bool report_ignored_msrs;
extern bool ignore_msrs;
+extern u32 __read_mostly kvm_nr_uret_msrs;
+
static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
if (report_ignored_msrs)
@@ -56,13 +58,39 @@ int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
int kvm_get_reg_list(struct kvm_vcpu *vcpu,
struct kvm_reg_list __user *user_list);
+void kvm_enable_efer_bits(u64);
+bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
+
+int kvm_add_user_return_msr(u32 msr);
+int kvm_find_user_return_msr(u32 msr);
+int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
+u64 kvm_get_user_return_msr(unsigned int slot);
+
+static inline bool kvm_is_supported_user_return_msr(u32 msr)
+{
+ return kvm_find_user_return_msr(msr) >= 0;
+}
+
void kvm_user_return_msr_cpu_online(void);
void drop_user_return_notifiers(void);
void kvm_destroy_user_return_msrs(void);
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
-
int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
u64 *pdata);
int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
2026-05-29 22:22 ` [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h Sean Christopherson
@ 2026-05-30 0:59 ` Yosry Ahmed
2026-06-01 14:50 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:13PM -0700, Sean Christopherson wrote:
> Relocate declarations of MSR helpers (and kvm_nr_uret_msrs) from x86's
> kvm_host.h to msrs.h, to continue trimming down kvm_host.h.
>
> Deliberately leave the funky read_msr() where it is, as it will hopefully
> be removed entirely as part of a broader kernel-API cleanup.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 26 -------------------------
> arch/x86/kvm/msrs.h | 34 ++++++++++++++++++++++++++++++---
> 2 files changed, 31 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a861c0d70be0..1143140592df 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2094,7 +2094,6 @@ struct kvm_arch_async_pf {
> u64 error_code;
> };
>
> -extern u32 __read_mostly kvm_nr_uret_msrs;
> extern bool __read_mostly allow_smaller_maxphyaddr;
> extern bool __read_mostly enable_apicv;
> extern bool __read_mostly enable_ipiv;
> @@ -2278,18 +2277,6 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
> void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
> void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
>
> -void kvm_enable_efer_bits(u64);
> -bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
Patch 15's changelog makes me think EFER will belong to regs.[hc], not
msrs.[hc]. Did you change your mind?
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
2026-05-30 0:59 ` Yosry Ahmed
@ 2026-06-01 14:50 ` Sean Christopherson
2026-06-01 23:38 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 14:50 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:22:13PM -0700, Sean Christopherson wrote:
> > Relocate declarations of MSR helpers (and kvm_nr_uret_msrs) from x86's
> > kvm_host.h to msrs.h, to continue trimming down kvm_host.h.
> >
> > Deliberately leave the funky read_msr() where it is, as it will hopefully
> > be removed entirely as part of a broader kernel-API cleanup.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 26 -------------------------
> > arch/x86/kvm/msrs.h | 34 ++++++++++++++++++++++++++++++---
> > 2 files changed, 31 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index a861c0d70be0..1143140592df 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2094,7 +2094,6 @@ struct kvm_arch_async_pf {
> > u64 error_code;
> > };
> >
> > -extern u32 __read_mostly kvm_nr_uret_msrs;
> > extern bool __read_mostly allow_smaller_maxphyaddr;
> > extern bool __read_mostly enable_apicv;
> > extern bool __read_mostly enable_ipiv;
> > @@ -2278,18 +2277,6 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
> > void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
> > void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
> >
> > -void kvm_enable_efer_bits(u64);
> > -bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
>
> Patch 15's changelog makes me think EFER will belong to regs.[hc], not
> msrs.[hc]. Did you change your mind?
Not really, patch 15's changelog was always "bad", even in the previous version,
before msrs.{c,h} was added. The bulk of EFER handling was left behind in x86.c,
only the {G,S}ET_SREGS accessor/mutator logic gets moved.
I updated patch 07's changelog between v2 and v3 to better reflect reality:
Move *very* select EFER functionality as well, but leave behind the bulk of
EFER handling and all other MSR handling.
But I missed patch 15. For patch 15, how about this?
Introduce regs.c, and move the vast majority of register specific code out
of x86.c and into regs.c. Deliberately leave behind MSR code, as KVM's MSR
support is complex enough to warrant its own compilation unit, and doesn't
have much in common with the other register code.
Note, "struct kvm_sregs" has fields for EFER and MSR_IA32_APICBASE, and so
the {G,S}ET_SREGS flows technically contain a tiny amount of MSR code.
MSR_IA32_APICBASE is already managed by lapic.c, and so doesn't require a
"placement decision". As for EFER, leave all other EFER handling in x86.c
(later to be moved to msrs.c), as the primary interface to EFER,
set_efer(), is very much MSR specific, even though EFER is arguably more
of a Control Register than an MSR.
No functional change intended.
^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
2026-06-01 14:50 ` Sean Christopherson
@ 2026-06-01 23:38 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 23:38 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 07:50:34AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:22:13PM -0700, Sean Christopherson wrote:
> > > Relocate declarations of MSR helpers (and kvm_nr_uret_msrs) from x86's
> > > kvm_host.h to msrs.h, to continue trimming down kvm_host.h.
> > >
> > > Deliberately leave the funky read_msr() where it is, as it will hopefully
> > > be removed entirely as part of a broader kernel-API cleanup.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 26 -------------------------
> > > arch/x86/kvm/msrs.h | 34 ++++++++++++++++++++++++++++++---
> > > 2 files changed, 31 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index a861c0d70be0..1143140592df 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -2094,7 +2094,6 @@ struct kvm_arch_async_pf {
> > > u64 error_code;
> > > };
> > >
> > > -extern u32 __read_mostly kvm_nr_uret_msrs;
> > > extern bool __read_mostly allow_smaller_maxphyaddr;
> > > extern bool __read_mostly enable_apicv;
> > > extern bool __read_mostly enable_ipiv;
> > > @@ -2278,18 +2277,6 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
> > > void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
> > > void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
> > >
> > > -void kvm_enable_efer_bits(u64);
> > > -bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
> >
> > Patch 15's changelog makes me think EFER will belong to regs.[hc], not
> > msrs.[hc]. Did you change your mind?
>
> Not really, patch 15's changelog was always "bad", even in the previous version,
> before msrs.{c,h} was added. The bulk of EFER handling was left behind in x86.c,
> only the {G,S}ET_SREGS accessor/mutator logic gets moved.
>
> I updated patch 07's changelog between v2 and v3 to better reflect reality:
>
> Move *very* select EFER functionality as well, but leave behind the bulk of
> EFER handling and all other MSR handling.
>
> But I missed patch 15. For patch 15, how about this?
LGTM.
>
> Introduce regs.c, and move the vast majority of register specific code out
> of x86.c and into regs.c. Deliberately leave behind MSR code, as KVM's MSR
> support is complex enough to warrant its own compilation unit, and doesn't
> have much in common with the other register code.
>
> Note, "struct kvm_sregs" has fields for EFER and MSR_IA32_APICBASE, and so
> the {G,S}ET_SREGS flows technically contain a tiny amount of MSR code.
> MSR_IA32_APICBASE is already managed by lapic.c, and so doesn't require a
> "placement decision". As for EFER, leave all other EFER handling in x86.c
> (later to be moved to msrs.c), as the primary interface to EFER,
> set_efer(), is very much MSR specific, even though EFER is arguably more
> of a Control Register than an MSR.
>
> No functional change intended.
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 31/40] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (29 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 30/40] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 0:59 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX Sean Christopherson
` (9 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move a pile of MMU helper declarations into mmu.h, as they are very much
KVM x86 internal APIs and not intended to be exposed to arch-neutral KVM,
and certainly not to the broader kernel.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 70 --------------------------------
arch/x86/kvm/mmu.h | 71 +++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+), 70 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1143140592df..f217403e18fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -161,12 +161,6 @@
#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
-#define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
-#define KVM_MIN_ALLOC_MMU_PAGES 64UL
-#define KVM_MMU_HASH_SHIFT 12
-#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
-#define KVM_MIN_FREE_MMU_PAGES 5
-#define KVM_REFILL_PAGES 25
#define KVM_MAX_CPUID_ENTRIES 256
#define KVM_NR_VAR_MTRR 8
@@ -2153,38 +2147,6 @@ enum kvm_intr_type {
((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
(!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
-void __init kvm_mmu_x86_module_init(void);
-int kvm_mmu_vendor_module_init(void);
-void kvm_mmu_vendor_module_exit(void);
-
-void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
-int kvm_mmu_create(struct kvm_vcpu *vcpu);
-int kvm_mmu_init_vm(struct kvm *kvm);
-void kvm_mmu_uninit_vm(struct kvm *kvm);
-
-void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
- struct kvm_memory_slot *slot);
-
-void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
-void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- int start_level);
-void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- int target_level);
-void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- u64 start, u64 end,
- int target_level);
-void kvm_mmu_recover_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot);
-void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
- const struct kvm_memory_slot *memslot);
-void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
@@ -2334,25 +2296,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- bool always_retry);
-
-static inline bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu,
- gpa_t cr2_or_gpa)
-{
- return __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, false);
-}
-
-void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
- ulong roots_to_free);
-void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
-gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-
bool kvm_apicv_activated(struct kvm *kvm);
bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
@@ -2385,19 +2328,6 @@ static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
kvm_inc_or_dec_irq_window_inhibit(kvm, false);
}
-int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
- void *insn, int insn_len);
-void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
-void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
-void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- u64 addr, unsigned long roots);
-void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
-
-void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
- int tdp_max_root_level, int tdp_huge_page_level);
-
-
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#endif
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d30676935fff..a6b871253bd7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -15,6 +15,13 @@ extern bool tdp_mmu_enabled;
extern bool __read_mostly enable_mmio_caching;
extern bool eager_page_split;
+#define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
+#define KVM_MMU_HASH_SHIFT 12
+#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
+#define KVM_MIN_FREE_MMU_PAGES 5
+#define KVM_REFILL_PAGES 25
+
#define PT_WRITABLE_SHIFT 1
#define PT_USER_SHIFT 2
@@ -96,6 +103,38 @@ static inline bool mmu_has_mbec(struct kvm_mmu *mmu)
u8 kvm_mmu_get_max_tdp_level(void);
+void __init kvm_mmu_x86_module_init(void);
+int kvm_mmu_vendor_module_init(void);
+void kvm_mmu_vendor_module_exit(void);
+
+void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
+int kvm_mmu_create(struct kvm_vcpu *vcpu);
+int kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
+
+void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
+ struct kvm_memory_slot *slot);
+
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ int start_level);
+void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ int target_level);
+void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ u64 start, u64 end,
+ int target_level);
+void kvm_mmu_recover_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot);
+void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
@@ -107,6 +146,19 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4,
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
bool mbec, gpa_t new_eptp);
+
+int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
+ void *insn, int insn_len);
+void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ u64 addr, unsigned long roots);
+void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
+
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+ int tdp_max_root_level, int tdp_huge_page_level);
+
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
@@ -121,6 +173,25 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
int bytes);
+bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ bool always_retry);
+
+static inline bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu,
+ gpa_t cr2_or_gpa)
+{
+ return __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, false);
+}
+
+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
+ ulong roots_to_free);
+void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+
static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
{
if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 31/40] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h
2026-05-29 22:22 ` [PATCH v3 31/40] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h Sean Christopherson
@ 2026-05-30 0:59 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 0:59 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:14PM -0700, Sean Christopherson wrote:
> Move a pile of MMU helper declarations into mmu.h, as they are very much
> KVM x86 internal APIs and not intended to be exposed to arch-neutral KVM,
> and certainly not to the broader kernel.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (30 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 31/40] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:02 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 33/40] KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c Sean Christopherson
` (8 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move kvm_{load,read}_ldt() into vmx.c, as they are exclusively used by VMX
to save/restore host state, and have no business being globally visible.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/vmx/vmx.c | 12 ++++++++++++
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f217403e18fc..3e0884a862d7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2334,18 +2334,6 @@ static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
-static inline u16 kvm_read_ldt(void)
-{
- u16 ldt;
- asm("sldt %0" : "=g"(ldt));
- return ldt;
-}
-
-static inline void kvm_load_ldt(u16 sel)
-{
- asm("lldt %0" : : "rm"(sel));
-}
-
#ifdef CONFIG_X86_64
static inline unsigned long read_msr(unsigned long msr)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bb19f6df921b..c8ff787d330d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1186,6 +1186,18 @@ static void vmx_remove_autostore_msr(struct vcpu_vmx *vmx, u32 msr)
vmx_remove_auto_msr(&vmx->msr_autostore, msr, VM_EXIT_MSR_STORE_COUNT);
}
+static u16 kvm_read_ldt(void)
+{
+ u16 ldt;
+ asm("sldt %0" : "=g"(ldt));
+ return ldt;
+}
+
+static void kvm_load_ldt(u16 sel)
+{
+ asm("lldt %0" : : "rm"(sel));
+}
+
#ifdef CONFIG_X86_32
/*
* On 32-bit kernels, VM exits still load the FS and GS bases from the
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX
2026-05-29 22:22 ` [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX Sean Christopherson
@ 2026-05-30 1:02 ` Yosry Ahmed
2026-06-01 15:17 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:02 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:15PM -0700, Sean Christopherson wrote:
> Move kvm_{load,read}_ldt() into vmx.c, as they are exclusively used by VMX
> to save/restore host state, and have no business being globally visible.
But they are generic helpers. I agree with moving them out of
kvm_host.h, but maybe into another header? They don't fit in regs.h or
msrs.h, so maybe x86.h? *ducks*
* Re: [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX
2026-05-30 1:02 ` Yosry Ahmed
@ 2026-06-01 15:17 ` Sean Christopherson
2026-06-01 23:41 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 15:17 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:22:15PM -0700, Sean Christopherson wrote:
> > Move kvm_{load,read}_ldt() into vmx.c, as they are exclusively used by VMX
> > to save/restore host state, and have no business being globally visible.
>
> But they are generic helpers. I agree with moving them out of
> kvm_host.h, but maybe into another header?
No, absolutely not. The fact that KVM has to manually save, and conditionally
restore, LDT is a quirk of the VMX architecture. If it weren't for the fact that
VMX's segment_base() also needs to query the LDT, I would just open code the asm().
If it's a naming concern, I'll happily rename them to vmx_{load,store}_ldt().
Hmm, arch/x86/include/asm/desc.h provides {load,store}_ldt(), but load_ldt()
is only available for CONFIG_PARAVIRT_XXL=n builds, and both #defines unnecessarily
constrain the operand to memory. Those are both fixable, e.g. load_ldt() isn't
used _anywhere_ AFAICT, so burying it under CONFIG_PARAVIRT_XXL=n is just some
weird historical quirk, and I can't imagine show_fault_oops() would care if
store_ldt() were allowed to store to a register.
So after the dust settles, it probably makes sense to improve desc.h's versions
and then use those in KVM. But I don't want to do that in this series, because
it will be challenging enough to land all of this code movement without also
having to coordinate with the tip tree.
* Re: [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX
2026-06-01 15:17 ` Sean Christopherson
@ 2026-06-01 23:41 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 23:41 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 08:17:55AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:22:15PM -0700, Sean Christopherson wrote:
> > > Move kvm_{load,read}_ldt() into vmx.c, as they are exclusively used by VMX
> > > to save/restore host state, and have no business being globally visible.
> >
> > But they are generic helpers. I agree with moving them out of
> > kvm_host.h, but maybe into another header?
>
> No, absolutely not. The fact that KVM has to manually save, and conditionally
> restore, LDT is a quirk of the VMX architecture. If it weren't for the fact that
> VMX's segment_base() also needs to query the LDT, I would just open code the asm().
>
> If it's a naming concern, I'll happily rename them to vmx_{load,store}_ldt().
I was actually going to suggest those exact names initially, so yeah
this looks better.
>
> Hmm, arch/x86/include/asm/desc.h provides {load,store}_ldt(), but load_ldt()
> is only available for CONFIG_PARAVIRT_XXL=n builds, and both #defines unnecessarily
> constrain the operand to memory. Those are both fixable, e.g. load_ldt() isn't
> used _anywhere_ AFAICT, so burying it under CONFIG_PARAVIRT_XXL=n is just some
> weird historical quirk, and I can't imagine show_fault_oops() would care if
> store_ldt() were allowed to store to a register.
>
> So after the dust settles, it probably makes sense to improve desc.h's versions
> and then use those in KVM. But I don't want to do that in this series, because
> it will be challenging enough to land all of this code movement without also
> having to coordinate with the tip tree.
Even better, makes sense for a separate cleanup. Thanks.
* [PATCH v3 33/40] KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (31 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 32/40] KVM: x86: Move LLDT assembly wrappers into VMX Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:03 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 34/40] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c Sean Christopherson
` (7 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Opportunistically drop the CONFIG_X86_LOCAL_APIC=n stub, as KVM hard
depends on CONFIG_X86_LOCAL_APIC=y (the stub was there purely to deal
with kvm_host.h being included by non-KVM code).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 10 ----------
arch/x86/kvm/svm/avic.c | 5 +++++
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3e0884a862d7..989294c7501b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2427,16 +2427,6 @@ static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
kvm_x86_call(vcpu_unblocking)(vcpu);
}
-static inline int kvm_cpu_get_apicid(int mps_cpu)
-{
-#ifdef CONFIG_X86_LOCAL_APIC
- return default_cpu_present_to_apicid(mps_cpu);
-#else
- WARN_ON_ONCE(1);
- return BAD_APICID;
-#endif
-}
-
int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
#define KVM_CLOCK_VALID_FLAGS \
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index b7083cd692ad..9264c8ef1fa1 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -146,6 +146,11 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
svm->x2avic_msrs_intercepted = intercept;
}
+static int kvm_cpu_get_apicid(int mps_cpu)
+{
+ return default_cpu_present_to_apicid(mps_cpu);
+}
+
static u32 __avic_get_max_physical_id(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
u32 arch_max;
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 33/40] KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c
2026-05-29 22:22 ` [PATCH v3 33/40] KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c Sean Christopherson
@ 2026-05-30 1:03 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:16PM -0700, Sean Christopherson wrote:
> Opportunistically drop the CONFIG_X86_LOCAL_APIC=n stub, as KVM hard
> depends on CONFIG_X86_LOCAL_APIC=y (the stub was there purely to deal
> with kvm_host.h being included by non-KVM code).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* [PATCH v3 34/40] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (32 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 33/40] KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:05 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 35/40] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h Sean Christopherson
` (6 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move a variety of "VALID MASK" defines, e.g. that capture which flags in
a given ioctl are supported by KVM, from kvm_host.h to x86.c. The set of
valid flags/bits is very much a KVM-internal detail, as the hardcoded
defines are often massaged at runtime, i.e. *directly* using the macros
outside of KVM x86 would be actively dangerous.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 34 ---------------------------------
arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 989294c7501b..53994b8292fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -78,12 +78,6 @@
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
-#define KVM_BUS_LOCK_DETECTION_VALID_MODE (KVM_BUS_LOCK_DETECTION_OFF | \
- KVM_BUS_LOCK_DETECTION_EXIT)
-
-#define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \
- KVM_X86_NOTIFY_VMEXIT_USER)
-
/* x86-specific vcpu->requests bit members */
#define KVM_REQ_MIGRATE_TIMER KVM_ARCH_REQ(0)
#define KVM_REQ_REPORT_TPR_ACCESS KVM_ARCH_REQ(1)
@@ -2429,34 +2423,6 @@ static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
-#define KVM_CLOCK_VALID_FLAGS \
- (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
-
-#define KVM_X86_VALID_QUIRKS \
- (KVM_X86_QUIRK_LINT0_REENABLED | \
- KVM_X86_QUIRK_CD_NW_CLEARED | \
- KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
- KVM_X86_QUIRK_OUT_7E_INC_RIP | \
- KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
- KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
- KVM_X86_QUIRK_SLOT_ZAP_ALL | \
- KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
- KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
- KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM | \
- KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT)
-
-#define KVM_X86_CONDITIONAL_QUIRKS \
- (KVM_X86_QUIRK_CD_NW_CLEARED | \
- KVM_X86_QUIRK_IGNORE_GUEST_PAT)
-
-/*
- * KVM previously used a u32 field in kvm_run to indicate the hypercall was
- * initiated from long mode. KVM now sets bit 0 to indicate long mode, but the
- * remaining 31 lower bits must be 0 to preserve ABI.
- */
-#define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
-
static inline bool kvm_arch_has_irq_bypass(void)
{
return enable_device_posted_irqs;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 660ba27c76ec..a854b83e9881 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -105,6 +105,12 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
#define emul_to_vcpu(ctxt) \
((struct kvm_vcpu *)(ctxt)->vcpu)
+/*
+ * KVM previously used a u32 field in kvm_run to indicate the hypercall was
+ * initiated from long mode. KVM now sets bit 0 to indicate long mode, but the
+ * remaining 31 lower bits must be 0 to preserve ABI.
+ */
+#define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -114,6 +120,33 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+#define KVM_CLOCK_VALID_FLAGS \
+ (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
+#define KVM_X86_VALID_QUIRKS \
+ (KVM_X86_QUIRK_LINT0_REENABLED | \
+ KVM_X86_QUIRK_CD_NW_CLEARED | \
+ KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
+ KVM_X86_QUIRK_OUT_7E_INC_RIP | \
+ KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
+ KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
+ KVM_X86_QUIRK_SLOT_ZAP_ALL | \
+ KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+ KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
+ KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM | \
+ KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT)
+
+#define KVM_X86_CONDITIONAL_QUIRKS \
+ (KVM_X86_QUIRK_CD_NW_CLEARED | \
+ KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+
+#define KVM_BUS_LOCK_DETECTION_VALID_MODE (KVM_BUS_LOCK_DETECTION_OFF | \
+ KVM_BUS_LOCK_DETECTION_EXIT)
+
+#define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \
+ KVM_X86_NOTIFY_VMEXIT_USER)
+
static void process_nmi(struct kvm_vcpu *vcpu);
static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 34/40] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c
2026-05-29 22:22 ` [PATCH v3 34/40] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c Sean Christopherson
@ 2026-05-30 1:05 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:05 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:17PM -0700, Sean Christopherson wrote:
> Move a variety of "VALID MASK" defines, e.g. that capture which flags in
> a given ioctl are supported by KVM, from kvm_host.h to x86.c. The set of
> valid flags/bits is very much a KVM-internal detail, as the hardcoded
> defines are often massaged at runtime, i.e. *directly* using the macros
> outside of KVM x86 would be actively dangerous.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
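For readers following along, a minimal userspace sketch of how a "VALID MASK"
style define is typically consumed: any bit outside the supported set is
rejected. This is not code from the patch. GENMASK_ULL() is re-implemented
here as a stand-in for the kernel macro, and the DEMO_* flags and demo_*()
helpers are hypothetical names invented for illustration; only
KVM_EXIT_HYPERCALL_MBZ and its GENMASK_ULL(31, 1) definition come from the
diff above.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* Same definition as in the patch: bits 31:1 must be zero. */
#define KVM_EXIT_HYPERCALL_MBZ	GENMASK_ULL(31, 1)

/* Hypothetical flag values, chosen only for this illustration. */
#define DEMO_FLAG_A		(1ULL << 0)
#define DEMO_FLAG_B		(1ULL << 1)
#define DEMO_VALID_FLAGS	(DEMO_FLAG_A | DEMO_FLAG_B)

/* Hypothetical helper: reject any bit outside the supported set. */
static int demo_check_flags(uint64_t flags)
{
	if (flags & ~DEMO_VALID_FLAGS)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: bit 0 is allowed (the long mode indicator), but
 * any of bits 31:1 being set is rejected.
 */
static int demo_check_hypercall_flags(uint64_t flags)
{
	return (flags & KVM_EXIT_HYPERCALL_MBZ) ? -EINVAL : 0;
}
```

The sketch also hints at why the changelog calls direct use of the macros
dangerous: a caller that tests against a compile-time mask would miss any
adjustment KVM makes to the supported set at runtime.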
* [PATCH v3 35/40] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (33 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 34/40] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:06 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
` (5 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Bury __kvm_irq_line_state() in CONFIG_KVM_IOAPIC=y code, as it's only used
by PIC and I/O APIC code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/ioapic.h | 12 ++++++++++++
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53994b8292fc..866d33abaee0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2275,18 +2275,6 @@ static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
-static inline int __kvm_irq_line_state(unsigned long *irq_state,
- int irq_source_id, int level)
-{
- /* Logical OR for level trig interrupt */
- if (level)
- __set_bit(irq_source_id, irq_state);
- else
- __clear_bit(irq_source_id, irq_state);
-
- return !!(*irq_state);
-}
-
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 3dadae093690..81b576513116 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -113,6 +113,18 @@ void kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
ulong *ioapic_handled_vectors);
+
+static inline int __kvm_irq_line_state(unsigned long *irq_state,
+ int irq_source_id, int level)
+{
+ /* Logical OR for level trig interrupt */
+ if (level)
+ __set_bit(irq_source_id, irq_state);
+ else
+ __clear_bit(irq_source_id, irq_state);
+
+ return !!(*irq_state);
+}
#endif /* CONFIG_KVM_IOAPIC */
static inline int ioapic_in_kernel(struct kvm *kvm)
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 35/40] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h
2026-05-29 22:22 ` [PATCH v3 35/40] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h Sean Christopherson
@ 2026-05-30 1:06 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:06 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:18PM -0700, Sean Christopherson wrote:
> Bury __kvm_irq_line_state() in CONFIG_KVM_IOAPIC=y code, as it's only used
> by PIC and I/O APIC code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
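As an aside, the "logical OR" behavior of __kvm_irq_line_state() can be
sketched in plain userspace C. This is an illustration only, not code from
the patch: demo_set_bit() and demo_clear_bit() are hypothetical stand-ins for
the kernel's __set_bit() and __clear_bit(), and the sketch assumes source IDs
smaller than the number of bits in an unsigned long.

```c
/* Hypothetical stand-ins for the kernel's non-atomic bit helpers. */
static void demo_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void demo_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/*
 * Mirrors the logic of __kvm_irq_line_state() from the diff: each source
 * owns one bit, and the line is asserted while any source holds it high.
 */
static int demo_irq_line_state(unsigned long *irq_state,
			       int irq_source_id, int level)
{
	if (level)
		demo_set_bit(irq_source_id, irq_state);
	else
		demo_clear_bit(irq_source_id, irq_state);

	return !!(*irq_state);
}
```

With two sources asserting the same line, deasserting one of them leaves the
line high; the returned state drops to 0 only once every source has cleared
its bit.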
* [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (34 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 35/40] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:10 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 37/40] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h Sean Christopherson
` (4 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move the function declaration for APIs to get/query pending IRQs from
kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 5 -----
arch/x86/kvm/irq.h | 6 ++++++
arch/x86/kvm/svm/nested.c | 1 +
arch/x86/kvm/vmx/nested.c | 1 +
4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 866d33abaee0..38de6c0dc743 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2360,12 +2360,7 @@ enum {
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
#endif
-int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
-int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
-int kvm_cpu_has_extint(struct kvm_vcpu *v);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
-int kvm_cpu_get_extint(struct kvm_vcpu *v);
-int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 34f4a78a7a01..1a84ea31e7fd 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -112,6 +112,12 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
return mode != KVM_IRQCHIP_NONE;
}
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
+int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
+int kvm_cpu_get_extint(struct kvm_vcpu *v);
+int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
+
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1ab8b95975a4..ef10298cb320 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -23,6 +23,7 @@
#include "kvm_emulate.h"
#include "trace.h"
+#include "irq.h"
#include "mmu.h"
#include "x86.h"
#include "smm.h"
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b2c851cc7d5c..adbb8358ade7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -11,6 +11,7 @@
#include "x86.h"
#include "cpuid.h"
#include "hyperv.h"
+#include "irq.h"
#include "mmu.h"
#include "nested.h"
#include "pmu.h"
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-05-29 22:22 ` [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
@ 2026-05-30 1:10 ` Yosry Ahmed
2026-06-01 15:22 ` Sean Christopherson
0 siblings, 1 reply; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:10 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:19PM -0700, Sean Christopherson wrote:
> Move the function declaration for APIs to get/query pending IRQs from
> kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/kvm_host.h | 5 -----
> arch/x86/kvm/irq.h | 6 ++++++
> arch/x86/kvm/svm/nested.c | 1 +
> arch/x86/kvm/vmx/nested.c | 1 +
> 4 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 866d33abaee0..38de6c0dc743 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2360,12 +2360,7 @@ enum {
> # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> #endif
>
> -int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
> -int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
> -int kvm_cpu_has_extint(struct kvm_vcpu *v);
> int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
kvm_arch_interrupt_allowed() is only used in x86.c. Probably it wasn't
used outside of x86 code after a1b37100d9e29c1f8dc3e2f5490a205c80180e01.
It should probably be renamed and moved, maybe separately.
Anyway:
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* Re: [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-05-30 1:10 ` Yosry Ahmed
@ 2026-06-01 15:22 ` Sean Christopherson
2026-06-01 23:44 ` Yosry Ahmed
0 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-06-01 15:22 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Sat, May 30, 2026, Yosry Ahmed wrote:
> On Fri, May 29, 2026 at 03:22:19PM -0700, Sean Christopherson wrote:
> > Move the function declaration for APIs to get/query pending IRQs from
> > kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/include/asm/kvm_host.h | 5 -----
> > arch/x86/kvm/irq.h | 6 ++++++
> > arch/x86/kvm/svm/nested.c | 1 +
> > arch/x86/kvm/vmx/nested.c | 1 +
> > 4 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 866d33abaee0..38de6c0dc743 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2360,12 +2360,7 @@ enum {
> > # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> > #endif
> >
> > -int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
> > -int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
> > -int kvm_cpu_has_extint(struct kvm_vcpu *v);
> > int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>
> kvm_arch_interrupt_allowed() is only used in x86.c. Probably it wasn't
> used outside of x86 code after a1b37100d9e29c1f8dc3e2f5490a205c80180e01.
Oof, and it has x86-specific semantics (KVM x86 very much relies on 0/1/-EBUSY
return values).
> It should probably be renamed and moved, maybe separately.
Yeah, I'll slot in a patch to bury it in x86.c as kvm_is_interrupt_allowed().
* Re: [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-06-01 15:22 ` Sean Christopherson
@ 2026-06-01 23:44 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-06-01 23:44 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Mon, Jun 01, 2026 at 08:22:50AM -0700, Sean Christopherson wrote:
> On Sat, May 30, 2026, Yosry Ahmed wrote:
> > On Fri, May 29, 2026 at 03:22:19PM -0700, Sean Christopherson wrote:
> > > Move the function declaration for APIs to get/query pending IRQs from
> > > kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 5 -----
> > > arch/x86/kvm/irq.h | 6 ++++++
> > > arch/x86/kvm/svm/nested.c | 1 +
> > > arch/x86/kvm/vmx/nested.c | 1 +
> > > 4 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 866d33abaee0..38de6c0dc743 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -2360,12 +2360,7 @@ enum {
> > > # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
> > > #endif
> > >
> > > -int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
> > > -int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
> > > -int kvm_cpu_has_extint(struct kvm_vcpu *v);
> > > int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
> >
> > kvm_arch_interrupt_allowed() is only used in x86.c. Probably it wasn't
> > used outside of x86 code after a1b37100d9e29c1f8dc3e2f5490a205c80180e01.
>
> Oof, and it has x86-specific semantics (KVM x86 very much relies on 0/1/-EBUSY
> return values).
>
> > It should probably be renamed and moved, maybe separately.
>
> Yeah, I'll slot in a patch to bury it in x86.c as kvm_is_interrupt_allowed().
Thank you!
* [PATCH v3 37/40] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (35 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 36/40] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:11 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 38/40] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault() Sean Christopherson
` (3 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move the declaration of kvm_pv_send_ipi() into lapic.h, as its
implementation is provided by lapic.c (sending PV IPIs relies on the
optimized APIC map provided by the in-kernel local APIC), and it's only
used by KVM x86 code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 4 ----
arch/x86/kvm/lapic.h | 3 +++
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 38de6c0dc743..a3c1ff784e5c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2363,10 +2363,6 @@ enum {
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
-int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
- unsigned long ipi_bitmap_high, u32 min,
- unsigned long icr, int op_64_bit);
-
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 32f09b25884a..58dbb94f980d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -131,6 +131,9 @@ static inline int kvm_irq_delivery_to_apic(struct kvm *kvm,
}
void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
+int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+ unsigned long ipi_bitmap_high, u32 min,
+ unsigned long icr, int op_64_bit);
int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 37/40] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h
2026-05-29 22:22 ` [PATCH v3 37/40] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h Sean Christopherson
@ 2026-05-30 1:11 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:11 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:20PM -0700, Sean Christopherson wrote:
> Move the declaration of kvm_pv_send_ipi() into lapic.h, as its
> implementation is provided by lapic.c (sending PV IPIs relies on the
> optimized APIC map provided by the in-kernel local APIC), and it's only
> used by KVM x86 code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* [PATCH v3 38/40] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault()
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (36 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 37/40] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:12 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 39/40] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c Sean Christopherson
` (2 subsequent siblings)
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move the implementation of kvm_arch_async_page_ready() "down" in mmu.c so
that it lives below kvm_tdp_page_fault(). This will allow moving
kvm_mmu_do_page_fault() into mmu.c without needing a forward declaration.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
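For readers less familiar with C ordering rules, a minimal sketch of why
the placement matters. The names below are hypothetical stand-ins, not KVM
code: a static helper has to be declared before its first use, so a caller
that sits below the helper needs no forward declaration.

```c
/*
 * Hypothetical stand-ins; none of these names exist in KVM.  A static
 * function must be declared before its first use, so a caller placed
 * above the callee would need a forward declaration, while a caller
 * placed below it does not.
 */
static int handle_fault(unsigned long addr)
{
	/* Pretend that any non-zero address is resolved successfully. */
	return addr ? 0 : -1;
}

/* Defined after handle_fault(), so no forward declaration is needed. */
int page_ready(unsigned long addr)
{
	return handle_fault(addr);
}
```

Had page_ready() been defined first, the file would need an extra
"static int handle_fault(unsigned long addr);" line above it, which is the
kind of forward declaration the reordering is meant to avoid.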
arch/x86/kvm/mmu/mmu.c | 62 +++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e4d971d42f0e..f217e71c3af0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4571,37 +4571,6 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
kvm_vcpu_gfn_to_hva(vcpu, fault->gfn), &arch);
}
-void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
-{
- int r;
-
- if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
- return;
-
- if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
- work->wakeup_all)
- return;
-
- r = kvm_mmu_reload(vcpu);
- if (unlikely(r))
- return;
-
- if (!vcpu->arch.mmu->root_role.direct &&
- work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
- return;
-
- r = kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
- true, NULL, NULL);
-
- /*
- * Account fixed page faults, otherwise they'll never be counted, but
- * ignore stats for all other return times. Page-ready "faults" aren't
- * truly spurious and never trigger emulation
- */
- if (r == RET_PF_FIXED)
- vcpu->stat.pf_fixed++;
-}
-
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
@@ -5058,6 +5027,37 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
return min(range->size, end - range->gpa);
}
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
+{
+ int r;
+
+ if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
+ return;
+
+ if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
+ work->wakeup_all)
+ return;
+
+ r = kvm_mmu_reload(vcpu);
+ if (unlikely(r))
+ return;
+
+ if (!vcpu->arch.mmu->root_role.direct &&
+ work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
+ return;
+
+ r = kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
+ true, NULL, NULL);
+
+ /*
+ * Account fixed page faults, otherwise they'll never be counted, but
+ * ignore stats for all other return times. Page-ready "faults" aren't
+ * truly spurious and never trigger emulation
+ */
+ if (r == RET_PF_FIXED)
+ vcpu->stat.pf_fixed++;
+}
+
#ifdef CONFIG_KVM_GUEST_MEMFD
static void kvm_assert_gmem_invalidate_lock_held(struct kvm_memory_slot *slot)
{
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 38/40] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault()
2026-05-29 22:22 ` [PATCH v3 38/40] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault() Sean Christopherson
@ 2026-05-30 1:12 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:21PM -0700, Sean Christopherson wrote:
> Move the implementation of kvm_arch_async_page_ready() "down" in mmu.c so
> that it lives below kvm_tdp_page_fault(). This will allow moving
> kvm_mmu_do_page_fault() into mmu.c without needing a forward declaration.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
* [PATCH v3 39/40] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (37 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 38/40] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault() Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 1:13 ` Yosry Ahmed
2026-05-29 22:22 ` [PATCH v3 40/40] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
2026-05-30 16:59 ` [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Paolo Bonzini
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move kvm_mmu_do_page_fault() into mmu.c, as there are no users outside of
mmu.c, and the function typically isn't inlined by the compiler anyways.
This will allow moving the EMULTYPE_xxx definitions into x86.h without
having to include x86.h in mmu_internal.h, i.e. will help preserve the
goal of making x86.h KVM x86's "top-level" include.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
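As an aside, the "direct call in the most common case" logic that moves
with this function can be sketched in isolation. Everything below is a
hypothetical illustration (struct mmu, common_handler(), rare_handler(), and
USE_DIRECT_CALL are made-up names; USE_DIRECT_CALL merely stands in for the
IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) check): compare the function pointer
against the expected handler and, on a match, call that handler by name so
that the compiler can emit a direct call.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not KVM code. */
struct mmu {
	int (*page_fault)(int fault);
};

static int common_handler(int fault)
{
	return fault + 1;
}

static int rare_handler(int fault)
{
	return fault - 1;
}

/* Stand-in for a build-time check such as IS_ENABLED(...). */
#define USE_DIRECT_CALL 1

int do_page_fault(const struct mmu *mmu, int fault)
{
	bool is_common = mmu->page_fault == common_handler;

	/* The pointer matches the common handler: call it by name. */
	if (USE_DIRECT_CALL && is_common)
		return common_handler(fault);

	/* Anything else goes through the function pointer. */
	return mmu->page_fault(fault);
}
```

Both paths return the same result for a given handler; only the way the
call is made differs, which is why the comparison needs the handler's
symbol to be visible in the same file.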
arch/x86/kvm/mmu/mmu.c | 67 ++++++++++++++++++++++++++++++++-
arch/x86/kvm/mmu/mmu_internal.h | 66 --------------------------------
2 files changed, 66 insertions(+), 67 deletions(-)
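A second small sketch, on the designated initializers that derive flags
from the error code (.write = err & PFERR_WRITE_MASK and friends). This
assumes the destination fields are bool, as the one member visible in this
diff (write_fault_to_shadow_pgtable) is; the flag values and names below
are hypothetical. Conversion to bool yields true for any non-zero value,
so a mask that lives in the upper bits of a 64-bit error code is not lost,
whereas the same assignment into a narrow integer would be truncated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values chosen for illustration only. */
#define FLAG_WRITE	(UINT64_C(1) << 1)
#define FLAG_PRIVATE	(UINT64_C(1) << 49)

struct fault {
	bool write;
	bool is_private;
};

struct fault decode_fault(uint64_t err)
{
	struct fault f = {
		/* Conversion to bool yields true for any non-zero value. */
		.write      = err & FLAG_WRITE,
		.is_private = err & FLAG_PRIVATE,
	};

	return f;
}

/* The same masking into a narrow integer keeps only the low 8 bits. */
uint8_t truncated_private(uint64_t err)
{
	return (uint8_t)(err & FLAG_PRIVATE);
}
```

With these made-up values, decode_fault() reports is_private as true for
an error code with bit 49 set, while truncated_private() returns 0 for the
same input.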
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f217e71c3af0..2796230ec398 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4927,7 +4927,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
}
#endif
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
#ifdef CONFIG_X86_64
if (tdp_mmu_enabled)
@@ -4937,6 +4937,71 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return direct_page_fault(vcpu, fault);
}
+static int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ u64 err, bool prefetch, int *emulation_type,
+ u8 *level)
+{
+ struct kvm_page_fault fault = {
+ .addr = cr2_or_gpa,
+ .error_code = err,
+ .exec = err & PFERR_FETCH_MASK,
+ .write = err & PFERR_WRITE_MASK,
+ .present = err & PFERR_PRESENT_MASK,
+ .rsvd = err & PFERR_RSVD_MASK,
+ .user = err & PFERR_USER_MASK,
+ .prefetch = prefetch,
+ .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .nx_huge_page_workaround_enabled =
+ is_nx_huge_page_enabled(vcpu->kvm),
+
+ .max_level = KVM_MAX_HUGEPAGE_LEVEL,
+ .req_level = PG_LEVEL_4K,
+ .goal_level = PG_LEVEL_4K,
+ .is_private = err & PFERR_PRIVATE_ACCESS,
+
+ .pfn = KVM_PFN_ERR_FAULT,
+ };
+ int r;
+
+ if (vcpu->arch.mmu->root_role.direct) {
+ /*
+ * Things like memslots don't understand the concept of a shared
+ * bit. Strip it so that the GFN can be used like normal, and the
+ * fault.addr can be used when the shared bit is needed.
+ */
+ fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_direct_bits(vcpu->kvm);
+ fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
+ }
+
+ /*
+ * With retpoline being active an indirect call is rather expensive,
+ * so do a direct call in the most common case.
+ */
+ if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
+ r = kvm_tdp_page_fault(vcpu, &fault);
+ else
+ r = vcpu->arch.mmu->page_fault(vcpu, &fault);
+
+ /*
+ * Not sure what's happening, but punt to userspace and hope that
+ * they can fix it by changing memory to shared, or they can
+ * provide a better error.
+ */
+ if (r == RET_PF_EMULATE && fault.is_private) {
+ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+ kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
+ return -EFAULT;
+ }
+
+ if (fault.write_fault_to_shadow_pgtable && emulation_type)
+ *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+ if (level)
+ *level = fault.goal_level;
+
+ return r;
+}
+
+
static int kvm_tdp_page_prefault(struct kvm_vcpu *vcpu, gpa_t gpa,
u64 error_code, u8 *level)
{
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..c29002c60126 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -290,8 +290,6 @@ struct kvm_page_fault {
bool write_fault_to_shadow_pgtable;
};
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-
/*
* Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
* and of course kvm_mmu_do_page_fault().
@@ -337,70 +335,6 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
fault->is_private);
}
-static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u64 err, bool prefetch,
- int *emulation_type, u8 *level)
-{
- struct kvm_page_fault fault = {
- .addr = cr2_or_gpa,
- .error_code = err,
- .exec = err & PFERR_FETCH_MASK,
- .write = err & PFERR_WRITE_MASK,
- .present = err & PFERR_PRESENT_MASK,
- .rsvd = err & PFERR_RSVD_MASK,
- .user = err & PFERR_USER_MASK,
- .prefetch = prefetch,
- .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
- .nx_huge_page_workaround_enabled =
- is_nx_huge_page_enabled(vcpu->kvm),
-
- .max_level = KVM_MAX_HUGEPAGE_LEVEL,
- .req_level = PG_LEVEL_4K,
- .goal_level = PG_LEVEL_4K,
- .is_private = err & PFERR_PRIVATE_ACCESS,
-
- .pfn = KVM_PFN_ERR_FAULT,
- };
- int r;
-
- if (vcpu->arch.mmu->root_role.direct) {
- /*
- * Things like memslots don't understand the concept of a shared
- * bit. Strip it so that the GFN can be used like normal, and the
- * fault.addr can be used when the shared bit is needed.
- */
- fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_direct_bits(vcpu->kvm);
- fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
- }
-
- /*
- * With retpoline being active an indirect call is rather expensive,
- * so do a direct call in the most common case.
- */
- if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
- r = kvm_tdp_page_fault(vcpu, &fault);
- else
- r = vcpu->arch.mmu->page_fault(vcpu, &fault);
-
- /*
- * Not sure what's happening, but punt to userspace and hope that
- * they can fix it by changing memory to shared, or they can
- * provide a better error.
- */
- if (r == RET_PF_EMULATE && fault.is_private) {
- pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
- kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
- return -EFAULT;
- }
-
- if (fault.write_fault_to_shadow_pgtable && emulation_type)
- *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
- if (level)
- *level = fault.goal_level;
-
- return r;
-}
-
int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
const struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 39/40] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c
2026-05-29 22:22 ` [PATCH v3 39/40] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c Sean Christopherson
@ 2026-05-30 1:13 ` Yosry Ahmed
0 siblings, 0 replies; 87+ messages in thread
From: Yosry Ahmed @ 2026-05-30 1:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, David Woodhouse, Paul Durrant,
kvm, linux-kernel, Binbin Wu, David Woodhouse, Kai Huang
On Fri, May 29, 2026 at 03:22:22PM -0700, Sean Christopherson wrote:
> Move kvm_mmu_do_page_fault() into mmu.c, as there are no users outside of
> mmu.c, and the function typically isn't inlined by the compiler anyways.
> This will allow moving the EMULTYPE_xxx definitions into x86.h without
> having to include x86.h in mmu_internal.h, i.e. will help preserve the
> goal of making x86.h KVM x86's "top-level" include.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
[..]
> + /*
> + * Not sure what's happening, but punt to userspace and hope that
> + * they can fix it by changing memory to shared, or they can
> + * provide a better error.
> + */
LOL
> + if (r == RET_PF_EMULATE && fault.is_private) {
> + pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> + kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> + return -EFAULT;
> + }
> +
> + if (fault.write_fault_to_shadow_pgtable && emulation_type)
> + *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
> + if (level)
> + *level = fault.goal_level;
> +
> + return r;
> +}
* [PATCH v3 40/40] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (38 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 39/40] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c Sean Christopherson
@ 2026-05-29 22:22 ` Sean Christopherson
2026-05-30 7:59 ` sashiko-bot
2026-05-30 16:59 ` [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Paolo Bonzini
40 siblings, 1 reply; 87+ messages in thread
From: Sean Christopherson @ 2026-05-29 22:22 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
David Woodhouse, Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
Move the majority of remaining KVM-internal declarations and defines in
kvm_host.h to x86.h, so that kvm_host.h only holds structure and function
definitions that need to be visible to arch-neutral KVM.
Land the emulator interfaces in x86.h, even though kvm_emulate.h *seems*
like a good home, as the interfaces and defines being moved are provided by
x86.c. I.e. keep kvm_emulate.h as an interface to the emulator proper.
Note, any "misses" are likely unintentional.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
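For reference, a minimal sketch of how the EMULTYPE_* flags and the
soft-interrupt vector macros being moved compose. The four macro
definitions are copied from this patch; the u32 typedef and the
make_skip_soft_int() helper are hypothetical additions for illustration
and are not part of KVM. The vector occupies bits 23:16 of the emulation
type, above the single-bit flags.

```c
#include <stdint.h>

typedef uint32_t u32;	/* stand-in for the kernel's u32 */

/* Copied from the patch. */
#define EMULTYPE_SKIP			(1 << 2)
#define EMULTYPE_SKIP_SOFT_INT		(1 << 9)
#define EMULTYPE_SET_SOFT_INT_VECTOR(v)	((u32)((v) & 0xff) << 16)
#define EMULTYPE_GET_SOFT_INT_VECTOR(e)	(((e) >> 16) & 0xff)

/*
 * Hypothetical helper: build an emulation type that asks to skip an
 * instruction only if it could generate the given software interrupt.
 */
int make_skip_soft_int(int vector)
{
	return EMULTYPE_SKIP | EMULTYPE_SKIP_SOFT_INT |
	       EMULTYPE_SET_SOFT_INT_VECTOR(vector);
}
```

EMULTYPE_GET_SOFT_INT_VECTOR() applied to the result recovers the vector,
and because the setter masks with 0xff, any bits above the low byte of the
argument are dropped rather than spilling into neighboring fields.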
arch/x86/include/asm/kvm_host.h | 197 --------------------------------
arch/x86/kvm/ioapic.c | 1 +
arch/x86/kvm/x86.h | 195 +++++++++++++++++++++++++++++++
3 files changed, 196 insertions(+), 197 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3c1ff784e5c..4efad65e53e2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2096,9 +2096,6 @@ extern struct kvm_x86_ops kvm_x86_ops;
#define KVM_X86_OP_OPTIONAL_RET0 KVM_X86_OP
#include <asm/kvm-x86-ops.h>
-int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops);
-void kvm_x86_vendor_exit(void);
-
#define __KVM_HAVE_ARCH_VM_ALLOC
static inline struct kvm *kvm_arch_alloc_vm(void)
{
@@ -2141,175 +2138,6 @@ enum kvm_intr_type {
((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
(!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
-/*
- * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
- * userspace I/O) to indicate that the emulation context
- * should be reused as is, i.e. skip initialization of
- * emulation context, instruction fetch and decode.
- *
- * EMULTYPE_TRAP_UD - Set when emulating an intercepted #UD from hardware.
- * Indicates that only select instructions (tagged with
- * EmulateOnUD) should be emulated (to minimize the emulator
- * attack surface). See also EMULTYPE_TRAP_UD_FORCED.
- *
- * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
- * decode the instruction length. For use *only* by
- * kvm_x86_ops.skip_emulated_instruction() implementations if
- * EMULTYPE_COMPLETE_USER_EXIT is not set.
- *
- * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
- * retry native execution under certain conditions,
- * Can only be set in conjunction with EMULTYPE_PF.
- *
- * EMULTYPE_TRAP_UD_FORCED - Set when emulating an intercepted #UD that was
- * triggered by KVM's magic "force emulation" prefix,
- * which is opt in via module param (off by default).
- * Bypasses EmulateOnUD restriction despite emulating
- * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
- * Used to test the full emulator from userspace.
- *
- * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
- * backdoor emulation, which is opt in via module param.
- * VMware backdoor emulation handles select instructions
- * and reinjects the #GP for all other cases.
- *
- * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
- * the CR2/GPA value pass on the stack is valid.
- *
- * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
- * state and inject single-step #DBs after skipping
- * an instruction (after completing userspace I/O).
- *
- * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
- * is attempting to write a gfn that contains one or
- * more of the PTEs used to translate the write itself,
- * and the owning page table is being shadowed by KVM.
- * If emulation of the faulting instruction fails and
- * this flag is set, KVM will exit to userspace instead
- * of retrying emulation as KVM cannot make forward
- * progress.
- *
- * If emulation fails for a write to guest page tables,
- * KVM unprotects (zaps) the shadow page for the target
- * gfn and resumes the guest to retry the non-emulatable
- * instruction (on hardware). Unprotecting the gfn
- * doesn't allow forward progress for a self-changing
- * access because doing so also zaps the translation for
- * the gfn, i.e. retrying the instruction will hit a
- * !PRESENT fault, which results in a new shadow page
- * and sends KVM back to square one.
- *
- * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
- * an instruction if it could generate a given software
- * interrupt, which must be encoded via
- * EMULTYPE_SET_SOFT_INT_VECTOR().
- */
-#define EMULTYPE_NO_DECODE (1 << 0)
-#define EMULTYPE_TRAP_UD (1 << 1)
-#define EMULTYPE_SKIP (1 << 2)
-#define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
-#define EMULTYPE_TRAP_UD_FORCED (1 << 4)
-#define EMULTYPE_VMWARE_GP (1 << 5)
-#define EMULTYPE_PF (1 << 6)
-#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
-#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
-#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
-
-#define EMULTYPE_SET_SOFT_INT_VECTOR(v) ((u32)((v) & 0xff) << 16)
-#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
-
-static inline bool kvm_can_emulate_event_vectoring(int emul_type)
-{
- return !(emul_type & EMULTYPE_PF);
-}
-
-int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
-int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
- void *insn, int insn_len);
-void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
- u64 *data, u8 ndata);
-void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
-
-void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
-void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
-
-int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
-int kvm_emulate_invd(struct kvm_vcpu *vcpu);
-int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
-int kvm_handle_invalid_op(struct kvm_vcpu *vcpu);
-int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
-
-int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
-int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
-int kvm_emulate_halt(struct kvm_vcpu *vcpu);
-int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
-int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
-
-void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
-
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
- int reason, bool has_error_code, u32 error_code);
-
-int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
-int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
-
-int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
-
-void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
-void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
-void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
- bool has_error_code, u32 error_code);
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
- bool from_hardware);
-void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault,
- bool from_hardware);
-
-static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
-{
- __kvm_inject_emulated_page_fault(vcpu, fault, false);
-}
-
-bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
-
-void kvm_inject_nmi(struct kvm_vcpu *vcpu);
-int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-
-bool kvm_apicv_activated(struct kvm *kvm);
-bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
-void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
-void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason, bool set);
-void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason, bool set);
-
-static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason)
-{
- kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
-}
-
-static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason)
-{
- kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
-}
-
-void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
-
-static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
-{
- kvm_inc_or_dec_irq_window_inhibit(kvm, true);
-}
-
-static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
-{
- kvm_inc_or_dec_irq_window_inhibit(kvm, false);
-}
-
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#endif
@@ -2326,11 +2154,6 @@ static inline unsigned long read_msr(unsigned long msr)
}
#endif
-static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
-{
- kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
-}
-
#define TSS_IOPB_BASE_OFFSET 0x66
#define TSS_BASE_SIZE 0x68
#define TSS_IOPB_SIZE (65536 / 8)
@@ -2361,16 +2184,6 @@ enum {
#endif
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
-void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
-
-u64 kvm_scale_tsc(u64 tsc, u64 ratio);
-u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
-u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
-u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
-
-void kvm_make_scan_ioapic_request(struct kvm *kvm);
-void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
- unsigned long *vcpu_bitmap);
bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
@@ -2382,14 +2195,6 @@ void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
-int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
-
-void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
- u32 size);
-bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
-bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
-
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
kvm_x86_call(vcpu_blocking)(vcpu);
@@ -2400,8 +2205,6 @@ static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
kvm_x86_call(vcpu_unblocking)(vcpu);
}
-int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
-
static inline bool kvm_arch_has_irq_bypass(void)
{
return enable_device_posted_irqs;
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index eed96ff6e722..b114f646d798 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -33,6 +33,7 @@
#include "lapic.h"
#include "irq.h"
#include "trace.h"
+#include "x86.h"
static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
bool line_status);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index fd3d0a196526..9db1f663e5b1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -14,6 +14,9 @@
#define KVM_MAX_MCE_BANKS 32
+int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops);
+void kvm_x86_vendor_exit(void);
+
void kvm_spurious_fault(void);
#define SIZE_OF_MEMSLOTS_HASHTABLE \
@@ -318,6 +321,8 @@ static __always_inline void kvm_request_l1tf_flush_l1d(void)
#endif
}
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
u64 get_kvmclock_ns(struct kvm *kvm);
@@ -326,6 +331,10 @@ bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
int kvm_guest_time_update(struct kvm_vcpu *v);
void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value);
+u64 kvm_scale_tsc(u64 tsc, u64 ratio);
+u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc);
void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset);
@@ -363,10 +372,196 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
void *insn, int insn_len);
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
+/*
+ * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
+ * userspace I/O) to indicate that the emulation context
+ * should be reused as is, i.e. skip initialization of
+ * emulation context, instruction fetch and decode.
+ *
+ * EMULTYPE_TRAP_UD - Set when emulating an intercepted #UD from hardware.
+ * Indicates that only select instructions (tagged with
+ * EmulateOnUD) should be emulated (to minimize the emulator
+ * attack surface). See also EMULTYPE_TRAP_UD_FORCED.
+ *
+ * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
+ * decode the instruction length. For use *only* by
+ * kvm_x86_ops.skip_emulated_instruction() implementations if
+ * EMULTYPE_COMPLETE_USER_EXIT is not set.
+ *
+ * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
+ * retry native execution under certain conditions,
+ * Can only be set in conjunction with EMULTYPE_PF.
+ *
+ * EMULTYPE_TRAP_UD_FORCED - Set when emulating an intercepted #UD that was
+ * triggered by KVM's magic "force emulation" prefix,
+ * which is opt in via module param (off by default).
+ * Bypasses EmulateOnUD restriction despite emulating
+ * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
+ * Used to test the full emulator from userspace.
+ *
+ * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
+ * backdoor emulation, which is opt in via module param.
+ * VMware backdoor emulation handles select instructions
+ * and reinjects the #GP for all other cases.
+ *
+ * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
+ * the CR2/GPA value pass on the stack is valid.
+ *
+ * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
+ * state and inject single-step #DBs after skipping
+ * an instruction (after completing userspace I/O).
+ *
+ * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
+ * is attempting to write a gfn that contains one or
+ * more of the PTEs used to translate the write itself,
+ * and the owning page table is being shadowed by KVM.
+ * If emulation of the faulting instruction fails and
+ * this flag is set, KVM will exit to userspace instead
+ * of retrying emulation as KVM cannot make forward
+ * progress.
+ *
+ * If emulation fails for a write to guest page tables,
+ * KVM unprotects (zaps) the shadow page for the target
+ * gfn and resumes the guest to retry the non-emulatable
+ * instruction (on hardware). Unprotecting the gfn
+ * doesn't allow forward progress for a self-changing
+ * access because doing so also zaps the translation for
+ * the gfn, i.e. retrying the instruction will hit a
+ * !PRESENT fault, which results in a new shadow page
+ * and sends KVM back to square one.
+ *
+ * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
+ * an instruction if it could generate a given software
+ * interrupt, which must be encoded via
+ * EMULTYPE_SET_SOFT_INT_VECTOR().
+ */
+#define EMULTYPE_NO_DECODE (1 << 0)
+#define EMULTYPE_TRAP_UD (1 << 1)
+#define EMULTYPE_SKIP (1 << 2)
+#define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
+#define EMULTYPE_TRAP_UD_FORCED (1 << 4)
+#define EMULTYPE_VMWARE_GP (1 << 5)
+#define EMULTYPE_PF (1 << 6)
+#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
+#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
+#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
+
+#define EMULTYPE_SET_SOFT_INT_VECTOR(v) ((u32)((v) & 0xff) << 16)
+#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
+
+static inline bool kvm_can_emulate_event_vectoring(int emul_type)
+{
+ return !(emul_type & EMULTYPE_PF);
+}
+
+int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
+int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
+ void *insn, int insn_len);
+void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
+ u64 *data, u8 ndata);
+void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
+
+void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
+void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
+int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
+int kvm_emulate_invd(struct kvm_vcpu *vcpu);
+int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
+int kvm_handle_invalid_op(struct kvm_vcpu *vcpu);
+int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
+
+int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
+int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
+int kvm_emulate_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
+int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
+
+void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+ int reason, bool has_error_code, u32 error_code);
+
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
+int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
+
+int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
+int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
+
+void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
+void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
+void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
+ bool has_error_code, u32 error_code);
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware);
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware);
+
+static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault)
+{
+ __kvm_inject_emulated_page_fault(vcpu, fault, false);
+}
+
+bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
+
+static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
+{
+ kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
+}
+
+void kvm_inject_nmi(struct kvm_vcpu *vcpu);
+int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
+
+void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
+ u32 size);
+int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
+
+bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
+
+bool kvm_apicv_activated(struct kvm *kvm);
+bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
+void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
+void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason, bool set);
+void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason, bool set);
+
+static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason)
+{
+ kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
+}
+
+static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason)
+{
+ kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
+}
+
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
+
+static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
+{
+ kvm_inc_or_dec_irq_window_inhibit(kvm, true);
+}
+
+static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
+{
+ kvm_inc_or_dec_irq_window_inhibit(kvm, false);
+}
+
+void kvm_make_scan_ioapic_request(struct kvm *kvm);
+void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
+ unsigned long *vcpu_bitmap);
+
void kvm_setup_xss_caps(void);
/*
--
2.54.0.823.g6e5bcc1fc9-goog
* Re: [PATCH v3 40/40] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
2026-05-29 22:22 ` [PATCH v3 40/40] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
@ 2026-05-30 7:59 ` sashiko-bot
0 siblings, 0 replies; 87+ messages in thread
From: sashiko-bot @ 2026-05-30 7:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Several x86 KVM-internal declarations and definitions were unintentionally left in `kvm_host.h` despite the commit's goal to move them to `x86.h`.
--
commit 6fbb37590b9a07586090f255933e9452d592518d
Author: Sean Christopherson <seanjc@google.com>
KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
Move the majority of remaining KVM-internal declarations and defines in
kvm_host.h to x86.h, so that kvm_host.h only holds structure and function
definitions that need to be visible to arch-neutral KVM.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a3c1ff784e5c0..4efad65e53e24 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
[ ... ]
> @@ -2326,11 +2154,6 @@ static inline unsigned long read_msr(unsigned long msr)
> }
> #endif
>
> -static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
> -{
> - kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
> -}
> -
> #define TSS_IOPB_BASE_OFFSET 0x66
> #define TSS_BASE_SIZE 0x68
> #define TSS_IOPB_SIZE (65536 / 8)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
The commit message notes that any "misses" are likely unintentional. Should
TSS_IOPB_BASE_OFFSET and related constants also be moved to x86.h?
There are a few other x86-specific definitions still left in this file that
might have been missed, such as struct kvm_host_values (earlier in the file).
> @@ -2361,16 +2184,6 @@ enum {
> #endif
>
> int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Similarly, should kvm_arch_interrupt_allowed() and the preceding enum
containing TASK_SWITCH_CALL also be moved to x86.h?
> -void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
> -
[ ... ]
> @@ -2382,14 +2195,6 @@ void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
> bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
> extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Severity: Medium]
Are kvm_find_async_pf_gfn() and the other async page fault prototypes missed
here as well?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260529222223.870923-1-seanjc@google.com?part=40
* Re: [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning
2026-05-29 22:21 [PATCH v3 00/40] KVM: x86: GPR accessors and x86.{c,h} spring cleaning Sean Christopherson
` (39 preceding siblings ...)
2026-05-29 22:22 ` [PATCH v3 40/40] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
@ 2026-05-30 16:59 ` Paolo Bonzini
40 siblings, 0 replies; 87+ messages in thread
From: Paolo Bonzini @ 2026-05-30 16:59 UTC (permalink / raw)
To: Sean Christopherson, Vitaly Kuznetsov, David Woodhouse,
Paul Durrant
Cc: kvm, linux-kernel, Yosry Ahmed, Binbin Wu, David Woodhouse,
Kai Huang
On 5/30/26 00:21, Sean Christopherson wrote:
> The first half of this series adds proper, explicit "raw" versions of
> kvm_<reg>_{read,write}(), along with "e" versions (for hardcoded 32-bit
> accesses), and converts the existing kvm_<reg>_{read,write}() APIs into
> mode-aware variants.
>
> At the end of that journey, introduce regs.{c,h} to avoid moving _more_ code
> into x86.h, especially since the resulting code split would be super arbitrary.
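[Editorial sketch, not part of the series] The quoted paragraph distinguishes "raw", "e", and mode-aware register accessors. A minimal userspace model of that distinction might look like the following. Every name here (struct sketch_vcpu, sketch_rax_read_raw(), and so on) is hypothetical, and the assumption that the mode-aware variant truncates to 32 bits outside of 64-bit mode is inferred from the patch titles, not copied from the actual KVM helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU: one GPR plus a mode flag. */
struct sketch_vcpu {
	uint64_t rax;
	bool is_64bit_mode;	/* full 64-bit mode, not merely long mode */
};

/* "raw": return the full register value, regardless of guest mode. */
static inline uint64_t sketch_rax_read_raw(const struct sketch_vcpu *v)
{
	return v->rax;
}

/* "e": hardcoded 32-bit access, i.e. the EAX view of the register. */
static inline uint32_t sketch_eax_read(const struct sketch_vcpu *v)
{
	return (uint32_t)v->rax;
}

/*
 * Mode-aware (assumed semantics): the full value in 64-bit mode, the low
 * 32 bits otherwise, so stale data in bits 63:32 is ignored.
 */
static inline uint64_t sketch_rax_read(const struct sketch_vcpu *v)
{
	return v->is_64bit_mode ? v->rax : (uint32_t)v->rax;
}

/* Exercise all three variants with garbage in the upper half of RAX. */
static inline void sketch_selftest(void)
{
	struct sketch_vcpu v = {
		.rax = 0xdeadbeef00001000ULL,
		.is_64bit_mode = false,
	};

	assert(sketch_rax_read_raw(&v) == 0xdeadbeef00001000ULL);
	assert(sketch_eax_read(&v) == 0x1000u);
	assert(sketch_rax_read(&v) == 0x1000ULL);

	v.is_64bit_mode = true;
	assert(sketch_rax_read(&v) == 0xdeadbeef00001000ULL);
}
```

In this model, a caller that needs an address or hypercall argument would use the mode-aware read, while save/restore style code that must preserve every bit would use the raw read.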
What about getting everything up to patch 14 into 7.2, and then starting
7.3 development with these, the MMU split series, and possibly the
pfncache cleanups?
I'll be away starting June 28th, so we probably want to get those three
merged into kvm/next already towards the end of the merge window.
Paolo
> The second half of the series runs with the regs.{c,h} changes and performs
> spring cleaning on x86.{c,h} and asm/kvm_host.h (in case it wasn't already
> obvious, I have poor impulse control when it comes to cleaning up code).
>
> I'm most interested in getting feedback on the file names (regs.{c,h} and
> msrs.{c,h}). I'm quite confident the actual code split is the way to go, and
> pulling stuff out of asm/kvm_host.h has been on my wish/todo list for years.
>
> I'll grab these fixes for 7.2 no matter what:
>
> KVM: x86: Trace hypercall register *after* truncating values for 32-bit
> KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of 64-bit mode
> KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
> KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode hypercall
>
> Depending on how people feel about the names and cleanups, I'll either send
> the big cleanups as a separate pull request after the initial for-7.2 pull
> requests (if there's overwhelming consensus on the names/splits), or wait for
> 7.3 (if more discussion is needed).
>
> v3:
> - Collect more tags. [David, Binbin, Yosry]
> - Use kvm_run_sync_regs_{from,to}_user() instead of kvm_run_{g,s}et_regs().
> [Kai]
> - Fix a variety of typos. [Binbin]
> - Everything beyond patch 15...
>
> v2:
> - https://lore.kernel.org/all/20260514215355.1648463-2-seanjc@google.com
> - Collect tags. [Yosry, Kai]
> - Fix some truly egregious goofs. [Binbin]
> - Rename kvm_cache_regs.h => regs.h, add regs.c. [Yosry, because he
> complained, not because he actually suggested this :-D ]
> - Drop superfluous casting/masking of e*x() usage. [Kai]
>
> v1: https://lore.kernel.org/all/20260409235622.2052730-1-seanjc@google.com
>
> Sean Christopherson (40):
> KVM: SVM: Truncate INVLPGA address in compatibility mode
> KVM: x86/xen: Bug the VM if 32-bit KVM observes a 64-bit mode
> hypercall
> KVM: x86/xen: Don't truncate RAX when handling hypercall from
> protected guest
> KVM: VMX: Read 32-bit GPR values for ENCLS instructions outside of
> 64-bit mode
> KVM: x86: Trace hypercall register *after* truncating values for
> 32-bit
> KVM: x86: Rename kvm_cache_regs.h => regs.h
> KVM: x86: Move inlined GPR, CR, and DR helpers from x86.h to regs.h
> KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
> KVM: x86: Drop non-raw kvm_<reg>_write() helpers
> KVM: nSVM: Use kvm_rax_read() now that it's mode-aware
> Revert "KVM: VMX: Read 32-bit GPR values for ENCLS instructions
> outside of 64-bit mode"
> KVM: x86: Harden is_64_bit_hypercall() against bugs on 32-bit kernels
> KVM: x86: Move update_cr8_intercept() to lapic.c
> KVM: x86: Move async #PF helpers to x86.h (as inlines)
> KVM: x86: Move the bulk of register specific code from x86.c to regs.c
> KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
> KVM: x86: Drop defunct vcpu_tsc_khz() declaration
> KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
> KVM: x86: Swap the include order between x86.h and mmu.h
> KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h
> KVM: x86: Move eager_page_split to mmu.{c,h}
> KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h
> KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
> KVM: x86: Extract get/set MSR (list) ioctl logic to helpers
> KVM: x86: Expose several TSC helpers via x86.h for use by MSR code
> KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h}
> KVM: x86: Move register helper declarations from kvm_host.h => regs.h
> KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h
> KVM: x86: Remove defunct kvm_load_segment_descriptor() declaration.
> KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
> KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h
> KVM: x86: Move LLDT assembly wrappers into VMX
> KVM: x86: Move kvm_cpu_get_apicid() from kvm_host.h => avic.c
> KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c
> KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h
> KVM: x86: Move IRQ-related helper declarations from kvm_host.h =>
> irq.h
> KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h =>
> lapic.h
> KVM: x86/mmu: Move kvm_arch_async_page_ready() below
> kvm_tdp_page_fault()
> KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h =>
> mmu.c
> KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
>
> arch/x86/include/asm/kvm_host.h | 452 +--
> arch/x86/kvm/Makefile | 4 +-
> arch/x86/kvm/cpuid.c | 13 +-
> arch/x86/kvm/emulate.c | 2 +-
> arch/x86/kvm/fpu.h | 26 +
> arch/x86/kvm/hyperv.c | 21 +-
> arch/x86/kvm/hyperv.h | 7 +-
> arch/x86/kvm/ioapic.c | 1 +
> arch/x86/kvm/ioapic.h | 12 +
> arch/x86/kvm/irq.c | 7 +
> arch/x86/kvm/irq.h | 6 +
> arch/x86/kvm/lapic.c | 28 +-
> arch/x86/kvm/lapic.h | 9 +
> arch/x86/kvm/mmu.h | 92 +-
> arch/x86/kvm/mmu/mmu.c | 134 +-
> arch/x86/kvm/mmu/mmu_internal.h | 66 -
> arch/x86/kvm/msrs.c | 2732 +++++++++++++++
> arch/x86/kvm/msrs.h | 156 +
> arch/x86/kvm/mtrr.c | 1 +
> arch/x86/kvm/regs.c | 875 +++++
> arch/x86/kvm/{kvm_cache_regs.h => regs.h} | 258 +-
> arch/x86/kvm/smm.c | 2 +-
> arch/x86/kvm/svm/avic.c | 5 +
> arch/x86/kvm/svm/nested.c | 9 +-
> arch/x86/kvm/svm/svm.c | 19 +-
> arch/x86/kvm/svm/svm.h | 2 +-
> arch/x86/kvm/vmx/nested.c | 9 +-
> arch/x86/kvm/vmx/nested.h | 2 +-
> arch/x86/kvm/vmx/sgx.c | 6 +-
> arch/x86/kvm/vmx/tdx.c | 18 +-
> arch/x86/kvm/vmx/vmx.c | 14 +-
> arch/x86/kvm/vmx/vmx.h | 2 +-
> arch/x86/kvm/x86.c | 3789 +--------------------
> arch/x86/kvm/x86.h | 480 ++-
> arch/x86/kvm/xen.c | 39 +-
> 35 files changed, 4722 insertions(+), 4576 deletions(-)
> create mode 100644 arch/x86/kvm/msrs.c
> create mode 100644 arch/x86/kvm/msrs.h
> create mode 100644 arch/x86/kvm/regs.c
> rename arch/x86/kvm/{kvm_cache_regs.h => regs.h} (50%)
>
>
> base-commit: d1568b1332b6b3b36b222c2868fc102727c12a34