* [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 2:16 ` Huang, Kai
2026-06-15 5:02 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base() Sean Christopherson
` (28 subsequent siblings)
29 siblings, 2 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Extract the REGS and SREGS portions of {store,sync}_regs() into separate
helpers in anticipation of moving the register specific code out of x86.c
and into regs.c.
No functional change intended.
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf122b8c3210..74b8f98dcb24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12687,7 +12687,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return 0;
}
-static void store_regs(struct kvm_vcpu *vcpu)
+static void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu)
{
BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
@@ -12696,13 +12696,18 @@ static void store_regs(struct kvm_vcpu *vcpu)
if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
__get_sregs(vcpu, &vcpu->run->s.regs.sregs);
+}
+
+static void store_regs(struct kvm_vcpu *vcpu)
+{
+ kvm_run_sync_regs_to_user(vcpu);
if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_EVENTS)
kvm_vcpu_ioctl_x86_get_vcpu_events(
vcpu, &vcpu->run->s.regs.events);
}
-static int sync_regs(struct kvm_vcpu *vcpu)
+static int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu)
{
if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
__set_regs(vcpu, &vcpu->run->s.regs.regs);
@@ -12718,6 +12723,14 @@ static int sync_regs(struct kvm_vcpu *vcpu)
vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
}
+ return 0;
+}
+
+static int sync_regs(struct kvm_vcpu *vcpu)
+{
+ if (kvm_run_sync_regs_from_user(vcpu))
+ return -EINVAL;
+
if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
struct kvm_vcpu_events events = vcpu->run->s.regs.events;
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers
2026-06-13 0:03 ` [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers Sean Christopherson
@ 2026-06-15 2:16 ` Huang, Kai
2026-06-15 5:02 ` Binbin Wu
1 sibling, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 2:16 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Extract the REGS and SREGS portions of {store,sync}_regs() into separate
> helpers in anticipation of moving the register specific code out of x86.c
> and into regs.c.
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers
2026-06-13 0:03 ` [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers Sean Christopherson
2026-06-15 2:16 ` Huang, Kai
@ 2026-06-15 5:02 ` Binbin Wu
1 sibling, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 5:02 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Extract the REGS and SREGS portions of {store,sync}_regs() into separate
> helpers in anticipation of moving the register specific code out of x86.c
> and into regs.c.
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/x86.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf122b8c3210..74b8f98dcb24 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12687,7 +12687,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
> return 0;
> }
>
> -static void store_regs(struct kvm_vcpu *vcpu)
> +static void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu)
> {
> BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
>
> @@ -12696,13 +12696,18 @@ static void store_regs(struct kvm_vcpu *vcpu)
>
> if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
> __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
> +}
> +
> +static void store_regs(struct kvm_vcpu *vcpu)
> +{
> + kvm_run_sync_regs_to_user(vcpu);
>
> if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_EVENTS)
> kvm_vcpu_ioctl_x86_get_vcpu_events(
> vcpu, &vcpu->run->s.regs.events);
> }
>
> -static int sync_regs(struct kvm_vcpu *vcpu)
> +static int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu)
> {
> if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
> __set_regs(vcpu, &vcpu->run->s.regs.regs);
> @@ -12718,6 +12723,14 @@ static int sync_regs(struct kvm_vcpu *vcpu)
> vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
> }
>
> + return 0;
> +}
> +
> +static int sync_regs(struct kvm_vcpu *vcpu)
> +{
> + if (kvm_run_sync_regs_from_user(vcpu))
> + return -EINVAL;
> +
> if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
> struct kvm_vcpu_events events = vcpu->run->s.regs.events;
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base()
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 2:43 ` Huang, Kai
2026-06-15 5:03 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2() Sean Christopherson
` (27 subsequent siblings)
29 siblings, 2 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move get_segment_base() to regs.h, as kvm_get_segment_base(), so that the
bulk of the register code can be moved from x86.c to a new regs.c, without
simultaneously needing to rename "public" helpers to explicitly scope them
to KVM.
No functional change intended.
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/regs.h | 5 +++++
arch/x86/kvm/x86.c | 9 ++-------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index a57ba26279ed..e5b16a0826d9 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -412,4 +412,9 @@ static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
return vcpu->arch.hflags & HF_GUEST_MASK;
}
+static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+{
+ return kvm_x86_call(get_segment_base)(vcpu, seg);
+}
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74b8f98dcb24..0f48cd630c32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8492,11 +8492,6 @@ static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
return emulator_pio_out(emul_to_vcpu(ctxt), size, port, val, count);
}
-static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
-{
- return kvm_x86_call(get_segment_base)(vcpu, seg);
-}
-
static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address)
{
kvm_mmu_invlpg(emul_to_vcpu(ctxt), address);
@@ -8641,7 +8636,7 @@ static void emulator_set_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
static unsigned long emulator_get_cached_segment_base(
struct x86_emulate_ctxt *ctxt, int seg)
{
- return get_segment_base(emul_to_vcpu(ctxt), seg);
+ return kvm_get_segment_base(emul_to_vcpu(ctxt), seg);
}
static bool emulator_get_segment(struct x86_emulate_ctxt *ctxt, u16 *selector,
@@ -13839,7 +13834,7 @@ unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
if (is_64_bit_mode(vcpu))
return kvm_rip_read(vcpu);
- return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
+ return (u32)(kvm_get_segment_base(vcpu, VCPU_SREG_CS) +
kvm_rip_read(vcpu));
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base()
2026-06-13 0:03 ` [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base() Sean Christopherson
@ 2026-06-15 2:43 ` Huang, Kai
2026-06-15 5:03 ` Binbin Wu
1 sibling, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 2:43 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Move get_segment_base() to regs.h, as kvm_get_segment_base(), so that the
> bulk of the register code can be moved from x86.c to a new regs.c, without
> simultaneously needing to rename "public" helpers to explicitly scope them
> to KVM.
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Nit of nit: maybe explicitly say emulator_get_cached_segment_base() in x86.c
still calls kvm_get_segment_base() after code movement, so the change here is
more obvious.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base()
2026-06-13 0:03 ` [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base() Sean Christopherson
2026-06-15 2:43 ` Huang, Kai
@ 2026-06-15 5:03 ` Binbin Wu
1 sibling, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 5:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Move get_segment_base() to regs.h, as kvm_get_segment_base(), so that the
> bulk of the register code can be moved from x86.c to a new regs.c, without
> simultaneously needing to rename "public" helpers to explicitly scope them
> to KVM.
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/regs.h | 5 +++++
> arch/x86/kvm/x86.c | 9 ++-------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
> index a57ba26279ed..e5b16a0826d9 100644
> --- a/arch/x86/kvm/regs.h
> +++ b/arch/x86/kvm/regs.h
> @@ -412,4 +412,9 @@ static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
> return vcpu->arch.hflags & HF_GUEST_MASK;
> }
>
> +static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
> +{
> + return kvm_x86_call(get_segment_base)(vcpu, seg);
> +}
> +
> #endif
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 74b8f98dcb24..0f48cd630c32 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8492,11 +8492,6 @@ static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
> return emulator_pio_out(emul_to_vcpu(ctxt), size, port, val, count);
> }
>
> -static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
> -{
> - return kvm_x86_call(get_segment_base)(vcpu, seg);
> -}
> -
> static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address)
> {
> kvm_mmu_invlpg(emul_to_vcpu(ctxt), address);
> @@ -8641,7 +8636,7 @@ static void emulator_set_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
> static unsigned long emulator_get_cached_segment_base(
> struct x86_emulate_ctxt *ctxt, int seg)
> {
> - return get_segment_base(emul_to_vcpu(ctxt), seg);
> + return kvm_get_segment_base(emul_to_vcpu(ctxt), seg);
> }
>
> static bool emulator_get_segment(struct x86_emulate_ctxt *ctxt, u16 *selector,
> @@ -13839,7 +13834,7 @@ unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
>
> if (is_64_bit_mode(vcpu))
> return kvm_rip_read(vcpu);
> - return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
> + return (u32)(kvm_get_segment_base(vcpu, VCPU_SREG_CS) +
> kvm_rip_read(vcpu));
> }
> EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2()
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 01/30] KVM: x86: Extract REGS and SREGS runtime sync code to helpers Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 02/30] KVM: x86: Move get_segment_base() to regs.h, as kvm_get_segment_base() Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 2:46 ` Huang, Kai
2026-06-15 5:13 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 04/30] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
` (26 subsequent siblings)
29 siblings, 2 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Rename the KVM_{G,S}ET_SREGS2 helpers in anticipation of moving them out of
x86.c (while leaving the ioctl dispatch behind). Having globally visible
APIs named __{g,s}et_sregs2() would be "fine", but ugly, given that
__{g,s}et_sregs() will NOT be globally visible. As a bonus, this makes it
a bit more obvious that the helpers implement newer versions of
kvm_arch_vcpu_ioctl_set_sregs().
No functional change intended.
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0f48cd630c32..0fa8a9e3b8b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -133,8 +133,10 @@ static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
-static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
-static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
+static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
+static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
static DEFINE_MUTEX(vendor_module_lock);
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
@@ -6623,7 +6625,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -ENOMEM;
if (!u.sregs2)
goto out;
- __get_sregs2(vcpu, u.sregs2);
+ kvm_x86_vcpu_ioctl_get_sregs2(vcpu, u.sregs2);
r = -EFAULT;
if (copy_to_user(argp, u.sregs2, sizeof(struct kvm_sregs2)))
goto out;
@@ -6642,7 +6644,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
u.sregs2 = NULL;
goto out;
}
- r = __set_sregs2(vcpu, u.sregs2);
+ r = kvm_x86_vcpu_ioctl_set_sregs2(vcpu, u.sregs2);
break;
}
case KVM_HAS_DEVICE_ATTR:
@@ -12210,7 +12212,8 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
(unsigned long *)sregs->interrupt_bitmap);
}
-static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
+static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
{
int i;
@@ -12480,7 +12483,8 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
return 0;
}
-static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
+static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
{
int mmu_reset_needed = 0;
bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2()
2026-06-13 0:03 ` [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2() Sean Christopherson
@ 2026-06-15 2:46 ` Huang, Kai
2026-06-15 5:13 ` Binbin Wu
1 sibling, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 2:46 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Rename the KVM_{G,S}ET_SREGS2 helpers in anticipation of moving them out of
> x86.c (while leaving the ioctl dispatch behind). Having globally visible
> APIs named __{g,s}et_sregs2() would be "fine", but ugly, given that
> __{g,s}et_sregs() will NOT be globally visible. As a bonus, this makes it
> a bit more obvious that the helpers implement newer versions of
> kvm_arch_vcpu_ioctl_set_sregs().
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2()
2026-06-13 0:03 ` [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2() Sean Christopherson
2026-06-15 2:46 ` Huang, Kai
@ 2026-06-15 5:13 ` Binbin Wu
2026-06-15 15:58 ` Sean Christopherson
1 sibling, 1 reply; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 5:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Rename the KVM_{G,S}ET_SREGS2 helpers in anticipation of moving them out of
> x86.c (while leaving the ioctl dispatch behind). Having globally visible
> APIs named __{g,s}et_sregs2() would be "fine", but ugly, given that
> __{g,s}et_sregs() will NOT be globally visible. As a bonus, this makes it
> a bit more obvious that the helpers implement newer versions of
> kvm_arch_vcpu_ioctl_set_sregs().
>
> No functional change intended.
>
> Cc: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0f48cd630c32..0fa8a9e3b8b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -133,8 +133,10 @@ static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
> static void store_regs(struct kvm_vcpu *vcpu);
> static int sync_regs(struct kvm_vcpu *vcpu);
>
> -static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> -static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> +static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
> + struct kvm_sregs2 *sregs2);
> +static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
> + struct kvm_sregs2 *sregs2);
>
Existing code uses the pattern kvm_vcpu_ioctl_x86_xxx, is it better to
align the pattern?
> static DEFINE_MUTEX(vendor_module_lock);
> static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> @@ -6623,7 +6625,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> r = -ENOMEM;
> if (!u.sregs2)
> goto out;
> - __get_sregs2(vcpu, u.sregs2);
> + kvm_x86_vcpu_ioctl_get_sregs2(vcpu, u.sregs2);
> r = -EFAULT;
> if (copy_to_user(argp, u.sregs2, sizeof(struct kvm_sregs2)))
> goto out;
> @@ -6642,7 +6644,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> u.sregs2 = NULL;
> goto out;
> }
> - r = __set_sregs2(vcpu, u.sregs2);
> + r = kvm_x86_vcpu_ioctl_set_sregs2(vcpu, u.sregs2);
> break;
> }
> case KVM_HAS_DEVICE_ATTR:
> @@ -12210,7 +12212,8 @@ static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> (unsigned long *)sregs->interrupt_bitmap);
> }
>
> -static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
> +static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
> + struct kvm_sregs2 *sregs2)
> {
> int i;
>
> @@ -12480,7 +12483,8 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> return 0;
> }
>
> -static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
> +static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
> + struct kvm_sregs2 *sregs2)
> {
> int mmu_reset_needed = 0;
> bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2()
2026-06-15 5:13 ` Binbin Wu
@ 2026-06-15 15:58 ` Sean Christopherson
0 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 15:58 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On Mon, Jun 15, 2026, Binbin Wu wrote:
> On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> > Rename the KVM_{G,S}ET_SREGS2 helpers in anticipation of moving them out of
> > x86.c (while leaving the ioctl dispatch behind). Having globally visible
> > APIs named __{g,s}et_sregs2() would be "fine", but ugly, given that
> > __{g,s}et_sregs() will NOT be globally visible. As a bonus, this makes it
> > a bit more obvious that the helpers implement newer versions of
> > kvm_arch_vcpu_ioctl_set_sregs().
> >
> > No functional change intended.
> >
> > Cc: Yosry Ahmed <yosry@kernel.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/x86.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 0f48cd630c32..0fa8a9e3b8b2 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -133,8 +133,10 @@ static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
> > static void store_regs(struct kvm_vcpu *vcpu);
> > static int sync_regs(struct kvm_vcpu *vcpu);
> >
> > -static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> > -static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> > +static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
> > + struct kvm_sregs2 *sregs2);
> > +static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
> > + struct kvm_sregs2 *sregs2);
> >
>
> Existing code uses the pattern kvm_vcpu_ioctl_x86_xxx, is it better to
> align the pattern?
Yeah, good call, I'll use kvm_vcpu_ioctl_x86_xxx. In find the effective namespace
to be a bit odd, but that's not a good reason for these to be different.
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 04/30] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (2 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 03/30] KVM: x86: Rename __{g,s}et_sregs2() => kvm_x86_vcpu_ioctl_{g,s}et_sregs2() Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 5:25 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
` (25 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Introduce regs.c, and move the vast majority of register specific code out
of x86.c and into regs.c. Deliberately leave behind MSR code, as KVM's MSR
support is complex enough to warrant its own compilation unit, and doesn't
have much in common with the other register code.
Note, "struct kvm_sregs" has fields for EFER and MSR_IA32_APICBASE, and so
the {G,S}ET_REGS flows technically contain a tiny amount of MSR code.
MSR_IA32_APICBASE is already managed by lapic.c, and so doesn't require a
"placement decision". As for EFER, leave all other EFER handling in x86.c
(later to be moved to msrs.c). The primary interface to EFER, set_efer(),
is very much MSR specific, even though EFER is arguably more of a Control
Register than an MSR.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Makefile | 4 +-
arch/x86/kvm/regs.c | 875 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/regs.h | 25 ++
arch/x86/kvm/x86.c | 887 +-----------------------------------------
arch/x86/kvm/x86.h | 2 +
5 files changed, 906 insertions(+), 887 deletions(-)
create mode 100644 arch/x86/kvm/regs.c
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 77337c37324b..f39c311fd756 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,8 +5,8 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o mtrr.o \
- debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
+kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o regs.o \
+ mtrr.o debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
kvm-$(CONFIG_KVM_IOAPIC) += i8259.o i8254.o ioapic.o
diff --git a/arch/x86/kvm/regs.c b/arch/x86/kvm/regs.c
new file mode 100644
index 000000000000..fb4478301076
--- /dev/null
+++ b/arch/x86/kvm/regs.c
@@ -0,0 +1,875 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm_host.h>
+
+#include "lapic.h"
+#include "mmu.h"
+#include "regs.h"
+#include "x86.h"
+
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
+{
+ /* Can't read the RIP when guest state is protected, just return 0 */
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
+ if (is_64_bit_mode(vcpu))
+ return kvm_rip_read(vcpu);
+ return (u32)(kvm_get_segment_base(vcpu, VCPU_SREG_CS) +
+ kvm_rip_read(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
+
+bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
+{
+ return kvm_get_linear_rip(vcpu) == linear_rip;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip);
+
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
+{
+ unsigned long rflags;
+
+ rflags = kvm_x86_call(get_rflags)(vcpu);
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+ rflags &= ~X86_EFLAGS_TF;
+ return rflags;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags);
+
+void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
+ kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
+ rflags |= X86_EFLAGS_TF;
+ kvm_x86_call(set_rflags)(vcpu, rflags);
+}
+
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ __kvm_set_rflags(vcpu, rflags);
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags);
+
+static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
+ /*
+ * We are here if userspace calls get_regs() in the middle of
+ * instruction emulation. Registers state needs to be copied
+ * back from emulation context to vcpu. Userspace shouldn't do
+ * that usually, but some bad designed PV devices (vmware
+ * backdoor interface) need this to work
+ */
+ emulator_writeback_register_cache(vcpu->arch.emulate_ctxt);
+ vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+ }
+ regs->rax = kvm_rax_read_raw(vcpu);
+ regs->rbx = kvm_rbx_read_raw(vcpu);
+ regs->rcx = kvm_rcx_read_raw(vcpu);
+ regs->rdx = kvm_rdx_read_raw(vcpu);
+ regs->rsi = kvm_rsi_read_raw(vcpu);
+ regs->rdi = kvm_rdi_read_raw(vcpu);
+ regs->rsp = kvm_rsp_read(vcpu);
+ regs->rbp = kvm_rbp_read_raw(vcpu);
+#ifdef CONFIG_X86_64
+ regs->r8 = kvm_r8_read_raw(vcpu);
+ regs->r9 = kvm_r9_read_raw(vcpu);
+ regs->r10 = kvm_r10_read_raw(vcpu);
+ regs->r11 = kvm_r11_read_raw(vcpu);
+ regs->r12 = kvm_r12_read_raw(vcpu);
+ regs->r13 = kvm_r13_read_raw(vcpu);
+ regs->r14 = kvm_r14_read_raw(vcpu);
+ regs->r15 = kvm_r15_read_raw(vcpu);
+#endif
+
+ regs->rip = kvm_rip_read(vcpu);
+ regs->rflags = kvm_get_rflags(vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __get_regs(vcpu, regs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
+ vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ kvm_rax_write_raw(vcpu, regs->rax);
+ kvm_rbx_write_raw(vcpu, regs->rbx);
+ kvm_rcx_write_raw(vcpu, regs->rcx);
+ kvm_rdx_write_raw(vcpu, regs->rdx);
+ kvm_rsi_write_raw(vcpu, regs->rsi);
+ kvm_rdi_write_raw(vcpu, regs->rdi);
+ kvm_rsp_write(vcpu, regs->rsp);
+ kvm_rbp_write_raw(vcpu, regs->rbp);
+#ifdef CONFIG_X86_64
+ kvm_r8_write_raw(vcpu, regs->r8);
+ kvm_r9_write_raw(vcpu, regs->r9);
+ kvm_r10_write_raw(vcpu, regs->r10);
+ kvm_r11_write_raw(vcpu, regs->r11);
+ kvm_r12_write_raw(vcpu, regs->r12);
+ kvm_r13_write_raw(vcpu, regs->r13);
+ kvm_r14_write_raw(vcpu, regs->r14);
+ kvm_r15_write_raw(vcpu, regs->r15);
+#endif
+
+ kvm_rip_write(vcpu, regs->rip);
+ kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
+
+ vcpu->arch.exception.pending = false;
+ vcpu->arch.exception_vmexit.pending = false;
+
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __set_regs(vcpu, regs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
+}
+
+/*
+ * Load the pae pdptrs. Return 1 if they are all valid, 0 otherwise.
+ */
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+ struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
+ gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
+ gpa_t real_gpa;
+ int i;
+ int ret;
+ u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
+
+ /*
+ * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
+ * to an L1 GPA.
+ */
+ real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn),
+ PFERR_USER_MASK | PFERR_WRITE_MASK |
+ PFERR_GUEST_PAGE_MASK, NULL, 0);
+ if (real_gpa == INVALID_GPA)
+ return 0;
+
+ /* Note the offset, PDPTRs are 32 byte aligned when using PAE paging. */
+ ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(real_gpa), pdpte,
+ cr3 & GENMASK(11, 5), sizeof(pdpte));
+ if (ret < 0)
+ return 0;
+
+ for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
+ if ((pdpte[i] & PT_PRESENT_MASK) &&
+ (pdpte[i] & pdptr_rsvd_bits(vcpu))) {
+ return 0;
+ }
+ }
+
+ /*
+ * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
+ * Shadow page roots need to be reconstructed instead.
+ */
+ if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
+ kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
+
+ memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
+ kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
+ kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
+ vcpu->arch.pdptrs_from_userspace = false;
+
+ return 1;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs);
+
+static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+#ifdef CONFIG_X86_64
+ if (cr0 & 0xffffffff00000000UL)
+ return false;
+#endif
+
+ if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
+ return false;
+
+ if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
+ return false;
+
+ return kvm_x86_call(is_valid_cr0)(vcpu, cr0);
+}
+
+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
+{
+ /*
+ * CR0.WP is incorporated into the MMU role, but only for non-nested,
+ * indirect shadow MMUs. If paging is disabled, no updates are needed
+ * as there are no permission bits to emulate. If TDP is enabled, the
+ * MMU's metadata needs to be updated, e.g. so that emulating guest
+ * translations does the right thing, but there's no need to unload the
+ * root as CR0.WP doesn't affect SPTEs.
+ */
+ if ((cr0 ^ old_cr0) == X86_CR0_WP) {
+ if (!(cr0 & X86_CR0_PG))
+ return;
+
+ if (tdp_enabled) {
+ kvm_init_mmu(vcpu);
+ return;
+ }
+ }
+
+ if ((cr0 ^ old_cr0) & X86_CR0_PG) {
+ /*
+ * Clearing CR0.PG is defined to flush the TLB from the guest's
+ * perspective.
+ */
+ if (!(cr0 & X86_CR0_PG))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ /*
+ * Check for async #PF completion events when enabling paging,
+ * as the vCPU may have previously encountered async #PFs (it's
+ * entirely legal for the guest to toggle paging on/off without
+ * waiting for the async #PF queue to drain).
+ */
+ else if (kvm_pv_async_pf_enabled(vcpu))
+ kvm_make_request(KVM_REQ_APF_READY, vcpu);
+ }
+
+ if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0);
+
+int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+ unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
+ if (!kvm_is_valid_cr0(vcpu, cr0))
+ return 1;
+
+ cr0 |= X86_CR0_ET;
+
+ /* Write to CR0 reserved bits are ignored, even on Intel. */
+ cr0 &= ~CR0_RESERVED_BITS;
+
+#ifdef CONFIG_X86_64
+ if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
+ (cr0 & X86_CR0_PG)) {
+ int cs_db, cs_l;
+
+ if (!is_pae(vcpu))
+ return 1;
+ kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
+ if (cs_l)
+ return 1;
+ }
+#endif
+ if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
+ is_pae(vcpu) && ((cr0 ^ old_cr0) & X86_CR0_PDPTR_BITS) &&
+ !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
+ return 1;
+
+ if (!(cr0 & X86_CR0_PG) &&
+ (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
+ return 1;
+
+ if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+ return 1;
+
+ kvm_x86_call(set_cr0)(vcpu, cr0);
+
+ kvm_post_set_cr0(vcpu, old_cr0, cr0);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0);
+
+void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
+{
+ (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);
+
+int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+ bool skip_tlb_flush = false;
+ unsigned long pcid = 0;
+#ifdef CONFIG_X86_64
+ if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
+ skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
+ cr3 &= ~X86_CR3_PCID_NOFLUSH;
+ pcid = cr3 & X86_CR3_PCID_MASK;
+ }
+#endif
+
+ /* PDPTRs are always reloaded for PAE paging. */
+ if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
+ goto handle_tlb_flush;
+
+ /*
+ * Do not condition the GPA check on long mode, this helper is used to
+ * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
+ * the current vCPU mode is accurate.
+ */
+ if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
+ return 1;
+
+ if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
+ return 1;
+
+ if (cr3 != kvm_read_cr3(vcpu))
+ kvm_mmu_new_pgd(vcpu, cr3);
+
+ vcpu->arch.cr3 = cr3;
+ kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
+ /* Do not call post_set_cr3, we do not get here for confidential guests. */
+
+handle_tlb_flush:
+ /*
+ * A load of CR3 that flushes the TLB flushes only the current PCID,
+ * even if PCID is disabled, in which case PCID=0 is flushed. It's a
+ * moot point in the end because _disabling_ PCID will flush all PCIDs,
+ * and it's impossible to use a non-zero PCID when PCID is disabled,
+ * i.e. only PCID=0 can be relevant.
+ */
+ if (!skip_tlb_flush)
+ kvm_invalidate_pcid(vcpu, pcid);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3);
+
+static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ return __kvm_is_valid_cr4(vcpu, cr4) &&
+ kvm_x86_call(is_valid_cr4)(vcpu, cr4);
+}
+
+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
+{
+ if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+
+ /*
+ * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
+ * according to the SDM; however, stale prev_roots could be reused
+ * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
+ * free them all. This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
+ * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
+ * so fall through.
+ */
+ if (!tdp_enabled &&
+ (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
+ kvm_mmu_unload(vcpu);
+
+ /*
+ * The TLB has to be flushed for all PCIDs if any of the following
+ * (architecturally required) changes happen:
+ * - CR4.PCIDE is changed from 1 to 0
+ * - CR4.PGE is toggled
+ *
+ * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
+ */
+ if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
+ (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
+ /*
+ * The TLB has to be flushed for the current PCID if any of the
+ * following (architecturally required) changes happen:
+ * - CR4.SMEP is changed from 0 to 1
+ * - CR4.PAE is toggled
+ */
+ else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
+ ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4);
+
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ unsigned long old_cr4 = kvm_read_cr4(vcpu);
+
+ if (!kvm_is_valid_cr4(vcpu, cr4))
+ return 1;
+
+ if (is_long_mode(vcpu)) {
+ if (!(cr4 & X86_CR4_PAE))
+ return 1;
+ if ((cr4 ^ old_cr4) & X86_CR4_LA57)
+ return 1;
+ } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
+ && ((cr4 ^ old_cr4) & X86_CR4_PDPTR_BITS)
+ && !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
+ return 1;
+
+ if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
+ /* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
+ if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
+ return 1;
+ }
+
+ if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+ return 1;
+
+ kvm_x86_call(set_cr4)(vcpu, cr4);
+
+ kvm_post_set_cr4(vcpu, old_cr4, cr4);
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4);
+
+int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
+{
+ if (cr8 & CR8_RESERVED_BITS)
+ return 1;
+ if (lapic_in_kernel(vcpu))
+ kvm_lapic_set_tpr(vcpu, cr8);
+ else
+ vcpu->arch.cr8 = cr8;
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8);
+
+unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
+{
+ if (lapic_in_kernel(vcpu))
+ return kvm_lapic_get_cr8(vcpu);
+ else
+ return vcpu->arch.cr8;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8);
+
+static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ struct desc_ptr dt;
+
+ if (vcpu->arch.guest_state_protected)
+ goto skip_protected_regs;
+
+ kvm_handle_exception_payload_quirk(vcpu);
+
+ kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
+ kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
+ kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
+ kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
+ kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
+ kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+
+ kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
+ kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+
+ kvm_x86_call(get_idt)(vcpu, &dt);
+ sregs->idt.limit = dt.size;
+ sregs->idt.base = dt.address;
+ kvm_x86_call(get_gdt)(vcpu, &dt);
+ sregs->gdt.limit = dt.size;
+ sregs->gdt.base = dt.address;
+
+ sregs->cr2 = vcpu->arch.cr2;
+ sregs->cr3 = kvm_read_cr3(vcpu);
+
+skip_protected_regs:
+ sregs->cr0 = kvm_read_cr0(vcpu);
+ sregs->cr4 = kvm_read_cr4(vcpu);
+ sregs->cr8 = kvm_get_cr8(vcpu);
+ sregs->efer = vcpu->arch.efer;
+ sregs->apic_base = vcpu->arch.apic_base;
+}
+
+static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ __get_sregs_common(vcpu, sregs);
+
+ if (vcpu->arch.guest_state_protected)
+ return;
+
+ if (vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft)
+ set_bit(vcpu->arch.interrupt.nr,
+ (unsigned long *)sregs->interrupt_bitmap);
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ __get_sregs(vcpu, sregs);
+ vcpu_put(vcpu);
+ return 0;
+}
+
+void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
+{
+ int i;
+
+ __get_sregs_common(vcpu, (struct kvm_sregs *)sregs2);
+
+ if (vcpu->arch.guest_state_protected)
+ return;
+
+ if (is_pae_paging(vcpu)) {
+ kvm_vcpu_srcu_read_lock(vcpu);
+ for (i = 0 ; i < 4 ; i++)
+ sregs2->pdptrs[i] = kvm_pdptr_read(vcpu, i);
+ sregs2->flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
+ kvm_vcpu_srcu_read_unlock(vcpu);
+ }
+}
+
+static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
+ /*
+ * When EFER.LME and CR0.PG are set, the processor is in
+ * 64-bit mode (though maybe in a 32-bit code segment).
+ * CR4.PAE and EFER.LMA must be set.
+ */
+ if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
+ return false;
+ if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
+ return false;
+ } else {
+ /*
+ * Not in 64-bit mode: EFER.LMA is clear and the code
+ * segment cannot be 64-bit.
+ */
+ if (sregs->efer & EFER_LMA || sregs->cs.l)
+ return false;
+ }
+
+ return kvm_is_valid_cr4(vcpu, sregs->cr4) &&
+ kvm_is_valid_cr0(vcpu, sregs->cr0);
+}
+
+static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
+ int *mmu_reset_needed, bool update_pdptrs)
+{
+ int idx;
+ struct desc_ptr dt;
+
+ if (!kvm_is_valid_sregs(vcpu, sregs))
+ return -EINVAL;
+
+ if (kvm_apic_set_base(vcpu, sregs->apic_base, true))
+ return -EINVAL;
+
+ if (vcpu->arch.guest_state_protected)
+ return 0;
+
+ dt.size = sregs->idt.limit;
+ dt.address = sregs->idt.base;
+ kvm_x86_call(set_idt)(vcpu, &dt);
+ dt.size = sregs->gdt.limit;
+ dt.address = sregs->gdt.base;
+ kvm_x86_call(set_gdt)(vcpu, &dt);
+
+ vcpu->arch.cr2 = sregs->cr2;
+ *mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
+ vcpu->arch.cr3 = sregs->cr3;
+ kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
+ kvm_x86_call(post_set_cr3)(vcpu, sregs->cr3);
+
+ kvm_set_cr8(vcpu, sregs->cr8);
+
+ *mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
+ kvm_x86_call(set_efer)(vcpu, sregs->efer);
+
+ *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
+ kvm_x86_call(set_cr0)(vcpu, sregs->cr0);
+
+ *mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
+ kvm_x86_call(set_cr4)(vcpu, sregs->cr4);
+
+ if (update_pdptrs) {
+ idx = srcu_read_lock(&vcpu->kvm->srcu);
+ if (is_pae_paging(vcpu)) {
+ load_pdptrs(vcpu, kvm_read_cr3(vcpu));
+ *mmu_reset_needed = 1;
+ }
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ }
+
+ kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
+ kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
+ kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
+ kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
+ kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
+ kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
+
+ kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
+ kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
+
+ kvm_lapic_update_cr8_intercept(vcpu);
+
+ /* Older userspace won't unhalt the vcpu on reset. */
+ if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
+ sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 &&
+ !is_protmode(vcpu))
+ kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
+
+ return 0;
+}
+
+static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+{
+ int pending_vec, max_bits;
+ int mmu_reset_needed = 0;
+ int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
+
+ if (ret)
+ return ret;
+
+ if (mmu_reset_needed) {
+ kvm_mmu_reset_context(vcpu);
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ }
+
+ max_bits = KVM_NR_INTERRUPTS;
+ pending_vec = find_first_bit(
+ (const unsigned long *)sregs->interrupt_bitmap, max_bits);
+
+ if (pending_vec < max_bits) {
+ kvm_queue_interrupt(vcpu, pending_vec, false);
+ pr_debug("Set back pending irq %d\n", pending_vec);
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
+ return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+ struct kvm_sregs *sregs)
+{
+ int ret;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ vcpu_load(vcpu);
+ ret = __set_sregs(vcpu, sregs);
+ vcpu_put(vcpu);
+ return ret;
+}
+
+int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2)
+{
+ int mmu_reset_needed = 0;
+ bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
+ bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
+ !(sregs2->efer & EFER_LMA);
+ int i, ret;
+
+ if (sregs2->flags & ~KVM_SREGS2_FLAGS_PDPTRS_VALID)
+ return -EINVAL;
+
+ if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
+ return -EINVAL;
+
+ ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
+ &mmu_reset_needed, !valid_pdptrs);
+ if (ret)
+ return ret;
+
+ if (valid_pdptrs) {
+ for (i = 0; i < 4 ; i++)
+ kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
+
+ kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
+ mmu_reset_needed = 1;
+ vcpu->arch.pdptrs_from_userspace = true;
+ }
+ if (mmu_reset_needed) {
+ kvm_mmu_reset_context(vcpu);
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ }
+ return 0;
+}
+
+void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu)
+{
+ BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
+
+ if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
+ __get_regs(vcpu, &vcpu->run->s.regs.regs);
+
+ if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
+ __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
+}
+
+int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
+ __set_regs(vcpu, &vcpu->run->s.regs.regs);
+ vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
+ }
+
+ if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
+ struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
+
+ if (__set_sregs(vcpu, &sregs))
+ return -EINVAL;
+
+ vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
+ }
+
+ return 0;
+}
+
+void kvm_update_dr0123(struct kvm_vcpu *vcpu)
+{
+ int i;
+
+ if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
+ for (i = 0; i < KVM_NR_DB_REGS; i++)
+ vcpu->arch.eff_db[i] = vcpu->arch.db[i];
+ }
+}
+
+void kvm_update_dr7(struct kvm_vcpu *vcpu)
+{
+ unsigned long dr7;
+
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ dr7 = vcpu->arch.guest_debug_dr7;
+ else
+ dr7 = vcpu->arch.dr7;
+ kvm_x86_call(set_dr7)(vcpu, dr7);
+ vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_BP_ENABLED;
+ if (dr7 & DR7_BP_EN_MASK)
+ vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7);
+
+static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
+{
+ u64 fixed = DR6_FIXED_1;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
+ fixed |= DR6_RTM;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+ fixed |= DR6_BUS_LOCK;
+ return fixed;
+}
+
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+{
+ size_t size = ARRAY_SIZE(vcpu->arch.db);
+
+ switch (dr) {
+ case 0 ... 3:
+ vcpu->arch.db[array_index_nospec(dr, size)] = val;
+ if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
+ vcpu->arch.eff_db[dr] = val;
+ break;
+ case 4:
+ case 6:
+ if (!kvm_dr6_valid(val))
+ return 1; /* #GP */
+ vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
+ break;
+ case 5:
+ default: /* 7 */
+ if (!kvm_dr7_valid(val))
+ return 1; /* #GP */
+ vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
+ kvm_update_dr7(vcpu);
+ break;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr);
+
+unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr)
+{
+ size_t size = ARRAY_SIZE(vcpu->arch.db);
+
+ switch (dr) {
+ case 0 ... 3:
+ return vcpu->arch.db[array_index_nospec(dr, size)];
+ case 4:
+ case 6:
+ return vcpu->arch.dr6;
+ case 5:
+ default: /* 7 */
+ return vcpu->arch.dr7;
+ }
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr);
+
+int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs)
+{
+ unsigned int i;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ kvm_handle_exception_payload_quirk(vcpu);
+
+ memset(dbgregs, 0, sizeof(*dbgregs));
+
+ BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
+ dbgregs->db[i] = vcpu->arch.db[i];
+
+ dbgregs->dr6 = vcpu->arch.dr6;
+ dbgregs->dr7 = vcpu->arch.dr7;
+ return 0;
+}
+
+int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs)
+{
+ unsigned int i;
+
+ if (vcpu->kvm->arch.has_protected_state &&
+ vcpu->arch.guest_state_protected)
+ return -EINVAL;
+
+ if (dbgregs->flags)
+ return -EINVAL;
+
+ if (!kvm_dr6_valid(dbgregs->dr6))
+ return -EINVAL;
+ if (!kvm_dr7_valid(dbgregs->dr7))
+ return -EINVAL;
+
+ for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
+ vcpu->arch.db[i] = dbgregs->db[i];
+
+ kvm_update_dr0123(vcpu);
+ vcpu->arch.dr6 = dbgregs->dr6;
+ vcpu->arch.dr7 = dbgregs->dr7;
+ kvm_update_dr7(vcpu);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index e5b16a0826d9..1fd12388250b 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -389,6 +389,14 @@ static inline bool kvm_dr6_valid(u64 data)
return !(data >> 32);
}
+static inline unsigned long kvm_get_effective_dr7(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ return vcpu->arch.guest_debug_dr7;
+
+ return vcpu->arch.dr7;
+}
+
static inline void enter_guest_mode(struct kvm_vcpu *vcpu)
{
vcpu->arch.hflags |= HF_GUEST_MASK;
@@ -417,4 +425,21 @@ static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_call(get_segment_base)(vcpu, seg);
}
+void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+
+void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
+int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
+ struct kvm_sregs2 *sregs2);
+
+void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu);
+int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu);
+
+void kvm_update_dr0123(struct kvm_vcpu *vcpu);
+int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs);
+int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
+ struct kvm_debugregs *dbgregs);
+
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0fa8a9e3b8b2..9eb46b370fad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -129,15 +129,9 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void process_nmi(struct kvm_vcpu *vcpu);
-static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
-static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
- struct kvm_sregs2 *sregs2);
-static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
- struct kvm_sregs2 *sregs2);
-
static DEFINE_MUTEX(vendor_module_lock);
static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
@@ -1018,170 +1012,6 @@ bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_require_dr);
-static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2);
-}
-
-/*
- * Load the pae pdptrs. Return 1 if they are all valid, 0 otherwise.
- */
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
-{
- struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
- gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
- gpa_t real_gpa;
- int i;
- int ret;
- u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
-
- /*
- * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
- * to an L1 GPA.
- */
- real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn),
- PFERR_USER_MASK | PFERR_WRITE_MASK |
- PFERR_GUEST_PAGE_MASK, NULL, 0);
- if (real_gpa == INVALID_GPA)
- return 0;
-
- /* Note the offset, PDPTRs are 32 byte aligned when using PAE paging. */
- ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(real_gpa), pdpte,
- cr3 & GENMASK(11, 5), sizeof(pdpte));
- if (ret < 0)
- return 0;
-
- for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
- if ((pdpte[i] & PT_PRESENT_MASK) &&
- (pdpte[i] & pdptr_rsvd_bits(vcpu))) {
- return 0;
- }
- }
-
- /*
- * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
- * Shadow page roots need to be reconstructed instead.
- */
- if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
- kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
-
- memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
- kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
- kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
- vcpu->arch.pdptrs_from_userspace = false;
-
- return 1;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(load_pdptrs);
-
-static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
-{
-#ifdef CONFIG_X86_64
- if (cr0 & 0xffffffff00000000UL)
- return false;
-#endif
-
- if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
- return false;
-
- if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
- return false;
-
- return kvm_x86_call(is_valid_cr0)(vcpu, cr0);
-}
-
-void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
-{
- /*
- * CR0.WP is incorporated into the MMU role, but only for non-nested,
- * indirect shadow MMUs. If paging is disabled, no updates are needed
- * as there are no permission bits to emulate. If TDP is enabled, the
- * MMU's metadata needs to be updated, e.g. so that emulating guest
- * translations does the right thing, but there's no need to unload the
- * root as CR0.WP doesn't affect SPTEs.
- */
- if ((cr0 ^ old_cr0) == X86_CR0_WP) {
- if (!(cr0 & X86_CR0_PG))
- return;
-
- if (tdp_enabled) {
- kvm_init_mmu(vcpu);
- return;
- }
- }
-
- if ((cr0 ^ old_cr0) & X86_CR0_PG) {
- /*
- * Clearing CR0.PG is defined to flush the TLB from the guest's
- * perspective.
- */
- if (!(cr0 & X86_CR0_PG))
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- /*
- * Check for async #PF completion events when enabling paging,
- * as the vCPU may have previously encountered async #PFs (it's
- * entirely legal for the guest to toggle paging on/off without
- * waiting for the async #PF queue to drain).
- */
- else if (kvm_pv_async_pf_enabled(vcpu))
- kvm_make_request(KVM_REQ_APF_READY, vcpu);
- }
-
- if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr0);
-
-int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
-{
- unsigned long old_cr0 = kvm_read_cr0(vcpu);
-
- if (!kvm_is_valid_cr0(vcpu, cr0))
- return 1;
-
- cr0 |= X86_CR0_ET;
-
- /* Write to CR0 reserved bits are ignored, even on Intel. */
- cr0 &= ~CR0_RESERVED_BITS;
-
-#ifdef CONFIG_X86_64
- if ((vcpu->arch.efer & EFER_LME) && !is_paging(vcpu) &&
- (cr0 & X86_CR0_PG)) {
- int cs_db, cs_l;
-
- if (!is_pae(vcpu))
- return 1;
- kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
- if (cs_l)
- return 1;
- }
-#endif
- if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
- is_pae(vcpu) && ((cr0 ^ old_cr0) & X86_CR0_PDPTR_BITS) &&
- !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
- return 1;
-
- if (!(cr0 & X86_CR0_PG) &&
- (is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
- return 1;
-
- if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
- return 1;
-
- kvm_x86_call(set_cr0)(vcpu, cr0);
-
- kvm_post_set_cr0(vcpu, old_cr0, cr0);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr0);
-
-void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
-{
- (void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_lmsw);
-
static void kvm_load_xfeatures(struct kvm_vcpu *vcpu, bool load_guest)
{
if (vcpu->arch.guest_state_protected)
@@ -1291,89 +1121,7 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
-static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- return __kvm_is_valid_cr4(vcpu, cr4) &&
- kvm_x86_call(is_valid_cr4)(vcpu, cr4);
-}
-
-void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
-{
- if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-
- /*
- * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB
- * according to the SDM; however, stale prev_roots could be reused
- * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we
- * free them all. This is *not* a superset of KVM_REQ_TLB_FLUSH_GUEST
- * or KVM_REQ_TLB_FLUSH_CURRENT, because the hardware TLB is not flushed,
- * so fall through.
- */
- if (!tdp_enabled &&
- (cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE))
- kvm_mmu_unload(vcpu);
-
- /*
- * The TLB has to be flushed for all PCIDs if any of the following
- * (architecturally required) changes happen:
- * - CR4.PCIDE is changed from 1 to 0
- * - CR4.PGE is toggled
- *
- * This is a superset of KVM_REQ_TLB_FLUSH_CURRENT.
- */
- if (((cr4 ^ old_cr4) & X86_CR4_PGE) ||
- (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-
- /*
- * The TLB has to be flushed for the current PCID if any of the
- * following (architecturally required) changes happen:
- * - CR4.SMEP is changed from 0 to 1
- * - CR4.PAE is toggled
- */
- else if (((cr4 ^ old_cr4) & X86_CR4_PAE) ||
- ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP)))
- kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_post_set_cr4);
-
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
- unsigned long old_cr4 = kvm_read_cr4(vcpu);
-
- if (!kvm_is_valid_cr4(vcpu, cr4))
- return 1;
-
- if (is_long_mode(vcpu)) {
- if (!(cr4 & X86_CR4_PAE))
- return 1;
- if ((cr4 ^ old_cr4) & X86_CR4_LA57)
- return 1;
- } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
- && ((cr4 ^ old_cr4) & X86_CR4_PDPTR_BITS)
- && !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
- return 1;
-
- if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
- /* PCID can not be enabled when cr3[11:0]!=000H or EFER.LMA=0 */
- if ((kvm_read_cr3(vcpu) & X86_CR3_PCID_MASK) || !is_long_mode(vcpu))
- return 1;
- }
-
- if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
- return 1;
-
- kvm_x86_call(set_cr4)(vcpu, cr4);
-
- kvm_post_set_cr4(vcpu, old_cr4, cr4);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr4);
-
-static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
+void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
unsigned long roots_to_free = 0;
@@ -1416,167 +1164,6 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
kvm_mmu_free_roots(vcpu->kvm, mmu, roots_to_free);
}
-int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
-{
- bool skip_tlb_flush = false;
- unsigned long pcid = 0;
-#ifdef CONFIG_X86_64
- if (kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)) {
- skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;
- cr3 &= ~X86_CR3_PCID_NOFLUSH;
- pcid = cr3 & X86_CR3_PCID_MASK;
- }
-#endif
-
- /* PDPTRs are always reloaded for PAE paging. */
- if (cr3 == kvm_read_cr3(vcpu) && !is_pae_paging(vcpu))
- goto handle_tlb_flush;
-
- /*
- * Do not condition the GPA check on long mode, this helper is used to
- * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
- * the current vCPU mode is accurate.
- */
- if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
- return 1;
-
- if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
- return 1;
-
- if (cr3 != kvm_read_cr3(vcpu))
- kvm_mmu_new_pgd(vcpu, cr3);
-
- vcpu->arch.cr3 = cr3;
- kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
- /* Do not call post_set_cr3, we do not get here for confidential guests. */
-
-handle_tlb_flush:
- /*
- * A load of CR3 that flushes the TLB flushes only the current PCID,
- * even if PCID is disabled, in which case PCID=0 is flushed. It's a
- * moot point in the end because _disabling_ PCID will flush all PCIDs,
- * and it's impossible to use a non-zero PCID when PCID is disabled,
- * i.e. only PCID=0 can be relevant.
- */
- if (!skip_tlb_flush)
- kvm_invalidate_pcid(vcpu, pcid);
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr3);
-
-int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
-{
- if (cr8 & CR8_RESERVED_BITS)
- return 1;
- if (lapic_in_kernel(vcpu))
- kvm_lapic_set_tpr(vcpu, cr8);
- else
- vcpu->arch.cr8 = cr8;
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_cr8);
-
-unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
-{
- if (lapic_in_kernel(vcpu))
- return kvm_lapic_get_cr8(vcpu);
- else
- return vcpu->arch.cr8;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_cr8);
-
-static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
-{
- int i;
-
- if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
- for (i = 0; i < KVM_NR_DB_REGS; i++)
- vcpu->arch.eff_db[i] = vcpu->arch.db[i];
- }
-}
-
-void kvm_update_dr7(struct kvm_vcpu *vcpu)
-{
- unsigned long dr7;
-
- if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
- dr7 = vcpu->arch.guest_debug_dr7;
- else
- dr7 = vcpu->arch.dr7;
- kvm_x86_call(set_dr7)(vcpu, dr7);
- vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_BP_ENABLED;
- if (dr7 & DR7_BP_EN_MASK)
- vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_update_dr7);
-
-static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
-{
- u64 fixed = DR6_FIXED_1;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
- fixed |= DR6_RTM;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
- fixed |= DR6_BUS_LOCK;
- return fixed;
-}
-
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
-{
- size_t size = ARRAY_SIZE(vcpu->arch.db);
-
- switch (dr) {
- case 0 ... 3:
- vcpu->arch.db[array_index_nospec(dr, size)] = val;
- if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
- vcpu->arch.eff_db[dr] = val;
- break;
- case 4:
- case 6:
- if (!kvm_dr6_valid(val))
- return 1; /* #GP */
- vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu);
- break;
- case 5:
- default: /* 7 */
- if (!kvm_dr7_valid(val))
- return 1; /* #GP */
- vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
- kvm_update_dr7(vcpu);
- break;
- }
-
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_dr);
-
-unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr)
-{
- size_t size = ARRAY_SIZE(vcpu->arch.db);
-
- switch (dr) {
- case 0 ... 3:
- return vcpu->arch.db[array_index_nospec(dr, size)];
- case 4:
- case 6:
- return vcpu->arch.dr6;
- case 5:
- default: /* 7 */
- return vcpu->arch.dr7;
- }
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_dr);
-
-static unsigned long kvm_get_effective_dr7(struct kvm_vcpu *vcpu)
-{
- if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
- return vcpu->arch.guest_debug_dr7;
-
- return vcpu->arch.dr7;
-}
-
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
{
u32 pmc = kvm_ecx_read(vcpu);
@@ -5534,7 +5121,7 @@ static struct kvm_queued_exception *kvm_get_exception_to_save(struct kvm_vcpu *v
return &vcpu->arch.exception;
}
-static void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu)
+void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu)
{
struct kvm_queued_exception *ex = kvm_get_exception_to_save(vcpu);
@@ -5738,57 +5325,6 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
return 0;
}
-static int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
- struct kvm_debugregs *dbgregs)
-{
- unsigned int i;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- kvm_handle_exception_payload_quirk(vcpu);
-
- memset(dbgregs, 0, sizeof(*dbgregs));
-
- BUILD_BUG_ON(ARRAY_SIZE(vcpu->arch.db) != ARRAY_SIZE(dbgregs->db));
- for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
- dbgregs->db[i] = vcpu->arch.db[i];
-
- dbgregs->dr6 = vcpu->arch.dr6;
- dbgregs->dr7 = vcpu->arch.dr7;
- return 0;
-}
-
-static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
- struct kvm_debugregs *dbgregs)
-{
- unsigned int i;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- if (dbgregs->flags)
- return -EINVAL;
-
- if (!kvm_dr6_valid(dbgregs->dr6))
- return -EINVAL;
- if (!kvm_dr7_valid(dbgregs->dr7))
- return -EINVAL;
-
- for (i = 0; i < ARRAY_SIZE(vcpu->arch.db); i++)
- vcpu->arch.db[i] = dbgregs->db[i];
-
- kvm_update_dr0123(vcpu);
- vcpu->arch.dr6 = dbgregs->dr6;
- vcpu->arch.dr7 = dbgregs->dr7;
- kvm_update_dr7(vcpu);
-
- return 0;
-}
-
-
static int kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
u8 *state, unsigned int size)
{
@@ -12070,180 +11606,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
return r;
}
-static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->arch.emulate_regs_need_sync_to_vcpu) {
- /*
- * We are here if userspace calls get_regs() in the middle of
- * instruction emulation. Registers state needs to be copied
- * back from emulation context to vcpu. Userspace shouldn't do
- * that usually, but some bad designed PV devices (vmware
- * backdoor interface) need this to work
- */
- emulator_writeback_register_cache(vcpu->arch.emulate_ctxt);
- vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
- }
- regs->rax = kvm_rax_read_raw(vcpu);
- regs->rbx = kvm_rbx_read_raw(vcpu);
- regs->rcx = kvm_rcx_read_raw(vcpu);
- regs->rdx = kvm_rdx_read_raw(vcpu);
- regs->rsi = kvm_rsi_read_raw(vcpu);
- regs->rdi = kvm_rdi_read_raw(vcpu);
- regs->rsp = kvm_rsp_read(vcpu);
- regs->rbp = kvm_rbp_read_raw(vcpu);
-#ifdef CONFIG_X86_64
- regs->r8 = kvm_r8_read_raw(vcpu);
- regs->r9 = kvm_r9_read_raw(vcpu);
- regs->r10 = kvm_r10_read_raw(vcpu);
- regs->r11 = kvm_r11_read_raw(vcpu);
- regs->r12 = kvm_r12_read_raw(vcpu);
- regs->r13 = kvm_r13_read_raw(vcpu);
- regs->r14 = kvm_r14_read_raw(vcpu);
- regs->r15 = kvm_r15_read_raw(vcpu);
-#endif
-
- regs->rip = kvm_rip_read(vcpu);
- regs->rflags = kvm_get_rflags(vcpu);
-}
-
-int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __get_regs(vcpu, regs);
- vcpu_put(vcpu);
- return 0;
-}
-
-static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- vcpu->arch.emulate_regs_need_sync_from_vcpu = true;
- vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
-
- kvm_rax_write_raw(vcpu, regs->rax);
- kvm_rbx_write_raw(vcpu, regs->rbx);
- kvm_rcx_write_raw(vcpu, regs->rcx);
- kvm_rdx_write_raw(vcpu, regs->rdx);
- kvm_rsi_write_raw(vcpu, regs->rsi);
- kvm_rdi_write_raw(vcpu, regs->rdi);
- kvm_rsp_write(vcpu, regs->rsp);
- kvm_rbp_write_raw(vcpu, regs->rbp);
-#ifdef CONFIG_X86_64
- kvm_r8_write_raw(vcpu, regs->r8);
- kvm_r9_write_raw(vcpu, regs->r9);
- kvm_r10_write_raw(vcpu, regs->r10);
- kvm_r11_write_raw(vcpu, regs->r11);
- kvm_r12_write_raw(vcpu, regs->r12);
- kvm_r13_write_raw(vcpu, regs->r13);
- kvm_r14_write_raw(vcpu, regs->r14);
- kvm_r15_write_raw(vcpu, regs->r15);
-#endif
-
- kvm_rip_write(vcpu, regs->rip);
- kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
-
- vcpu->arch.exception.pending = false;
- vcpu->arch.exception_vmexit.pending = false;
-
- kvm_make_request(KVM_REQ_EVENT, vcpu);
-}
-
-int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __set_regs(vcpu, regs);
- vcpu_put(vcpu);
- return 0;
-}
-
-static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- struct desc_ptr dt;
-
- if (vcpu->arch.guest_state_protected)
- goto skip_protected_regs;
-
- kvm_handle_exception_payload_quirk(vcpu);
-
- kvm_get_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
- kvm_get_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
- kvm_get_segment(vcpu, &sregs->es, VCPU_SREG_ES);
- kvm_get_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
- kvm_get_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
- kvm_get_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
-
- kvm_get_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
- kvm_get_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
-
- kvm_x86_call(get_idt)(vcpu, &dt);
- sregs->idt.limit = dt.size;
- sregs->idt.base = dt.address;
- kvm_x86_call(get_gdt)(vcpu, &dt);
- sregs->gdt.limit = dt.size;
- sregs->gdt.base = dt.address;
-
- sregs->cr2 = vcpu->arch.cr2;
- sregs->cr3 = kvm_read_cr3(vcpu);
-
-skip_protected_regs:
- sregs->cr0 = kvm_read_cr0(vcpu);
- sregs->cr4 = kvm_read_cr4(vcpu);
- sregs->cr8 = kvm_get_cr8(vcpu);
- sregs->efer = vcpu->arch.efer;
- sregs->apic_base = vcpu->arch.apic_base;
-}
-
-static void __get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- __get_sregs_common(vcpu, sregs);
-
- if (vcpu->arch.guest_state_protected)
- return;
-
- if (vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft)
- set_bit(vcpu->arch.interrupt.nr,
- (unsigned long *)sregs->interrupt_bitmap);
-}
-
-static void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
- struct kvm_sregs2 *sregs2)
-{
- int i;
-
- __get_sregs_common(vcpu, (struct kvm_sregs *)sregs2);
-
- if (vcpu->arch.guest_state_protected)
- return;
-
- if (is_pae_paging(vcpu)) {
- kvm_vcpu_srcu_read_lock(vcpu);
- for (i = 0 ; i < 4 ; i++)
- sregs2->pdptrs[i] = kvm_pdptr_read(vcpu, i);
- sregs2->flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
- kvm_vcpu_srcu_read_unlock(vcpu);
- }
-}
-
-int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
-{
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- __get_sregs(vcpu, sregs);
- vcpu_put(vcpu);
- return 0;
-}
-
int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
{
@@ -12363,176 +11725,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_task_switch);
-static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
- /*
- * When EFER.LME and CR0.PG are set, the processor is in
- * 64-bit mode (though maybe in a 32-bit code segment).
- * CR4.PAE and EFER.LMA must be set.
- */
- if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
- return false;
- if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
- return false;
- } else {
- /*
- * Not in 64-bit mode: EFER.LMA is clear and the code
- * segment cannot be 64-bit.
- */
- if (sregs->efer & EFER_LMA || sregs->cs.l)
- return false;
- }
-
- return kvm_is_valid_cr4(vcpu, sregs->cr4) &&
- kvm_is_valid_cr0(vcpu, sregs->cr0);
-}
-
-static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
- int *mmu_reset_needed, bool update_pdptrs)
-{
- int idx;
- struct desc_ptr dt;
-
- if (!kvm_is_valid_sregs(vcpu, sregs))
- return -EINVAL;
-
- if (kvm_apic_set_base(vcpu, sregs->apic_base, true))
- return -EINVAL;
-
- if (vcpu->arch.guest_state_protected)
- return 0;
-
- dt.size = sregs->idt.limit;
- dt.address = sregs->idt.base;
- kvm_x86_call(set_idt)(vcpu, &dt);
- dt.size = sregs->gdt.limit;
- dt.address = sregs->gdt.base;
- kvm_x86_call(set_gdt)(vcpu, &dt);
-
- vcpu->arch.cr2 = sregs->cr2;
- *mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3;
- vcpu->arch.cr3 = sregs->cr3;
- kvm_register_mark_dirty(vcpu, VCPU_REG_CR3);
- kvm_x86_call(post_set_cr3)(vcpu, sregs->cr3);
-
- kvm_set_cr8(vcpu, sregs->cr8);
-
- *mmu_reset_needed |= vcpu->arch.efer != sregs->efer;
- kvm_x86_call(set_efer)(vcpu, sregs->efer);
-
- *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
- kvm_x86_call(set_cr0)(vcpu, sregs->cr0);
-
- *mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4;
- kvm_x86_call(set_cr4)(vcpu, sregs->cr4);
-
- if (update_pdptrs) {
- idx = srcu_read_lock(&vcpu->kvm->srcu);
- if (is_pae_paging(vcpu)) {
- load_pdptrs(vcpu, kvm_read_cr3(vcpu));
- *mmu_reset_needed = 1;
- }
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
- }
-
- kvm_set_segment(vcpu, &sregs->cs, VCPU_SREG_CS);
- kvm_set_segment(vcpu, &sregs->ds, VCPU_SREG_DS);
- kvm_set_segment(vcpu, &sregs->es, VCPU_SREG_ES);
- kvm_set_segment(vcpu, &sregs->fs, VCPU_SREG_FS);
- kvm_set_segment(vcpu, &sregs->gs, VCPU_SREG_GS);
- kvm_set_segment(vcpu, &sregs->ss, VCPU_SREG_SS);
-
- kvm_set_segment(vcpu, &sregs->tr, VCPU_SREG_TR);
- kvm_set_segment(vcpu, &sregs->ldt, VCPU_SREG_LDTR);
-
- kvm_lapic_update_cr8_intercept(vcpu);
-
- /* Older userspace won't unhalt the vcpu on reset. */
- if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
- sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 &&
- !is_protmode(vcpu))
- kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
-
- return 0;
-}
-
-static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
-{
- int pending_vec, max_bits;
- int mmu_reset_needed = 0;
- int ret = __set_sregs_common(vcpu, sregs, &mmu_reset_needed, true);
-
- if (ret)
- return ret;
-
- if (mmu_reset_needed) {
- kvm_mmu_reset_context(vcpu);
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- }
-
- max_bits = KVM_NR_INTERRUPTS;
- pending_vec = find_first_bit(
- (const unsigned long *)sregs->interrupt_bitmap, max_bits);
-
- if (pending_vec < max_bits) {
- kvm_queue_interrupt(vcpu, pending_vec, false);
- pr_debug("Set back pending irq %d\n", pending_vec);
- kvm_make_request(KVM_REQ_EVENT, vcpu);
- }
- return 0;
-}
-
-static int kvm_x86_vcpu_ioctl_set_sregs2(struct kvm_vcpu *vcpu,
- struct kvm_sregs2 *sregs2)
-{
- int mmu_reset_needed = 0;
- bool valid_pdptrs = sregs2->flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
- bool pae = (sregs2->cr0 & X86_CR0_PG) && (sregs2->cr4 & X86_CR4_PAE) &&
- !(sregs2->efer & EFER_LMA);
- int i, ret;
-
- if (sregs2->flags & ~KVM_SREGS2_FLAGS_PDPTRS_VALID)
- return -EINVAL;
-
- if (valid_pdptrs && (!pae || vcpu->arch.guest_state_protected))
- return -EINVAL;
-
- ret = __set_sregs_common(vcpu, (struct kvm_sregs *)sregs2,
- &mmu_reset_needed, !valid_pdptrs);
- if (ret)
- return ret;
-
- if (valid_pdptrs) {
- for (i = 0; i < 4 ; i++)
- kvm_pdptr_write(vcpu, i, sregs2->pdptrs[i]);
-
- kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
- mmu_reset_needed = 1;
- vcpu->arch.pdptrs_from_userspace = true;
- }
- if (mmu_reset_needed) {
- kvm_mmu_reset_context(vcpu);
- kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- }
- return 0;
-}
-
-int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
- struct kvm_sregs *sregs)
-{
- int ret;
-
- if (vcpu->kvm->arch.has_protected_state &&
- vcpu->arch.guest_state_protected)
- return -EINVAL;
-
- vcpu_load(vcpu);
- ret = __set_sregs(vcpu, sregs);
- vcpu_put(vcpu);
- return ret;
-}
-
static void kvm_arch_vcpu_guestdbg_update_apicv_inhibit(struct kvm *kvm)
{
bool set = false;
@@ -12686,17 +11878,6 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return 0;
}
-static void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu)
-{
- BUILD_BUG_ON(sizeof(struct kvm_sync_regs) > SYNC_REGS_SIZE_BYTES);
-
- if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_REGS)
- __get_regs(vcpu, &vcpu->run->s.regs.regs);
-
- if (vcpu->run->kvm_valid_regs & KVM_SYNC_X86_SREGS)
- __get_sregs(vcpu, &vcpu->run->s.regs.sregs);
-}
-
static void store_regs(struct kvm_vcpu *vcpu)
{
kvm_run_sync_regs_to_user(vcpu);
@@ -12706,25 +11887,6 @@ static void store_regs(struct kvm_vcpu *vcpu)
vcpu, &vcpu->run->s.regs.events);
}
-static int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu)
-{
- if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_REGS) {
- __set_regs(vcpu, &vcpu->run->s.regs.regs);
- vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
- }
-
- if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
- struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
-
- if (__set_sregs(vcpu, &sregs))
- return -EINVAL;
-
- vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
- }
-
- return 0;
-}
-
static int sync_regs(struct kvm_vcpu *vcpu)
{
if (kvm_run_sync_regs_from_user(vcpu))
@@ -13830,51 +12992,6 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
return kvm_x86_call(interrupt_allowed)(vcpu, false);
}
-unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
-{
- /* Can't read the RIP when guest state is protected, just return 0 */
- if (vcpu->arch.guest_state_protected)
- return 0;
-
- if (is_64_bit_mode(vcpu))
- return kvm_rip_read(vcpu);
- return (u32)(kvm_get_segment_base(vcpu, VCPU_SREG_CS) +
- kvm_rip_read(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_linear_rip);
-
-bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
-{
- return kvm_get_linear_rip(vcpu) == linear_rip;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_is_linear_rip);
-
-unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
-{
- unsigned long rflags;
-
- rflags = kvm_x86_call(get_rflags)(vcpu);
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
- rflags &= ~X86_EFLAGS_TF;
- return rflags;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_rflags);
-
-static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
- if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP &&
- kvm_is_linear_rip(vcpu, vcpu->arch.singlestep_rip))
- rflags |= X86_EFLAGS_TF;
- kvm_x86_call(set_rflags)(vcpu, rflags);
-}
-
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
- __kvm_set_rflags(vcpu, rflags);
- kvm_make_request(KVM_REQ_EVENT, vcpu);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_rflags);
-
static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
{
BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index acb22167901f..80ed36d5d62a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -403,6 +403,7 @@ int handle_ud(struct kvm_vcpu *vcpu);
void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
struct kvm_queued_exception *ex);
+void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu);
int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
@@ -597,6 +598,7 @@ static inline void kvm_machine_check(void)
int kvm_spec_ctrl_test_value(u64 value);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
+void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 04/30] KVM: x86: Move the bulk of register specific code from x86.c to regs.c
2026-06-13 0:03 ` [PATCH v4 04/30] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
@ 2026-06-15 5:25 ` Binbin Wu
0 siblings, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 5:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Introduce regs.c, and move the vast majority of register specific code out
> of x86.c and into regs.c. Deliberately leave behind MSR code, as KVM's MSR
> support is complex enough to warrant its own compilation unit, and doesn't
> have much in common with the other register code.
>
> Note, "struct kvm_sregs" has fields for EFER and MSR_IA32_APICBASE, and so
> the {G,S}ET_REGS flows technically contain a tiny amount of MSR code.
^
Should be {G,S}ET_SREGS
> MSR_IA32_APICBASE is already managed by lapic.c, and so doesn't require a
> "placement decision". As for EFER, leave all other EFER handling in x86.c
> (later to be moved to msrs.c). The primary interface to EFER, set_efer(),
> is very much MSR specific, even though EFER is arguably more of a Control
> Register than an MSR.
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (3 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 04/30] KVM: x86: Move the bulk of register specific code from x86.c to regs.c Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 5:47 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
` (24 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move single-use local APIC IRQ helpers out of asm/kvm_host.h so that they
are co-located with their user, and not exposed to the broader world.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/irq.c | 7 +++++++
arch/x86/kvm/lapic.h | 5 +++++
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3886b536c8a5..71e575d011e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1772,11 +1772,6 @@ struct kvm_lapic_irq {
bool msi_redir_hint;
};
-static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
-{
- return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
-}
-
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
@@ -2509,13 +2504,6 @@ void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
-static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
-{
- /* We can only post Fixed and LowPrio IRQs */
- return (irq->delivery_mode == APIC_DM_FIXED ||
- irq->delivery_mode == APIC_DM_LOWEST);
-}
-
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
kvm_x86_call(vcpu_blocking)(vcpu);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 8c62c6d4d5c1..727245a6ab34 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -423,6 +423,13 @@ void kvm_arch_irq_routing_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
}
+static bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
+{
+ /* We can only post Fixed and LowPrio IRQs */
+ return (irq->delivery_mode == APIC_DM_FIXED ||
+ irq->delivery_mode == APIC_DM_LOWEST);
+}
+
static int kvm_pi_update_irte(struct kvm_kernel_irqfd *irqfd,
struct kvm_kernel_irq_routing_entry *entry)
{
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 71970213dc1f..32f09b25884a 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -237,6 +237,11 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
}
+static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
+{
+ return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
+}
+
bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
2026-06-13 0:03 ` [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
@ 2026-06-15 5:47 ` Binbin Wu
2026-06-15 16:06 ` Sean Christopherson
0 siblings, 1 reply; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 5:47 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Move single-use local APIC IRQ helpers out of asm/kvm_host.h so that they
Nit:
Not sure if it's accurate to say kvm_lapic_irq_dest_mode() is a single-use
helper.
> are co-located with their user, and not exposed to the broader world.
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 12 ------------
> arch/x86/kvm/irq.c | 7 +++++++
> arch/x86/kvm/lapic.h | 5 +++++
> 3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3886b536c8a5..71e575d011e7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1772,11 +1772,6 @@ struct kvm_lapic_irq {
> bool msi_redir_hint;
> };
>
> -static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> -{
> - return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
> -}
> -
> enum kvm_x86_run_flags {
> KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
> KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
> @@ -2509,13 +2504,6 @@ void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
> bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
> bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
>
> -static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
> -{
> - /* We can only post Fixed and LowPrio IRQs */
> - return (irq->delivery_mode == APIC_DM_FIXED ||
> - irq->delivery_mode == APIC_DM_LOWEST);
> -}
> -
> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> {
> kvm_x86_call(vcpu_blocking)(vcpu);
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 8c62c6d4d5c1..727245a6ab34 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -423,6 +423,13 @@ void kvm_arch_irq_routing_update(struct kvm *kvm)
> kvm_make_scan_ioapic_request(kvm);
> }
>
> +static bool kvm_irq_is_postable(struct kvm_lapic_irq *irq)
> +{
> + /* We can only post Fixed and LowPrio IRQs */
> + return (irq->delivery_mode == APIC_DM_FIXED ||
> + irq->delivery_mode == APIC_DM_LOWEST);
> +}
> +
> static int kvm_pi_update_irte(struct kvm_kernel_irqfd *irqfd,
> struct kvm_kernel_irq_routing_entry *entry)
> {
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 71970213dc1f..32f09b25884a 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -237,6 +237,11 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
> return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events);
> }
>
> +static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
> +{
> + return dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
> +}
> +
> bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
>
> bool kvm_lapic_suppress_eoi_broadcast(struct kvm_lapic *apic);
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h
2026-06-15 5:47 ` Binbin Wu
@ 2026-06-15 16:06 ` Sean Christopherson
0 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 16:06 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On Mon, Jun 15, 2026, Binbin Wu wrote:
> On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> > Move single-use local APIC IRQ helpers out of asm/kvm_host.h so that they
>
> Nit:
> Not sure if it's accurate to say kvm_lapic_irq_dest_mode() is a single-use
> helper.
Yeah, I'll rephrase the changelog. No idea what happened, maybe I wrote the
changelog without looking ioapic.c code?
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (4 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 05/30] KVM: x86: Move local APIC specific helpers out of asm/kvm_host.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 9:01 ` Xiaoyao Li
2026-06-13 0:03 ` [PATCH v4 07/30] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
` (23 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Relocate the kvm_caps and kvm_host_values struct definitions and their
associated global variable declarations to asm/kvm_host.h to allow for a
variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
maintainable rule that asm/kvm_host.h's role is to define common
structures (and declare any associated globals), and anything needed by
arch-neutral KVM.
While it would be lovely to trim kvm_host.h down to the point where it
*only* holds things needed by arch-neutral and/or non-KVM code, multiple
attempts to do just that have failed miserably. Trying to "hide" code
from arch-neutral KVM is too restrictive (and ultimately pointless), and
KVM x86 itself also needs a place to define common structures and their
globals, e.g. to avoid inconsistent header include chains and/or misplaced
helpers.
E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
of sorts, includes regs.h, but not mmu.h. Literally the only reason that
x86.h doesn't include mmu.h is that mmu.h references kvm_host, which is
currently defined in x86.h. As a result of odd include ordering, the
very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h
"Fix" the kvm_host dependency so that x86.h can be the "central" include
everyone expects it to be, and set KVM x86 on the path to having somewhat
sensible "rules" for what goes where:
- asm/kvm_host.h holds "common" structure definitions and associated key
global variables, and things that are referenced by arch-neutral KVM.
- <thing>.{c,h} holds relevant declarations and definitions.
- x86.{c,h} is the kitchen sink for everything else.
Cc: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 44 ++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.h | 45 ---------------------------------
2 files changed, 44 insertions(+), 45 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71e575d011e7..57e67bad4abe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,50 @@ enum x86_intercept_stage;
struct kvm_kernel_irqfd;
struct kvm_kernel_irq_routing_entry;
+struct kvm_caps {
+ /* control of guest tsc rate supported? */
+ bool has_tsc_control;
+ /* maximum supported tsc_khz for guests */
+ u32 max_guest_tsc_khz;
+ /* number of bits of the fractional part of the TSC scaling ratio */
+ u8 tsc_scaling_ratio_frac_bits;
+ /* maximum allowed value of TSC scaling ratio */
+ u64 max_tsc_scaling_ratio;
+ /* 1ull << kvm_caps.tsc_scaling_ratio_frac_bits */
+ u64 default_tsc_scaling_ratio;
+ /* bus lock detection supported? */
+ bool has_bus_lock_exit;
+ /* notify VM exit supported? */
+ bool has_notify_vmexit;
+ /* bit mask of VM types */
+ u32 supported_vm_types;
+
+ u64 supported_mce_cap;
+ u64 supported_xcr0;
+ u64 supported_xss;
+ u64 supported_perf_cap;
+
+ u64 supported_quirks;
+ u64 inapplicable_quirks;
+};
+extern struct kvm_caps kvm_caps;
+
+struct kvm_host_values {
+ /*
+ * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
+ * address bits irrespective of features that repurpose legal bits,
+ * e.g. MKTME.
+ */
+ u8 maxphyaddr;
+
+ u64 efer;
+ u64 xcr0;
+ u64 xss;
+ u64 s_cet;
+ u64 arch_capabilities;
+};
+extern struct kvm_host_values kvm_host;
+
/*
* kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
* also includes TDP pages) to determine whether or not a page can be used in
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 80ed36d5d62a..b7d3b54cde15 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -12,48 +12,6 @@
#define KVM_MAX_MCE_BANKS 32
-struct kvm_caps {
- /* control of guest tsc rate supported? */
- bool has_tsc_control;
- /* maximum supported tsc_khz for guests */
- u32 max_guest_tsc_khz;
- /* number of bits of the fractional part of the TSC scaling ratio */
- u8 tsc_scaling_ratio_frac_bits;
- /* maximum allowed value of TSC scaling ratio */
- u64 max_tsc_scaling_ratio;
- /* 1ull << kvm_caps.tsc_scaling_ratio_frac_bits */
- u64 default_tsc_scaling_ratio;
- /* bus lock detection supported? */
- bool has_bus_lock_exit;
- /* notify VM exit supported? */
- bool has_notify_vmexit;
- /* bit mask of VM types */
- u32 supported_vm_types;
-
- u64 supported_mce_cap;
- u64 supported_xcr0;
- u64 supported_xss;
- u64 supported_perf_cap;
-
- u64 supported_quirks;
- u64 inapplicable_quirks;
-};
-
-struct kvm_host_values {
- /*
- * The host's raw MAXPHYADDR, i.e. the number of non-reserved physical
- * address bits irrespective of features that repurpose legal bits,
- * e.g. MKTME.
- */
- u8 maxphyaddr;
-
- u64 efer;
- u64 xcr0;
- u64 xss;
- u64 s_cet;
- u64 arch_capabilities;
-};
-
void kvm_spurious_fault(void);
#define SIZE_OF_MEMSLOTS_HASHTABLE \
@@ -417,9 +375,6 @@ fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
-extern struct kvm_caps kvm_caps;
-extern struct kvm_host_values kvm_host;
-
void kvm_setup_xss_caps(void);
/*
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-06-13 0:03 ` [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
@ 2026-06-13 9:01 ` Xiaoyao Li
2026-06-15 6:49 ` Binbin Wu
0 siblings, 1 reply; 64+ messages in thread
From: Xiaoyao Li @ 2026-06-13 9:01 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Relocate the kvm_caps and kvm_host_values struct definitions and their
> associated global variable declarations to asm/kvm_host.h to allow for a
> variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
> maintainable rule that asm/kvm_host.h's role is to define common
> structures (and declare any associated globals), and anything needed by
> arch-neutral KVM.
>
> While it would be lovely to trim kvm_host.h down to the point where it
> *only* holds things needed by arch-neutral and/or non-KVM code, multiple
> attempts to do just that have failed miserably. Trying to "hide" code
> from arch-neutral KVM is too restrictive (and ultimately pointless), and
> KVM x86 itself also needs a place to define common structures and their
> globals, e.g. to avoid inconsistent header include chains and/or misplaced
> helpers.
>
> E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
> of sorts, includes regs.h, but not mmu.h. Literally the only reason that
> x86.h doesn't include mmu.h is that mmu.h references kvm_host, which is
> currently defined in x86.h. As a result of odd include ordering, the
> very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h
>
> "Fix" the kvm_host dependency so that x86.h can be the "central" include
> everyone expects it to be, and set KVM x86 on the path to having somewhat
> sensible "rules" for what goes where:
>
> - asm/kvm_host.h holds "common" structure definitions and associated key
> global variables, and things that are referenced by arch-neutral KVM.
I'm confused by the term "arch-neutral" all over the changelog. I
suppose include/linux/kvm_host.h is arch-neutral while asm/kvm_host.h is
arch-specific but not KVM internal only.
> - <thing>.{c,h} holds relevant declarations and definitions.
> - x86.{c,h} is the kitchen sink for everything else.
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-06-13 9:01 ` Xiaoyao Li
@ 2026-06-15 6:49 ` Binbin Wu
2026-06-15 16:24 ` Sean Christopherson
0 siblings, 1 reply; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 6:49 UTC (permalink / raw)
To: Xiaoyao Li, Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
On 6/13/2026 5:01 PM, Xiaoyao Li wrote:
> On 6/13/2026 8:03 AM, Sean Christopherson wrote:
>> Relocate the kvm_caps and kvm_host_values struct definitions and their
>> associated global variable declarations to asm/kvm_host.h to allow for a
>> variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
>> maintainable rule that asm/kvm_host.h's role is to define common
>> structures (and declare any associated globals), and anything needed by
>> arch-neutral KVM.
>>
>> While it would be lovely to trim kvm_host.h down to the point where it
>> *only* holds things needed by arch-neutral and/or non-KVM code, multiple
>> attempts to do just that have failed miserably. Trying to "hide" code
>> from arch-neutral KVM is too restrictive (and ultimately pointless), and
>> KVM x86 itself also needs a place to define common structures and their
>> globals, e.g. to avoid inconsistent header include chains and/or misplaced
>> helpers.
>>
>> E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
>> of sorts, includes regs.h, but not mmu.h. Literally the only reason that
>> x86.h doesn't include mmu.h is that mmu.h references kvm_host, which is
>> currently defined in x86.h. As a result of odd include ordering, the
>> very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h
>>
>> "Fix" the kvm_host dependency so that x86.h can be the "central" include
>> everyone expects it to be, and set KVM x86 on the path to having somewhat
>> sensible "rules" for what goes where:
>>
>> - asm/kvm_host.h holds "common" structure definitions and associated key
>> global variables, and things that are referenced by arch-neutral KVM.
>
> I'm confused by the term "arch-neutral" all over the changelog. I suppose include/linux/kvm_host.h is arch-neutral while asm/kvm_host.h is arch-specific but not KVM internal only.
IIUC, it's the situation where asm/kvm_host.h for x86 provides the
x86-specific pieces that the generic/arch-neutral KVM depends on.
E.g. struct kvm_arch is defined in asm/kvm_host.h, which is x86-specific, but
the generic KVM embeds it in struct kvm as 'struct kvm_arch arch' in
include/linux/kvm_host.h, i.e. "referenced by arch-neutral KVM".
>
>> - <thing>.{c,h} holds relevant declarations and definitions.
>> - x86.{c,h} is the kitchen sink for everything else.
>
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h
2026-06-15 6:49 ` Binbin Wu
@ 2026-06-15 16:24 ` Sean Christopherson
0 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 16:24 UTC (permalink / raw)
To: Binbin Wu
Cc: Xiaoyao Li, Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel,
Yosry Ahmed, Kai Huang
On Mon, Jun 15, 2026, Binbin Wu wrote:
> On 6/13/2026 5:01 PM, Xiaoyao Li wrote:
> > On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> >> Relocate the kvm_caps and kvm_host_values struct definitions and their
> >> associated global variable declarations to asm/kvm_host.h to allow for a
> >> variety of cleanups in x86.h and mmu.h, and to establish a (hopefully)
> >> maintainable rule that asm/kvm_host.h's role is to define common
> >> structures (and declare any associated globals), and anything needed by
> >> arch-neutral KVM.
> >>
> >> While it would be lovely to trim kvm_host.h down to the point where it
> >> *only* holds things needed by arch-neutral and/or non-KVM code, multiple
> >> attempts to do just that have failed miserably. Trying to "hide" code
> >> from arch-neutral KVM is too restrictive (and ultimately pointless), and
> >> KVM x86 itself also needs a place to define common structures and their
> >> globals, e.g. to avoid inconsistent header include chains and/or misplaced
> >> helpers.
> >>
> >> E.g. as pointed out by Kai, it's weird that x86.h, which is a kitchen sink
> >> of sorts, includes regs.h, but not mmu.h. Literally the only reason that
> >> x86.h doesn't include mmu.h is that mmu.h references kvm_host, which is
> >> currently defined in x86.h. As a result of odd include ordering, the
> >> very clearly MMU-specific helper mmu_is_nested() lives in x86.h, not mmu.h
> >>
> >> "Fix" the kvm_host dependency so that x86.h can be the "central" include
> >> everyone expects it to be, and set KVM x86 on the path to having somewhat
> >> sensible "rules" for what goes where:
> >>
> >> - asm/kvm_host.h holds "common" structure definitions and associated key
> >> global variables, and things that are referenced by arch-neutral KVM.
> >
> > I'm confused by the term "arch-neutral" all over the changelog. I suppose
"arch-neutral" refers to virt/kvm code and other code that isn't arch specific.
I used arch-neutral here, e.g. instead of "common code", to differentiate between
arch-neutral (virt/kvm) code and common x86 (arch/x86/kvm) code.
> > include/linux/kvm_host.h is arch-neutral while asm/kvm_host.h is
> > arch-specific but not KVM internal only.
>
> IIUC, it's the situation where asm/kvm_host.h for x86 provides the
> x86-specific pieces that the generic/arch-neutral KVM depends on.
> E.g. struct kvm_arch is defined in asm/kvm_host.h, which is x86-specific, but
> the generic KVM embeds it in struct kvm as 'struct kvm_arch arch' in
> include/linux/kvm_host.h, i.e. "referenced by arch-neutral KVM".
Yep, exactly.
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 07/30] KVM: x86: Swap the include order between x86.h and mmu.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (5 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 06/30] KVM: x86: Move kvm_caps and kvm_host_values to asm/kvm_host.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 7:26 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 08/30] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h Sean Christopherson
` (22 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Invert the include ordering between x86.h and mmu.h, and move
mmu_is_nested() to mmu.h where it belongs (mmu_is_nested()'s placement in
x86.h was solely responsible for the existing ordering), so that x86.h is
not included by most KVM internal headers (but includes them).
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu.h | 6 +++++-
arch/x86/kvm/x86.h | 6 +-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e1bb663ebbd5..28fca48dcf64 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -4,7 +4,6 @@
#include <linux/kvm_host.h>
#include "regs.h"
-#include "x86.h"
#include "cpuid.h"
extern bool __read_mostly enable_mmio_caching;
@@ -300,6 +299,11 @@ static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
atomic64_add(count, &kvm->stat.pages[level - 1]);
}
+static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
+}
+
static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
struct kvm_mmu *mmu,
gpa_t gpa, u64 access,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b7d3b54cde15..a0e68eaf1f80 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -6,6 +6,7 @@
#include <asm/fpu/xstate.h>
#include <asm/mce.h>
#include <asm/pvclock.h>
+#include "mmu.h"
#include "regs.h"
#include "kvm_emulate.h"
#include "cpuid.h"
@@ -210,11 +211,6 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
return (1U << vector) & exception_has_error_code;
}
-static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
-}
-
static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
{
return kvm_is_cr4_bit_set(vcpu, X86_CR4_LA57) ? 57 : 48;
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 07/30] KVM: x86: Swap the include order between x86.h and mmu.h
2026-06-13 0:03 ` [PATCH v4 07/30] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
@ 2026-06-15 7:26 ` Binbin Wu
0 siblings, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 7:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Invert the include ordering between x86.h and mmu.h, and move
> mmu_is_nested() to mmu.h where it belongs (mmu_is_nested()'s placement in
> x86.h was solely responsible for the existing ordering), so that x86.h is
> not included by most KVM internal headers (but includes them).
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 08/30] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (6 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 07/30] KVM: x86: Swap the include order between x86.h and mmu.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 7:33 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 09/30] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
` (21 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Relocated the declaration of tdp_enabled into mmu.h, and opportunistically
hoist tdp_mmu_enabled up to the top so that the two are co-located.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/mmu.h | 12 ++++++------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 57e67bad4abe..153ed5611c2e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2188,8 +2188,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
-extern bool tdp_enabled;
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 28fca48dcf64..0eaea2d4fac9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -6,6 +6,12 @@
#include "regs.h"
#include "cpuid.h"
+extern bool tdp_enabled;
+#ifdef CONFIG_X86_64
+extern bool tdp_mmu_enabled;
+#else
+#define tdp_mmu_enabled false
+#endif
extern bool __read_mostly enable_mmio_caching;
#define PT_WRITABLE_SHIFT 1
@@ -260,12 +266,6 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
return smp_load_acquire(&kvm->arch.shadow_root_allocated);
}
-#ifdef CONFIG_X86_64
-extern bool tdp_mmu_enabled;
-#else
-#define tdp_mmu_enabled false
-#endif
-
int kvm_tdp_mmu_map_private_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn);
static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 09/30] KVM: x86: Move eager_page_split to mmu.{c,h}
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (7 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 08/30] KVM: x86: Move tdp_enabled from kvm_host.h to mmu.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 7:49 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 10/30] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h Sean Christopherson
` (20 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move KVM's eager_page_split module param to the MMU, as it is very much an
MMU knob.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/x86.c | 3 ---
arch/x86/kvm/x86.h | 2 --
4 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0eaea2d4fac9..d30676935fff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -13,6 +13,7 @@ extern bool tdp_mmu_enabled;
#define tdp_mmu_enabled false
#endif
extern bool __read_mostly enable_mmio_caching;
+extern bool eager_page_split;
#define PT_WRITABLE_SHIFT 1
#define PT_USER_SHIFT 2
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9368a71336fe..313106e84a64 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -114,6 +114,9 @@ module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(tdp_mmu_enabled);
#endif
+bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
static int max_huge_page_level __read_mostly;
static int tdp_root_level __read_mostly;
static int max_tdp_level __read_mostly;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eb46b370fad..3813078cc028 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -177,9 +177,6 @@ module_param(force_emulation_prefix, int, 0644);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, 0644);
-bool __read_mostly eager_page_split = true;
-module_param(eager_page_split, bool, 0644);
-
/* Enable/disable SMT_RSB bug mitigation */
static bool __read_mostly mitigate_smt_rsb;
module_param(mitigate_smt_rsb, bool, 0444);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a0e68eaf1f80..635a21bfa681 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -415,8 +415,6 @@ extern int pi_inject_timer;
extern bool report_ignored_msrs;
-extern bool eager_page_split;
-
static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
if (report_ignored_msrs)
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 09/30] KVM: x86: Move eager_page_split to mmu.{c,h}
2026-06-13 0:03 ` [PATCH v4 09/30] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
@ 2026-06-15 7:49 ` Binbin Wu
0 siblings, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 7:49 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Move KVM's eager_page_split module param to the MMU, as it is very much an
> MMU knob.
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/mmu.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 3 +++
> arch/x86/kvm/x86.c | 3 ---
> arch/x86/kvm/x86.h | 2 --
> 4 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 0eaea2d4fac9..d30676935fff 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -13,6 +13,7 @@ extern bool tdp_mmu_enabled;
> #define tdp_mmu_enabled false
> #endif
> extern bool __read_mostly enable_mmio_caching;
> +extern bool eager_page_split;
Side topic, the attributes don't take effect for declarations, but in general,
does KVM/kernel coding style have a preference to keep it for
consistency/annotation or just drop it?
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 10/30] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (8 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 09/30] KVM: x86: Move eager_page_split to mmu.{c,h} Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 7:52 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
` (19 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Drop an mostly unused include of x86.h from hyperv.h, and instead pull in
regs.h, which is need for at least is_guest_mode(). This eliminates the
last include of x86.h from a common x86 header, i.e. solidifies that x86.h
is the top of the pyramid.
Add a missing x86.h include in cpuid.c to avoid build breakage.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 1 +
arch/x86/kvm/hyperv.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 591d2294acd7..2698fa42cd97 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,6 +28,7 @@
#include "trace.h"
#include "pmu.h"
#include "xen.h"
+#include "x86.h"
/*
* Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 65e89ed65349..1c8f7aaab063 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -22,7 +22,8 @@
#define __ARCH_X86_KVM_HYPERV_H__
#include <linux/kvm_host.h>
-#include "x86.h"
+
+#include "regs.h"
#ifdef CONFIG_KVM_HYPERV
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 10/30] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h
2026-06-13 0:03 ` [PATCH v4 10/30] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h Sean Christopherson
@ 2026-06-15 7:52 ` Binbin Wu
0 siblings, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 7:52 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Drop an mostly unused include of x86.h from hyperv.h, and instead pull in
^
a> regs.h, which is need for at least is_guest_mode(). This eliminates the
^
needed> last include of x86.h from a common x86 header, i.e. solidifies that x86.h
> is the top of the pyramid.
>
> Add a missing x86.h include in cpuid.c to avoid build breakage.
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/cpuid.c | 1 +
> arch/x86/kvm/hyperv.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 591d2294acd7..2698fa42cd97 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -28,6 +28,7 @@
> #include "trace.h"
> #include "pmu.h"
> #include "xen.h"
> +#include "x86.h"
>
> /*
> * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 65e89ed65349..1c8f7aaab063 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -22,7 +22,8 @@
> #define __ARCH_X86_KVM_HYPERV_H__
>
> #include <linux/kvm_host.h>
> -#include "x86.h"
> +
> +#include "regs.h"
>
> #ifdef CONFIG_KVM_HYPERV
>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (9 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 10/30] KVM: x86/hyperv: Eliminate an unnecessary include of x86.h in hyperv.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 8:13 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 12/30] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers Sean Christopherson
` (18 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
moving the bulk of KVM's register specific code out of x86.c.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/fpu.h | 26 ++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 24 ------------------------
2 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/fpu.h b/arch/x86/kvm/fpu.h
index f898781b6a06..6b7b628f530d 100644
--- a/arch/x86/kvm/fpu.h
+++ b/arch/x86/kvm/fpu.h
@@ -3,8 +3,34 @@
#ifndef __KVM_FPU_H_
#define __KVM_FPU_H_
+#include <linux/kvm_host.h>
+
+#include <trace/events/kvm.h>
+
#include <asm/fpu/api.h>
+/* Swap (qemu) user FPU context for the guest FPU context. */
+static inline void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
+{
+ if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+ return;
+
+ /* Exclude PKRU, it's restored separately immediately after VM-Exit. */
+ fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
+ trace_kvm_fpu(1);
+}
+
+/* When vcpu_run ends, restore user space FPU context. */
+static inline void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
+{
+ if (KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
+ return;
+
+ fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
+ ++vcpu->stat.fpu_reload;
+ trace_kvm_fpu(0);
+}
+
typedef u32 __attribute__((vector_size(16))) sse128_t;
#define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
#define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; })
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3813078cc028..8f9a1572a39f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -133,8 +133,6 @@ static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
static DEFINE_MUTEX(vendor_module_lock);
-static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
-static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
struct kvm_x86_ops kvm_x86_ops __read_mostly;
@@ -11432,28 +11430,6 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
return 0;
}
-/* Swap (qemu) user FPU context for the guest FPU context. */
-static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
-{
- if (KVM_BUG_ON(vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
- return;
-
- /* Exclude PKRU, it's restored separately immediately after VM-Exit. */
- fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
- trace_kvm_fpu(1);
-}
-
-/* When vcpu_run ends, restore user space FPU context. */
-static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
-{
- if (KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm))
- return;
-
- fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
- ++vcpu->stat.fpu_reload;
- trace_kvm_fpu(0);
-}
-
static int kvm_x86_vcpu_pre_run(struct kvm_vcpu *vcpu)
{
/*
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
2026-06-13 0:03 ` [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
@ 2026-06-15 8:13 ` Binbin Wu
2026-06-15 16:31 ` Sean Christopherson
0 siblings, 1 reply; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 8:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
> moving the bulk of KVM's register specific code out of x86.c.
Nit:
I think patch 4 did the moving, so it's not "in anticipation of" anymore?
>
> No functional change intended.
>
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h
2026-06-15 8:13 ` Binbin Wu
@ 2026-06-15 16:31 ` Sean Christopherson
0 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 16:31 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On Mon, Jun 15, 2026, Binbin Wu wrote:
> On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> > Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
> > moving the bulk of KVM's register specific code out of x86.c.
>
> Nit:
> I think patch 4 did the moving, so it's not "in anticipation of" anymore?
Oh, nice catch. It's a typo, it should be MSR, not register. I'll change to
this:
Move the kvm_{load,put}_guest_fpu() helpers to fpu.h in anticipation of
moving the bulk of KVM's MSR specific code out of x86.c (KVM needs to load
and put the FPU when accessing MSRs that are managed via XSTATE, a.k.a. the
so called "FPU").
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 12/30] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (10 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 11/30] KVM: x86: Move kvm_{load,put}_guest_fpu() to fpu.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 8:30 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 13/30] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code Sean Christopherson
` (17 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Extract the code for getting/setting MSRs and MSR lists to dedicated
helpers in anticipation of moving the MSR code to a new msrs.c.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 135 ++++++++++++++++++++++++++-------------------
1 file changed, 78 insertions(+), 57 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f9a1572a39f..ece8d53fb7fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4602,6 +4602,61 @@ static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
return __kvm_x86_dev_get_attr(attr, &val);
}
+static int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msrs_to_save,
+ num_msrs_to_save * sizeof(u32)))
+ return -EFAULT;
+
+ if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
+ &emulated_msrs, num_emulated_msrs * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msr_based_features;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msr_based_features,
+ num_msr_based_features * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
+{
+ return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
+}
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -4609,32 +4664,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
long r;
switch (ioctl) {
- case KVM_GET_MSR_INDEX_LIST: {
- struct kvm_msr_list __user *user_msr_list = argp;
- struct kvm_msr_list msr_list;
- unsigned n;
-
- r = -EFAULT;
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- goto out;
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- goto out;
- r = -E2BIG;
- if (n < msr_list.nmsrs)
- goto out;
- r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msrs_to_save,
- num_msrs_to_save * sizeof(u32)))
- goto out;
- if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
- &emulated_msrs,
- num_emulated_msrs * sizeof(u32)))
- goto out;
- r = 0;
+ case KVM_GET_MSR_INDEX_LIST:
+ r = kvm_get_msr_index_list(argp);
break;
- }
case KVM_GET_SUPPORTED_CPUID:
case KVM_GET_EMULATED_CPUID: {
struct kvm_cpuid2 __user *cpuid_arg = argp;
@@ -4662,30 +4694,11 @@ long kvm_arch_dev_ioctl(struct file *filp,
goto out;
r = 0;
break;
- case KVM_GET_MSR_FEATURE_INDEX_LIST: {
- struct kvm_msr_list __user *user_msr_list = argp;
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- r = -EFAULT;
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- goto out;
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msr_based_features;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- goto out;
- r = -E2BIG;
- if (n < msr_list.nmsrs)
- goto out;
- r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msr_based_features,
- num_msr_based_features * sizeof(u32)))
- goto out;
- r = 0;
+ case KVM_GET_MSR_FEATURE_INDEX_LIST:
+ r = kvm_get_feature_msr_index_list(argp);
break;
- }
case KVM_GET_MSRS:
- r = msr_io(NULL, argp, do_get_feature_msr, 1);
+ r = kvm_get_feature_msrs(argp);
break;
#ifdef CONFIG_KVM_HYPERV
case KVM_GET_SUPPORTED_HV_CPUID:
@@ -5719,6 +5732,20 @@ static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
return 0;
}
+static int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_get_msr, 1);
+}
+
+static int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_set_msr, 0);
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5823,18 +5850,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = 0;
break;
}
- case KVM_GET_MSRS: {
- int idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = msr_io(vcpu, argp, do_get_msr, 1);
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ case KVM_GET_MSRS:
+ r = kvm_get_msrs(vcpu, argp);
break;
- }
- case KVM_SET_MSRS: {
- int idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = msr_io(vcpu, argp, do_set_msr, 0);
- srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ case KVM_SET_MSRS:
+ r = kvm_set_msrs(vcpu, argp);
break;
- }
case KVM_GET_ONE_REG:
case KVM_SET_ONE_REG:
r = kvm_get_set_one_reg(vcpu, ioctl, argp);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 13/30] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (11 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 12/30] KVM: x86: Extract get/set MSR (list) ioctl logic to helpers Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:16 ` sashiko-bot
2026-06-13 0:03 ` [PATCH v4 14/30] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h} Sean Christopherson
` (16 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Begrudgingly move adjust_tsc_offset_{guest,host}() to x86.h as inlines,
and expose several other TSC helpers in anticipation of moving KVM's MSR
code to a dedicated msrs.c. Unfortunately for KVM, several MSRs that KVM
emulates can affect TSC state.
Opportunistically drop a superfluous local "tsc_offset" variable, whose
existence causes checkpatch to complain about lack of a blank line.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 22 +++-------------------
arch/x86/kvm/x86.h | 19 +++++++++++++++++++
2 files changed, 22 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ece8d53fb7fe..316ec7a57f7d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2225,7 +2225,7 @@ u64 kvm_scale_tsc(u64 tsc, u64 ratio)
return _tsc;
}
-static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
{
u64 tsc;
@@ -2266,7 +2266,7 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_calc_nested_tsc_multiplier);
-static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
+void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
{
if (vcpu->arch.guest_tsc_protected)
return;
@@ -2380,7 +2380,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
kvm_track_tsc_matching(vcpu, !matched);
}
-static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
+void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
{
u64 data = user_value ? *user_value : 0;
struct kvm *kvm = vcpu->kvm;
@@ -2448,22 +2448,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
}
-static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
- s64 adjustment)
-{
- u64 tsc_offset = vcpu->arch.l1_tsc_offset;
- kvm_vcpu_write_tsc_offset(vcpu, tsc_offset + adjustment);
-}
-
-static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
-{
- if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
- WARN_ON(adjustment < 0);
- adjustment = kvm_scale_tsc((u64) adjustment,
- vcpu->arch.l1_tsc_scaling_ratio);
- adjust_tsc_offset_guest(vcpu, adjustment);
-}
-
#ifdef CONFIG_X86_64
static u64 read_tsc(void)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 635a21bfa681..31e67b060148 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -345,6 +345,25 @@ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm);
bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
int kvm_guest_time_update(struct kvm_vcpu *v);
+void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value);
+u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc);
+void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset);
+
+static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
+ s64 adjustment)
+{
+ kvm_vcpu_write_tsc_offset(vcpu, vcpu->arch.l1_tsc_offset + adjustment);
+}
+
+static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
+{
+ if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
+ WARN_ON(adjustment < 0);
+ adjustment = kvm_scale_tsc((u64) adjustment,
+ vcpu->arch.l1_tsc_scaling_ratio);
+ adjust_tsc_offset_guest(vcpu, adjustment);
+}
+
int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 13/30] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code
2026-06-13 0:03 ` [PATCH v4 13/30] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code Sean Christopherson
@ 2026-06-13 0:16 ` sashiko-bot
0 siblings, 0 replies; 64+ messages in thread
From: sashiko-bot @ 2026-06-13 0:16 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Lockless iteration of `vm_list` in `kvm_arch_enable_virtualization_cpu` during CPU hotplug.
- [High] Lockless cross-CPU modification of `vcpu->arch.tsc_offset_adjustment`.
--
commit ab7eedcd233e127b5aac182b220a36fa043a7767
Author: Sean Christopherson <seanjc@google.com>
Date: Sat Jun 13 00:03:12 2026 +0000
KVM: x86: Expose several TSC helpers via x86.h for use by MSR code
Begrudgingly move adjust_tsc_offset_{guest,host}() to x86.h as inlines,
and expose several other TSC helpers in anticipation of moving KVM's MSR
code to a dedicated msrs.c. Unfortunately for KVM, several MSRs that KVM
emulates can affect TSC state.
Opportunistically drop a superfluous local "tsc_offset" variable, whose
existence causes checkpatch to complain about lack of a blank line.
No functional change intended.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ece8d53fb7fe6..316ec7a57f7dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
[ ... ]
> @@ -2448,22 +2448,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
> raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
> }
>
> -static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
> - s64 adjustment)
> -{
> - u64 tsc_offset = vcpu->arch.l1_tsc_offset;
> - kvm_vcpu_write_tsc_offset(vcpu, tsc_offset + adjustment);
> -}
> -
> -static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
> -{
> - if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
> - WARN_ON(adjustment < 0);
> - adjustment = kvm_scale_tsc((u64) adjustment,
> - vcpu->arch.l1_tsc_scaling_ratio);
> - adjust_tsc_offset_guest(vcpu, adjustment);
> -}
> -
> #ifdef CONFIG_X86_64
>
> static u64 read_tsc(void)
[Severity: High]
This isn't a bug introduced by this patch, but while reviewing these TSC
offset adjustments I noticed a race condition in how
vcpu->arch.tsc_offset_adjustment is updated during CPU hotplug.
In kvm_arch_enable_virtualization_cpu(), there is a lockless cross-CPU
modification of vcpu->arch.tsc_offset_adjustment:
arch/x86/kvm/x86.c:kvm_arch_enable_virtualization_cpu()
if (backwards_tsc) {
u64 delta_cyc = max_tsc - local_tsc;
list_for_each_entry(kvm, &vm_list, vm_list) {
kvm->arch.backwards_tsc_observed = true;
kvm_for_each_vcpu(i, vcpu, kvm) {
vcpu->arch.tsc_offset_adjustment += delta_cyc;
Since CPU hotplug only pauses the CPU being brought online, could the
targeted VCPU be concurrently executing kvm_arch_vcpu_load() on another CPU?
If kvm_arch_vcpu_load() reads and clears this field concurrently without
vcpu->mutex, could the adjustment be lost or applied twice and lead to
incorrect guest TSC values?
[Severity: High]
This is a pre-existing issue, but I also noticed that the iteration over
vm_list in kvm_arch_enable_virtualization_cpu() is lockless:
arch/x86/kvm/x86.c:kvm_arch_enable_virtualization_cpu()
list_for_each_entry(kvm, &vm_list, vm_list) {
kvm_for_each_vcpu(i, vcpu, kvm) {
If a user concurrently creates or destroys a VM (which modifies vm_list
under kvm_lock), couldn't this lockless list_for_each_entry() dereference
poisoned pointers from list_del() during destruction, leading to a kernel
panic?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260613000329.732085-1-seanjc@google.com?part=13
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 14/30] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h}
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (12 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 13/30] KVM: x86: Expose several TSC helpers via x86.h for use by MSR code Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 9:30 ` Binbin Wu
2026-06-13 0:03 ` [PATCH v4 15/30] KVM: x86: Move register helper declarations from kvm_host.h => regs.h Sean Christopherson
` (15 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Introduce msrs.{c,h}, and move the vast majority of MSR specific code out
of x86.{c,h}. Use a plural "msrs" instead of just "msr" to be consistent
with regs.{c,h}, and to make it easier to differentiate KVM's code from the
other 5+ msr.c files in the kernel.
Opportunistically drop the "x86.h" include from mtrr.c, mostly as proof
that the bulk of the MSR code is indeed being relocated to msrs.c.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/msrs.c | 2732 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/msrs.h | 128 ++
arch/x86/kvm/mtrr.c | 2 +-
arch/x86/kvm/x86.c | 2710 +---------------------------------------
arch/x86/kvm/x86.h | 87 +-
6 files changed, 2867 insertions(+), 2794 deletions(-)
create mode 100644 arch/x86/kvm/msrs.c
create mode 100644 arch/x86/kvm/msrs.h
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f39c311fd756..0474604ab8a1 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,7 +5,7 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o regs.o \
+kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o msrs.o pmu.o regs.o \
mtrr.o debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
new file mode 100644
index 000000000000..07f2a22d2607
--- /dev/null
+++ b/arch/x86/kvm/msrs.c
@@ -0,0 +1,2732 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm_host.h>
+#include <asm/intel_pt.h>
+#include <asm/vmx.h>
+
+#include "hyperv.h"
+#include "lapic.h"
+#include "msrs.h"
+#include "pmu.h"
+#include "trace.h"
+#include "vmx/vmx.h"
+#include "xen.h"
+#include "x86.h"
+
+bool __read_mostly ignore_msrs = 0;
+module_param(ignore_msrs, bool, 0644);
+
+bool __read_mostly report_ignored_msrs = true;
+module_param(report_ignored_msrs, bool, 0644);
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs);
+
+/* EFER defaults:
+ * - enable syscall per default because its emulated by KVM
+ * - enable LME and LMA per default on 64 bit KVM
+ */
+#ifdef CONFIG_X86_64
+static
+u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
+#else
+static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
+#endif
+
+#define MAX_IO_MSRS 256
+
+/*
+ * Restoring the host value for MSRs that are only consumed when running in
+ * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
+ * returns to userspace, i.e. the kernel can run with the guest's value.
+ */
+#define KVM_MAX_NR_USER_RETURN_MSRS 16
+
+struct kvm_user_return_msrs {
+ struct user_return_notifier urn;
+ bool registered;
+ struct kvm_user_return_msr_values {
+ u64 host;
+ u64 curr;
+ } values[KVM_MAX_NR_USER_RETURN_MSRS];
+};
+
+u32 __read_mostly kvm_nr_uret_msrs;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs);
+static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
+static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
+
+void kvm_destroy_user_return_msrs(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ WARN_ON_ONCE(per_cpu(user_return_msrs, cpu).registered);
+
+ kvm_nr_uret_msrs = 0;
+}
+
+static void kvm_on_user_return(struct user_return_notifier *urn)
+{
+ unsigned slot;
+ struct kvm_user_return_msrs *msrs
+ = container_of(urn, struct kvm_user_return_msrs, urn);
+ struct kvm_user_return_msr_values *values;
+
+ msrs->registered = false;
+ user_return_notifier_unregister(urn);
+
+ for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
+ values = &msrs->values[slot];
+ if (values->host != values->curr) {
+ wrmsrq(kvm_uret_msrs_list[slot], values->host);
+ values->curr = values->host;
+ }
+ }
+}
+
+static int kvm_probe_user_return_msr(u32 msr)
+{
+ u64 val;
+ int ret;
+
+ preempt_disable();
+ ret = rdmsrq_safe(msr, &val);
+ if (ret)
+ goto out;
+ ret = wrmsrq_safe(msr, val);
+out:
+ preempt_enable();
+ return ret;
+}
+
+int kvm_add_user_return_msr(u32 msr)
+{
+ BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
+
+ if (kvm_probe_user_return_msr(msr))
+ return -1;
+
+ kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
+ return kvm_nr_uret_msrs++;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr);
+
+int kvm_find_user_return_msr(u32 msr)
+{
+ int i;
+
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ if (kvm_uret_msrs_list[i] == msr)
+ return i;
+ }
+ return -1;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr);
+
+void kvm_user_return_msr_cpu_online(void)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+ u64 value;
+ int i;
+
+ for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+ rdmsrq_safe(kvm_uret_msrs_list[i], &value);
+ msrs->values[i].host = value;
+ msrs->values[i].curr = value;
+ }
+}
+
+static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
+{
+ if (!msrs->registered) {
+ msrs->urn.on_user_return = kvm_on_user_return;
+ user_return_notifier_register(&msrs->urn);
+ msrs->registered = true;
+ }
+}
+
+int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+ int err;
+
+ value = (value & mask) | (msrs->values[slot].host & ~mask);
+ if (value == msrs->values[slot].curr)
+ return 0;
+ err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
+ if (err)
+ return 1;
+
+ msrs->values[slot].curr = value;
+ kvm_user_return_register_notifier(msrs);
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
+
+u64 kvm_get_user_return_msr(unsigned int slot)
+{
+ return this_cpu_ptr(&user_return_msrs)->values[slot].curr;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr);
+
+void drop_user_return_notifiers(void)
+{
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
+
+ if (msrs->registered)
+ kvm_on_user_return(&msrs->urn);
+}
+
+/*
+ * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
+ * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
+ * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
+ * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
+ * MSRs that KVM emulates without strictly requiring host support.
+ * msr_based_features holds MSRs that enumerate features, i.e. are effectively
+ * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
+ * msrs_to_save and emulated_msrs.
+ */
+
+static const u32 msrs_to_save_base[] = {
+ MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
+ MSR_STAR,
+#ifdef CONFIG_X86_64
+ MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
+#endif
+ MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+ MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+ MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
+ MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
+ MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
+ MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
+ MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
+ MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
+ MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_UMWAIT_CONTROL,
+
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+ MSR_IA32_U_CET, MSR_IA32_S_CET,
+ MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+ MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
+ MSR_IA32_DEBUGCTLMSR,
+ MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
+ MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
+};
+
+static const u32 msrs_to_save_pmu[] = {
+ MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
+ MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+ MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
+ MSR_CORE_PERF_GLOBAL_CTRL,
+ MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+
+ /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
+ MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
+ MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
+ MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
+ MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
+ MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
+ MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
+
+ MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
+ MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
+
+ /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
+ MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
+ MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
+ MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
+ MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
+
+ MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
+};
+
+static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
+ ARRAY_SIZE(msrs_to_save_pmu)];
+static unsigned num_msrs_to_save;
+
+static const u32 emulated_msrs_all[] = {
+ MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+ MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
+
+#ifdef CONFIG_KVM_HYPERV
+ HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+ HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
+ HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
+ HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
+ HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
+ HV_X64_MSR_RESET,
+ HV_X64_MSR_VP_INDEX,
+ HV_X64_MSR_VP_RUNTIME,
+ HV_X64_MSR_SCONTROL,
+ HV_X64_MSR_STIMER0_CONFIG,
+ HV_X64_MSR_VP_ASSIST_PAGE,
+ HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
+ HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
+ HV_X64_MSR_SYNDBG_OPTIONS,
+ HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
+ HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
+ HV_X64_MSR_SYNDBG_PENDING_BUFFER,
+#endif
+
+ MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
+
+ MSR_IA32_TSC_ADJUST,
+ MSR_IA32_TSC_DEADLINE,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_IA32_MISC_ENABLE,
+ MSR_IA32_MCG_STATUS,
+ MSR_IA32_MCG_CTL,
+ MSR_IA32_MCG_EXT_CTL,
+ MSR_IA32_SMBASE,
+ MSR_SMI_COUNT,
+ MSR_PLATFORM_INFO,
+ MSR_MISC_FEATURES_ENABLES,
+ MSR_AMD64_VIRT_SPEC_CTRL,
+ MSR_AMD64_TSC_RATIO,
+ MSR_IA32_POWER_CTL,
+ MSR_IA32_UCODE_REV,
+
+ /*
+ * KVM always supports the "true" VMX control MSRs, even if the host
+ * does not. The VMX MSRs as a whole are considered "emulated" as KVM
+ * doesn't strictly require them to exist in the host (ignoring that
+ * KVM would refuse to load in the first place if the core set of MSRs
+ * aren't supported).
+ */
+ MSR_IA32_VMX_BASIC,
+ MSR_IA32_VMX_TRUE_PINBASED_CTLS,
+ MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
+ MSR_IA32_VMX_TRUE_EXIT_CTLS,
+ MSR_IA32_VMX_TRUE_ENTRY_CTLS,
+ MSR_IA32_VMX_MISC,
+ MSR_IA32_VMX_CR0_FIXED0,
+ MSR_IA32_VMX_CR4_FIXED0,
+ MSR_IA32_VMX_VMCS_ENUM,
+ MSR_IA32_VMX_PROCBASED_CTLS2,
+ MSR_IA32_VMX_EPT_VPID_CAP,
+ MSR_IA32_VMX_VMFUNC,
+
+ MSR_K7_HWCR,
+ MSR_KVM_POLL_CONTROL,
+};
+
+static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
+static unsigned num_emulated_msrs;
+
+/*
+ * List of MSRs that control the existence of MSR-based features, i.e. MSRs
+ * that are effectively CPUID leafs. VMX MSRs are also included in the set of
+ * feature MSRs, but are handled separately to allow expedited lookups.
+ */
+static const u32 msr_based_features_all_except_vmx[] = {
+ MSR_AMD64_DE_CFG,
+ MSR_IA32_UCODE_REV,
+ MSR_IA32_ARCH_CAPABILITIES,
+ MSR_IA32_PERF_CAPABILITIES,
+ MSR_PLATFORM_INFO,
+};
+
+static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
+ (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
+static unsigned int num_msr_based_features;
+
+int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msrs_to_save,
+ num_msrs_to_save * sizeof(u32)))
+ return -EFAULT;
+
+ if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
+ &emulated_msrs, num_emulated_msrs * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
+{
+ struct kvm_msr_list msr_list;
+ unsigned int n;
+
+ if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ n = msr_list.nmsrs;
+ msr_list.nmsrs = num_msr_based_features;
+ if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
+ return -EFAULT;
+
+ if (n < msr_list.nmsrs)
+ return -E2BIG;
+
+ if (copy_to_user(user_msr_list->indices, &msr_based_features,
+ num_msr_based_features * sizeof(u32)))
+ return -EFAULT;
+
+ return 0;
+}
+
+/*
+ * All feature MSRs except uCode revID, which tracks the currently loaded uCode
+ * patch, are immutable once the vCPU model is defined.
+ */
+static bool kvm_is_immutable_feature_msr(u32 msr)
+{
+ int i;
+
+ if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
+ return true;
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
+ if (msr == msr_based_features_all_except_vmx[i])
+ return msr != MSR_IA32_UCODE_REV;
+ }
+
+ return false;
+}
+
+static bool kvm_is_advertised_msr(u32 msr_index)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_msrs_to_save; i++) {
+ if (msrs_to_save[i] == msr_index)
+ return true;
+ }
+
+ for (i = 0; i < num_emulated_msrs; i++) {
+ if (emulated_msrs[i] == msr_index)
+ return true;
+ }
+
+ return false;
+}
+
+
+/*
+ * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
+ * does not yet virtualize. These include:
+ * 10 - MISC_PACKAGE_CTRLS
+ * 11 - ENERGY_FILTERING_CTL
+ * 12 - DOITM
+ * 18 - FB_CLEAR_CTRL
+ * 21 - XAPIC_DISABLE_STATUS
+ * 23 - OVERCLOCKING_STATUS
+ */
+
+#define KVM_SUPPORTED_ARCH_CAP \
+ (ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL | ARCH_CAP_RSBA | \
+ ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
+ ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
+ ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
+ ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | ARCH_CAP_ITS_NO)
+
+u64 kvm_get_arch_capabilities(void)
+{
+ u64 data = kvm_host.arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
+
+ /*
+ * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
+ * the nested hypervisor runs with NX huge pages. If it is not,
+ * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
+ * L1 guests, so it need not worry about its own (L2) guests.
+ */
+ data |= ARCH_CAP_PSCHANGE_MC_NO;
+
+ /*
+ * If we're doing cache flushes (either "always" or "cond")
+ * we will do one whenever the guest does a vmlaunch/vmresume.
+ * If an outer hypervisor is doing the cache flush for us
+ * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
+ * capability to the guest too, and if EPT is disabled we're not
+ * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
+ * require a nested hypervisor to do a flush of its own.
+ */
+ if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
+ data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+
+ if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
+ data |= ARCH_CAP_RDCL_NO;
+ if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
+ data |= ARCH_CAP_SSB_NO;
+ if (!boot_cpu_has_bug(X86_BUG_MDS))
+ data |= ARCH_CAP_MDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ data |= ARCH_CAP_RFDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_ITS))
+ data |= ARCH_CAP_ITS_NO;
+
+ if (!boot_cpu_has(X86_FEATURE_RTM)) {
+ /*
+ * If RTM=0 because the kernel has disabled TSX, the host might
+ * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
+ * and therefore knows that there cannot be TAA) but keep
+ * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
+ * and we want to allow migrating those guests to tsx=off hosts.
+ */
+ data &= ~ARCH_CAP_TAA_NO;
+ } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
+ data |= ARCH_CAP_TAA_NO;
+ } else {
+ /*
+ * Nothing to do here; we emulate TSX_CTRL if present on the
+ * host so the guest can choose between disabling TSX or
+ * using VERW to clear CPU buffers.
+ */
+ }
+
+ if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
+ data |= ARCH_CAP_GDS_NO;
+
+ return data;
+}
+
+static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ WARN_ON_ONCE(!host_initiated);
+
+ switch (index) {
+ case MSR_IA32_ARCH_CAPABILITIES:
+ *data = kvm_get_arch_capabilities();
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ *data = kvm_caps.supported_perf_cap;
+ break;
+ case MSR_PLATFORM_INFO:
+ *data = MSR_PLATFORM_INFO_CPUID_FAULT;
+ break;
+ case MSR_IA32_UCODE_REV:
+ rdmsrq_safe(index, data);
+ break;
+ default:
+ return kvm_x86_call(get_feature_msr)(index, data);
+ }
+ return 0;
+}
+
+typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated);
+
+static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
+ u64 *data, bool host_initiated,
+ enum kvm_msr_access rw,
+ msr_access_t msr_access_fn)
+{
+ const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
+ int ret;
+
+ BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
+
+ /*
+ * Zero the data on read failures to avoid leaking stack data to the
+ * guest and/or userspace, e.g. if the failure is ignored below.
+ */
+ ret = msr_access_fn(vcpu, msr, data, host_initiated);
+ if (ret && rw == MSR_TYPE_R)
+ *data = 0;
+
+ if (ret != KVM_MSR_RET_UNSUPPORTED)
+ return ret;
+
+ /*
+ * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
+ * advertises to userspace, even if an MSR isn't fully supported.
+ * Simply check that @data is '0', which covers both the write '0' case
+ * and all reads (in which case @data is zeroed on failure; see above).
+ */
+ if (host_initiated && !*data && kvm_is_advertised_msr(msr))
+ return 0;
+
+ if (!ignore_msrs) {
+ kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
+ op, msr, *data);
+ return ret;
+ }
+
+ if (report_ignored_msrs)
+ kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
+
+ return 0;
+}
+
+static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
+ kvm_get_feature_msr);
+}
+
+static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+ if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
+ return false;
+
+ if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
+ return false;
+
+ if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
+ return false;
+
+ if (efer & (EFER_LME | EFER_LMA) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return false;
+
+ if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
+ return false;
+
+ return true;
+
+}
+bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+ if (efer & efer_reserved_bits)
+ return false;
+
+ return __kvm_valid_efer(vcpu, efer);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer);
+
+static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u64 old_efer = vcpu->arch.efer;
+ u64 efer = msr_info->data;
+ int r;
+
+ if (efer & efer_reserved_bits)
+ return 1;
+
+ if (!msr_info->host_initiated) {
+ if (!__kvm_valid_efer(vcpu, efer))
+ return 1;
+
+ if (is_paging(vcpu) &&
+ (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
+ return 1;
+ }
+
+ efer &= ~EFER_LMA;
+ efer |= vcpu->arch.efer & EFER_LMA;
+
+ r = kvm_x86_call(set_efer)(vcpu, efer);
+ if (r) {
+ WARN_ON(r > 0);
+ return r;
+ }
+
+ if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
+ kvm_mmu_reset_context(vcpu);
+
+ if (!static_cpu_has(X86_FEATURE_XSAVES) &&
+ (efer & EFER_SVME))
+ kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+
+ return 0;
+}
+
+void kvm_enable_efer_bits(u64 mask)
+{
+ efer_reserved_bits &= ~mask;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits);
+
+bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+ struct msr_bitmap_range *ranges;
+ struct kvm *kvm = vcpu->kvm;
+ bool allowed;
+ int idx;
+ u32 i;
+
+ /* x2APIC MSRs do not support filtering. */
+ if (index >= 0x800 && index <= 0x8ff)
+ return true;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
+ if (!msr_filter) {
+ allowed = true;
+ goto out;
+ }
+
+ allowed = msr_filter->default_allow;
+ ranges = msr_filter->ranges;
+
+ for (i = 0; i < msr_filter->count; i++) {
+ u32 start = ranges[i].base;
+ u32 end = start + ranges[i].nmsrs;
+ u32 flags = ranges[i].flags;
+ unsigned long *bitmap = ranges[i].bitmap;
+
+ if ((index >= start) && (index < end) && (flags & type)) {
+ allowed = test_bit(index - start, bitmap);
+ break;
+ }
+ }
+
+out:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return allowed;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed);
+
+/*
+ * Write @data into the MSR specified by @index. Select MSR specific fault
+ * checks are bypassed if @host_initiated is %true.
+ * Returns 0 on success, non-0 otherwise.
+ * Assumes vcpu_load() was already called.
+ */
+static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
+ bool host_initiated)
+{
+ struct msr_data msr;
+
+ switch (index) {
+ case MSR_FS_BASE:
+ case MSR_GS_BASE:
+ case MSR_KERNEL_GS_BASE:
+ case MSR_CSTAR:
+ case MSR_LSTAR:
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ break;
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_IA32_SYSENTER_ESP:
+ /*
+ * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
+ * non-canonical address is written on Intel but not on
+ * AMD (which ignores the top 32-bits, because it does
+ * not implement 64-bit SYSENTER).
+ *
+ * 64-bit code should hence be able to write a non-canonical
+ * value on AMD. Making the address canonical ensures that
+ * vmentry does not fail on Intel after writing a non-canonical
+ * value, and that something deterministic happens if the guest
+ * invokes 64-bit SYSENTER.
+ */
+ data = __canonical_address(data, max_host_virt_addr_bits());
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+
+ /*
+ * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
+ * incomplete and conflicting architectural behavior. Current
+ * AMD CPUs completely ignore bits 63:32, i.e. they aren't
+ * reserved and always read as zeros. Enforce Intel's reserved
+ * bits check if the guest CPU is Intel compatible, otherwise
+ * clear the bits. This ensures cross-vendor migration will
+ * provide consistent behavior for the guest.
+ */
+ if (guest_cpuid_is_intel_compatible(vcpu) && (data >> 32) != 0)
+ return 1;
+
+ data = (u32)data;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (!kvm_is_valid_u_s_cet(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ /*
+ * Note that the MSR emulation here is flawed when a vCPU
+ * doesn't support the Intel 64 architecture. The expected
+ * architectural behavior in this case is that the upper 32
+ * bits do not exist and should always read '0'. However,
+ * because the actual hardware on which the virtual CPU is
+ * running does support Intel 64, XRSTORS/XSAVES in the
+ * guest could observe behavior that violates the
+ * architecture. Intercepting XRSTORS/XSAVES for this
+ * special case isn't deemed worthwhile.
+ */
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ /*
+ * MSR_IA32_INT_SSP_TAB is not present on processors that do
+ * not support Intel 64 architecture.
+ */
+ if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+ if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+ return 1;
+ break;
+ }
+
+ msr.data = data;
+ msr.index = index;
+ msr.host_initiated = host_initiated;
+
+ return kvm_x86_call(set_msr)(vcpu, &msr);
+}
+
+static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ return __kvm_set_msr(vcpu, index, *data, host_initiated);
+}
+
+static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
+ u32 index, u64 data, bool host_initiated)
+{
+ return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
+ _kvm_set_msr);
+}
+
+/*
+ * Read the MSR specified by @index into @data. Select MSR specific fault
+ * checks are bypassed if @host_initiated is %true.
+ * Returns 0 on success, non-0 otherwise.
+ * Assumes vcpu_load() was already called.
+ */
+static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
+ bool host_initiated)
+{
+ struct msr_data msr;
+ int ret;
+
+ switch (index) {
+ case MSR_TSC_AUX:
+ if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
+ return 1;
+
+ if (!host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
+ return 1;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ }
+
+ msr.index = index;
+ msr.host_initiated = host_initiated;
+
+ ret = kvm_x86_call(get_msr)(vcpu, &msr);
+ if (!ret)
+ *data = msr.data;
+ return ret;
+}
+
+static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
+ u32 index, u64 *data, bool host_initiated)
+{
+ return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
+ __kvm_get_msr);
+}
+
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ return __kvm_set_msr(vcpu, index, data, true);
+}
+
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ return __kvm_get_msr(vcpu, index, data, true);
+}
+
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ return kvm_get_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read);
+
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ return kvm_set_msr_ignored_check(vcpu, index, data, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write);
+
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
+{
+ if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
+ return KVM_MSR_RET_FILTERED;
+
+ return __kvm_emulate_msr_read(vcpu, index, data);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read);
+
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
+{
+ if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
+ return KVM_MSR_RET_FILTERED;
+
+ return __kvm_emulate_msr_write(vcpu, index, data);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);
+
+static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
+ return EXIT_FASTPATH_NONE;
+
+ switch (msr) {
+ case APIC_BASE_MSR + (APIC_ICR >> 4):
+ if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
+ kvm_x2apic_icr_write_fast(vcpu->arch.apic, data))
+ return EXIT_FASTPATH_NONE;
+ break;
+ case MSR_IA32_TSC_DEADLINE:
+ kvm_set_lapic_tscdeadline_msr(vcpu, data);
+ break;
+ default:
+ return EXIT_FASTPATH_NONE;
+ }
+
+ trace_kvm_msr_write(msr, data);
+
+ if (!kvm_skip_emulated_instruction(vcpu))
+ return EXIT_FASTPATH_EXIT_USERSPACE;
+
+ return EXIT_FASTPATH_REENTER_GUEST;
+}
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
+{
+ return __handle_fastpath_wrmsr(vcpu, kvm_ecx_read(vcpu),
+ kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
+
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
+
+static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->run->msr.error) {
+ kvm_eax_write(vcpu, vcpu->run->msr.data);
+ kvm_edx_write(vcpu, vcpu->run->msr.data >> 32);
+ }
+}
+
+static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
+{
+ if (err) {
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
+ return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+ EMULTYPE_COMPLETE_USER_EXIT);
+}
+
+static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
+{
+ return complete_emulated_insn_gp(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+ complete_userspace_rdmsr(vcpu);
+ return complete_emulated_msr_access(vcpu);
+}
+
+static int complete_fast_msr_access(struct kvm_vcpu *vcpu)
+{
+ return kvm_x86_call(complete_emulated_msr)(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
+{
+ complete_userspace_rdmsr(vcpu);
+ return complete_fast_msr_access(vcpu);
+}
+
+static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->run->msr.error)
+ kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
+ vcpu->run->msr.data);
+
+ return complete_fast_msr_access(vcpu);
+}
+
+static u64 kvm_msr_reason(int r)
+{
+ switch (r) {
+ case KVM_MSR_RET_UNSUPPORTED:
+ return KVM_MSR_EXIT_REASON_UNKNOWN;
+ case KVM_MSR_RET_FILTERED:
+ return KVM_MSR_EXIT_REASON_FILTER;
+ default:
+ return KVM_MSR_EXIT_REASON_INVAL;
+ }
+}
+
+static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
+ u32 exit_reason, u64 data,
+ int (*completion)(struct kvm_vcpu *vcpu),
+ int r)
+{
+ u64 msr_reason = kvm_msr_reason(r);
+
+ /* Check if the user wanted to know about this MSR fault */
+ if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
+ return 0;
+
+ vcpu->run->exit_reason = exit_reason;
+ vcpu->run->msr.error = 0;
+ memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
+ vcpu->run->msr.reason = msr_reason;
+ vcpu->run->msr.index = index;
+ vcpu->run->msr.data = data;
+ vcpu->arch.complete_userspace_io = completion;
+
+ return 1;
+}
+
+static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
+ int (*complete_rdmsr)(struct kvm_vcpu *))
+{
+ u64 data;
+ int r;
+
+ r = kvm_emulate_msr_read(vcpu, msr, &data);
+
+ if (!r) {
+ trace_kvm_msr_read(msr, data);
+
+ if (reg < 0) {
+ kvm_eax_write(vcpu, data);
+ kvm_edx_write(vcpu, data >> 32);
+ } else {
+ kvm_register_write(vcpu, reg, data);
+ }
+ } else {
+ /* MSR read failed? See if we should ask user space */
+ if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
+ complete_rdmsr, r))
+ return 0;
+ trace_kvm_msr_read_ex(msr);
+ }
+
+ return kvm_x86_call(complete_emulated_msr)(vcpu, r);
+}
+
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
+{
+ return __kvm_emulate_rdmsr(vcpu, kvm_ecx_read(vcpu), -1,
+ complete_fast_rdmsr);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
+
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ vcpu->arch.cui_rdmsr_imm_reg = reg;
+
+ return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm);
+
+static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ int r;
+
+ r = kvm_emulate_msr_write(vcpu, msr, data);
+ if (!r) {
+ trace_kvm_msr_write(msr, data);
+ } else {
+ /* MSR write failed? See if we should ask user space */
+ if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data,
+ complete_fast_msr_access, r))
+ return 0;
+ /* Signal all other negative errors to userspace */
+ if (r < 0)
+ return r;
+ trace_kvm_msr_write_ex(msr, data);
+ }
+
+ return kvm_x86_call(complete_emulated_msr)(vcpu, r);
+}
+
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+ return __kvm_emulate_wrmsr(vcpu, kvm_ecx_read(vcpu),
+ kvm_read_edx_eax(vcpu));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
+
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+ return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);
+
+int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata)
+{
+ int r;
+
+ r = kvm_emulate_msr_read(vcpu, msr_index, pdata);
+ if (r < 0)
+ return X86EMUL_UNHANDLEABLE;
+
+ if (r) {
+ if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
+ complete_emulated_rdmsr, r))
+ return X86EMUL_IO_NEEDED;
+
+ trace_kvm_msr_read_ex(msr_index);
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+
+ trace_kvm_msr_read(msr_index, *pdata);
+ return X86EMUL_CONTINUE;
+}
+
+int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 data)
+{
+ int r;
+
+ r = kvm_emulate_msr_write(vcpu, msr_index, data);
+ if (r < 0)
+ return X86EMUL_UNHANDLEABLE;
+
+ if (r) {
+ if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
+ complete_emulated_msr_access, r))
+ return X86EMUL_IO_NEEDED;
+
+ trace_kvm_msr_write_ex(msr_index, data);
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+
+ trace_kvm_msr_write(msr_index, data);
+ return X86EMUL_CONTINUE;
+}
+
+int kvm_emulator_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+{
+ /*
+ * Treat emulator accesses to the current shadow stack pointer as host-
+ * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
+ * and this API is used only for implicit accesses, i.e. not RDMSR, and
+ * so the index is fully KVM-controlled.
+ */
+ if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
+ return kvm_msr_read(vcpu, msr_index, pdata);
+
+ return __kvm_emulate_msr_read(vcpu, msr_index, pdata);
+}
+
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state.
+ *
+ * Note, S_CET is _not_ saved/restored via XSAVES/XRSTORS.
+ */
+static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ if (!vcpu)
+ return false;
+
+ switch (msr) {
+ case MSR_IA32_U_CET:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+ default:
+ return false;
+ }
+}
+
+/*
+ * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
+ * MSR that is managed via XSTATE. Note, the caller is responsible for doing
+ * the initial FPU load, this helper only ensures that guest state is resident
+ * in hardware (the kernel can load its FPU state in IRQ context).
+ *
+ * Note, loading guest values for U_CET and PL[0-3]_SSP while executing in the
+ * kernel is safe, as U_CET is specific to userspace, and PL[0-3]_SSP are only
+ * consumed when transitioning to lower privilege levels, i.e. are effectively
+ * only consumed by userspace as well.
+ */
+static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info,
+ int access)
+{
+ BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
+
+ KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
+ KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+
+ kvm_fpu_get();
+ if (access == MSR_TYPE_R)
+ rdmsrq(msr_info->index, msr_info->data);
+ else
+ wrmsrq(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
+}
+
+static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
+}
+
+static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
+{
+ int version;
+ int r;
+ struct pvclock_wall_clock wc;
+ u32 wc_sec_hi;
+ u64 wall_nsec;
+
+ if (!wall_clock)
+ return;
+
+ r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+ if (r)
+ return;
+
+ if (version & 1)
+ ++version; /* first time write, random junk */
+
+ ++version;
+
+ if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
+ return;
+
+ wall_nsec = kvm_get_wall_clock_epoch(kvm);
+
+ wc.nsec = do_div(wall_nsec, NSEC_PER_SEC);
+ wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
+ wc.version = version;
+
+ kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
+
+ if (sec_hi_ofs) {
+ wc_sec_hi = wall_nsec >> 32;
+ kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
+ &wc_sec_hi, sizeof(wc_sec_hi));
+ }
+
+ version++;
+ kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+}
+
+static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
+ bool old_msr, bool host_initiated)
+{
+ struct kvm_arch *ka = &vcpu->kvm->arch;
+
+ if (vcpu->vcpu_id == 0 && !host_initiated) {
+ if (ka->boot_vcpu_runs_old_kvmclock != old_msr)
+ kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
+
+ ka->boot_vcpu_runs_old_kvmclock = old_msr;
+ }
+
+ vcpu->arch.time = system_time;
+ kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+
+ /* we verify if the enable bit is set... */
+ if (system_time & 1)
+ kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
+ sizeof(struct pvclock_vcpu_time_info));
+ else
+ kvm_gpc_deactivate(&vcpu->arch.pv_time);
+
+ return;
+}
+
+/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
+static bool is_mci_control_msr(u32 msr)
+{
+ return (msr & 3) == 0;
+}
+static bool is_mci_status_msr(u32 msr)
+{
+ return (msr & 3) == 1;
+}
+
+/*
+ * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
+ */
+static bool can_set_mci_status(struct kvm_vcpu *vcpu)
+{
+ /* McStatusWrEn enabled? */
+ if (guest_cpuid_is_amd_compatible(vcpu))
+ return !!(vcpu->arch.msr_hwcr & BIT_ULL(18));
+
+ return false;
+}
+
+static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u64 mcg_cap = vcpu->arch.mcg_cap;
+ unsigned bank_num = mcg_cap & 0xff;
+ u32 msr = msr_info->index;
+ u64 data = msr_info->data;
+ u32 offset, last_msr;
+
+ switch (msr) {
+ case MSR_IA32_MCG_STATUS:
+ vcpu->arch.mcg_status = data;
+ break;
+ case MSR_IA32_MCG_CTL:
+ if (!(mcg_cap & MCG_CTL_P) &&
+ (data || !msr_info->host_initiated))
+ return 1;
+ if (data != 0 && data != ~(u64)0)
+ return 1;
+ vcpu->arch.mcg_ctl = data;
+ break;
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
+ return 1;
+ /* An attempt to write a 1 to a reserved bit raises #GP */
+ if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
+ return 1;
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+ last_msr + 1 - MSR_IA32_MC0_CTL2);
+ vcpu->arch.mci_ctl2_banks[offset] = data;
+ break;
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ /*
+ * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
+ * values are architecturally undefined. But, some Linux
+ * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
+ * issue on AMD K8s, allow bit 10 to be clear when setting all
+ * other bits in order to avoid an uncaught #GP in the guest.
+ *
+ * UNIXWARE clears bit 0 of MC1_CTL to ignore correctable,
+ * single-bit ECC data errors.
+ */
+ if (is_mci_control_msr(msr) &&
+ data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
+ return 1;
+
+ /*
+ * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
+ * AMD-based CPUs allow non-zero values, but if and only if
+ * HWCR[McStatusWrEn] is set.
+ */
+ if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
+ data != 0 && !can_set_mci_status(vcpu))
+ return 1;
+
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+ vcpu->arch.mce_banks[offset] = data;
+ break;
+ default:
+ return 1;
+ }
+ return 0;
+}
+
+static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
+{
+ gpa_t gpa = data & ~0x3f;
+
+ /* Bits 4:5 are reserved, Should be zero */
+ if (data & 0x30)
+ return 1;
+
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_VMEXIT) &&
+ (data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT))
+ return 1;
+
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT) &&
+ (data & KVM_ASYNC_PF_DELIVERY_AS_INT))
+ return 1;
+
+ if (!lapic_in_kernel(vcpu))
+ return data ? 1 : 0;
+
+ if (__kvm_pv_async_pf_enabled(data) &&
+ kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
+ sizeof(u64)))
+ return 1;
+
+ vcpu->arch.apf.msr_en_val = data;
+
+ if (__kvm_pv_async_pf_enabled(data)) {
+ kvm_async_pf_wakeup_all(vcpu);
+ } else {
+ kvm_clear_async_pf_completion_queue(vcpu);
+ kvm_async_pf_hash_reset(vcpu);
+ }
+ return 0;
+}
+
+static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)
+{
+ /* Bits 8-63 are reserved */
+ if (data >> 8)
+ return 1;
+
+ if (!lapic_in_kernel(vcpu))
+ return 1;
+
+ vcpu->arch.apf.msr_int_val = data;
+
+ vcpu->arch.apf.vec = data & KVM_ASYNC_PF_VEC_MASK;
+
+ return 0;
+}
+
+#ifdef CONFIG_X86_64
+static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC;
+}
+#endif
+
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ u32 msr = msr_info->index;
+ u64 data = msr_info->data;
+
+ /*
+ * Do not allow host-initiated writes to trigger the Xen hypercall
+ * page setup; it could incur locking paths which are not expected
+ * if userspace sets the MSR in an unusual location.
+ */
+ if (kvm_xen_is_hypercall_page_msr(vcpu->kvm, msr) &&
+ !msr_info->host_initiated)
+ return kvm_xen_write_hypercall_page(vcpu, data);
+
+ switch (msr) {
+ case MSR_AMD64_NB_CFG:
+ case MSR_IA32_UCODE_WRITE:
+ case MSR_VM_HSAVE_PA:
+ case MSR_AMD64_PATCH_LOADER:
+ case MSR_AMD64_BU_CFG2:
+ case MSR_AMD64_DC_CFG:
+ case MSR_AMD64_TW_CFG:
+ case MSR_F15H_EX_CFG:
+ break;
+
+ case MSR_IA32_UCODE_REV:
+ if (msr_info->host_initiated)
+ vcpu->arch.microcode_version = data;
+ break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!msr_info->host_initiated ||
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return KVM_MSR_RET_UNSUPPORTED;
+ vcpu->arch.arch_capabilities = data;
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (!msr_info->host_initiated ||
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (data & ~kvm_caps.supported_perf_cap)
+ return 1;
+
+ /*
+ * Note, this is not just a performance optimization! KVM
+ * disallows changing feature MSRs after the vCPU has run; PMU
+ * refresh will bug the VM if called after the vCPU has run.
+ */
+ if (vcpu->arch.perf_capabilities == data)
+ break;
+
+ vcpu->arch.perf_capabilities = data;
+ kvm_pmu_refresh(vcpu);
+ kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
+ break;
+ case MSR_IA32_PRED_CMD: {
+ u64 reserved_bits = ~(PRED_CMD_IBPB | PRED_CMD_SBPB);
+
+ if (!msr_info->host_initiated) {
+ if ((!guest_has_pred_cmd_msr(vcpu)))
+ return 1;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
+ reserved_bits |= PRED_CMD_IBPB;
+
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
+ reserved_bits |= PRED_CMD_SBPB;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_IBPB))
+ reserved_bits |= PRED_CMD_IBPB;
+
+ if (!boot_cpu_has(X86_FEATURE_SBPB))
+ reserved_bits |= PRED_CMD_SBPB;
+
+ if (data & reserved_bits)
+ return 1;
+
+ if (!data)
+ break;
+
+ wrmsrq(MSR_IA32_PRED_CMD, data);
+ break;
+ }
+ case MSR_IA32_FLUSH_CMD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
+ return 1;
+
+ if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
+ return 1;
+ if (!data)
+ break;
+
+ wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ break;
+ case MSR_EFER:
+ return set_efer(vcpu, msr_info);
+ case MSR_K7_HWCR: {
+ /*
+ * Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2
+ * through at least v6.6 whine if TscFreqSel is clear,
+ * depending on F/M/S.
+ */
+ u64 valid = BIT_ULL(18) | BIT_ULL(24);
+
+ data &= ~(u64)0x40; /* ignore flush filter disable */
+ data &= ~(u64)0x100; /* ignore ignne emulation enable */
+ data &= ~(u64)0x8; /* ignore TLB cache disable */
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_GP_ON_USER_CPUID))
+ valid |= MSR_K7_HWCR_CPUID_USER_DIS;
+
+ if (data & ~valid) {
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ return 1;
+ }
+ vcpu->arch.msr_hwcr = data;
+ break;
+ }
+ case MSR_FAM10H_MMIO_CONF_BASE:
+ if (data != 0) {
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ return 1;
+ }
+ break;
+ case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
+ vcpu->arch.pat = data;
+ break;
+ case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+ case MSR_MTRRdefType:
+ return kvm_mtrr_set_msr(vcpu, msr, data);
+ case MSR_IA32_APICBASE:
+ return kvm_apic_set_base(vcpu, data, msr_info->host_initiated);
+ case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+ return kvm_x2apic_msr_write(vcpu, msr, data);
+ case MSR_IA32_TSC_DEADLINE:
+ kvm_set_lapic_tscdeadline_msr(vcpu, data);
+ break;
+ case MSR_IA32_TSC_ADJUST:
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
+ if (!msr_info->host_initiated) {
+ s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
+ adjust_tsc_offset_guest(vcpu, adj);
+ /* Before back to guest, tsc_timestamp must be adjusted
+ * as well, otherwise guest's percpu pvclock time could jump.
+ */
+ kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+ }
+ vcpu->arch.ia32_tsc_adjust_msr = data;
+ }
+ break;
+ case MSR_IA32_MISC_ENABLE: {
+ u64 old_val = vcpu->arch.ia32_misc_enable_msr;
+
+ if (!msr_info->host_initiated) {
+ /* RO bits */
+ if ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PMU_RO_MASK)
+ return 1;
+
+ /* R bits, i.e. writes are ignored, but don't fault. */
+ data = data & ~MSR_IA32_MISC_ENABLE_EMON;
+ data |= old_val & MSR_IA32_MISC_ENABLE_EMON;
+ }
+
+ if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
+ ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
+ return 1;
+ vcpu->arch.ia32_misc_enable_msr = data;
+ vcpu->arch.cpuid_dynamic_bits_dirty = true;
+ } else {
+ vcpu->arch.ia32_misc_enable_msr = data;
+ }
+ break;
+ }
+ case MSR_IA32_SMBASE:
+ if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
+ return 1;
+ vcpu->arch.smbase = data;
+ break;
+ case MSR_IA32_POWER_CTL:
+ vcpu->arch.msr_ia32_power_ctl = data;
+ break;
+ case MSR_IA32_TSC:
+ if (msr_info->host_initiated) {
+ kvm_synchronize_tsc(vcpu, &data);
+ } else if (!vcpu->arch.guest_tsc_protected) {
+ u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+ adjust_tsc_offset_guest(vcpu, adj);
+ vcpu->arch.ia32_tsc_adjust_msr += adj;
+ }
+ break;
+ case MSR_IA32_XSS:
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (data & ~vcpu->arch.guest_supported_xss)
+ return 1;
+ if (vcpu->arch.ia32_xss == data)
+ break;
+ vcpu->arch.ia32_xss = data;
+ vcpu->arch.cpuid_dynamic_bits_dirty = true;
+ break;
+ case MSR_SMI_COUNT:
+ if (!msr_info->host_initiated)
+ return 1;
+ vcpu->arch.smi_count = data;
+ break;
+ case MSR_KVM_WALL_CLOCK_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ vcpu->kvm->arch.wall_clock = data;
+ kvm_write_wall_clock(vcpu->kvm, data, 0);
+ break;
+ case MSR_KVM_WALL_CLOCK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ vcpu->kvm->arch.wall_clock = data;
+ kvm_write_wall_clock(vcpu->kvm, data, 0);
+ break;
+ case MSR_KVM_SYSTEM_TIME_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ kvm_write_system_time(vcpu, data, false, msr_info->host_initiated);
+ break;
+ case MSR_KVM_SYSTEM_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ kvm_write_system_time(vcpu, data, true, msr_info->host_initiated);
+ break;
+ case MSR_KVM_ASYNC_PF_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_pv_enable_async_pf(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_ASYNC_PF_INT:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_pv_enable_async_pf_int(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (data & 0x1) {
+ /*
+ * Pairs with the smp_mb__after_atomic() in
+ * kvm_arch_async_page_present_queued().
+ */
+ smp_store_mb(vcpu->arch.apf.pageready_pending, false);
+
+ kvm_check_async_pf_completion(vcpu);
+ }
+ break;
+ case MSR_KVM_STEAL_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (unlikely(!sched_info_on()))
+ return 1;
+
+ if (data & KVM_STEAL_RESERVED_MASK)
+ return 1;
+
+ vcpu->arch.st.msr_val = data;
+
+ if (!(data & KVM_MSR_ENABLED))
+ break;
+
+ kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
+
+ break;
+ case MSR_KVM_PV_EOI_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ if (kvm_lapic_set_pv_eoi(vcpu, data, sizeof(u8)))
+ return 1;
+ break;
+
+ case MSR_KVM_POLL_CONTROL:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ /* only enable bit supported */
+ if (data & (-1ULL << 1))
+ return 1;
+
+ vcpu->arch.msr_kvm_poll_control = data;
+ break;
+
+ case MSR_IA32_MCG_CTL:
+ case MSR_IA32_MCG_STATUS:
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ return set_msr_mce(vcpu, msr_info);
+
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+ case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+ case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
+ return kvm_pmu_set_msr(vcpu, msr_info);
+
+ if (data)
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ break;
+ case MSR_K7_CLK_CTL:
+ /*
+ * Ignore all writes to this no longer documented MSR.
+ * Writes are only relevant for old K7 processors,
+ * all pre-dating SVM, but a recommended workaround from
+ * AMD for these chips. It is possible to specify the
+ * affected processor models on the command line, hence
+ * the need to ignore the workaround.
+ */
+ break;
+#ifdef CONFIG_KVM_HYPERV
+ case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
+ case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
+ case HV_X64_MSR_SYNDBG_OPTIONS:
+ case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+ case HV_X64_MSR_CRASH_CTL:
+ case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
+ case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_STATUS:
+ case HV_X64_MSR_TSC_INVARIANT_CONTROL:
+ return kvm_hv_set_msr_common(vcpu, msr, data,
+ msr_info->host_initiated);
+#endif
+ case MSR_IA32_BBL_CR_CTL3:
+ /* Drop writes to this legacy MSR -- see rdmsr
+ * counterpart for further detail.
+ */
+ kvm_pr_unimpl_wrmsr(vcpu, msr, data);
+ break;
+ case MSR_AMD64_OSVW_ID_LENGTH:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ vcpu->arch.osvw.length = data;
+ break;
+ case MSR_AMD64_OSVW_STATUS:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ vcpu->arch.osvw.status = data;
+ break;
+ case MSR_PLATFORM_INFO:
+ if (!msr_info->host_initiated)
+ return 1;
+ vcpu->arch.msr_platform_info = data;
+ break;
+ case MSR_MISC_FEATURES_ENABLES:
+ if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
+ (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
+ !(vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT)))
+ return 1;
+ vcpu->arch.msr_misc_features_enables = data;
+ break;
+#ifdef CONFIG_X86_64
+ case MSR_IA32_XFD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ if (data & ~kvm_guest_supported_xfd(vcpu))
+ return 1;
+
+ fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
+ break;
+ case MSR_IA32_XFD_ERR:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ if (data & ~kvm_guest_supported_xfd(vcpu))
+ return 1;
+
+ vcpu->arch.guest_fpu.xfd_err = data;
+ break;
+#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_set_xstate_msr(vcpu, msr_info);
+ break;
+ default:
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
+ return kvm_pmu_set_msr(vcpu, msr_info);
+
+ return KVM_MSR_RET_UNSUPPORTED;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common);
+
+static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
+{
+ u64 data;
+ u64 mcg_cap = vcpu->arch.mcg_cap;
+ unsigned bank_num = mcg_cap & 0xff;
+ u32 offset, last_msr;
+
+ switch (msr) {
+ case MSR_IA32_P5_MC_ADDR:
+ case MSR_IA32_P5_MC_TYPE:
+ data = 0;
+ break;
+ case MSR_IA32_MCG_CAP:
+ data = vcpu->arch.mcg_cap;
+ break;
+ case MSR_IA32_MCG_CTL:
+ if (!(mcg_cap & MCG_CTL_P) && !host)
+ return 1;
+ data = vcpu->arch.mcg_ctl;
+ break;
+ case MSR_IA32_MCG_STATUS:
+ data = vcpu->arch.mcg_status;
+ break;
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ if (!(mcg_cap & MCG_CMCI_P) && !host)
+ return 1;
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
+ last_msr + 1 - MSR_IA32_MC0_CTL2);
+ data = vcpu->arch.mci_ctl2_banks[offset];
+ break;
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
+ if (msr > last_msr)
+ return 1;
+
+ offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
+ last_msr + 1 - MSR_IA32_MC0_CTL);
+ data = vcpu->arch.mce_banks[offset];
+ break;
+ default:
+ return 1;
+ }
+ *pdata = data;
+ return 0;
+}
+
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ switch (msr_info->index) {
+ case MSR_IA32_PLATFORM_ID:
+ case MSR_IA32_EBL_CR_POWERON:
+ case MSR_IA32_LASTBRANCHFROMIP:
+ case MSR_IA32_LASTBRANCHTOIP:
+ case MSR_IA32_LASTINTFROMIP:
+ case MSR_IA32_LASTINTTOIP:
+ case MSR_AMD64_SYSCFG:
+ case MSR_K8_TSEG_ADDR:
+ case MSR_K8_TSEG_MASK:
+ case MSR_VM_HSAVE_PA:
+ case MSR_K8_INT_PENDING_MSG:
+ case MSR_AMD64_NB_CFG:
+ case MSR_FAM10H_MMIO_CONF_BASE:
+ case MSR_AMD64_BU_CFG2:
+ case MSR_IA32_PERF_CTL:
+ case MSR_AMD64_DC_CFG:
+ case MSR_AMD64_TW_CFG:
+ case MSR_F15H_EX_CFG:
+ /*
+ * Intel Sandy Bridge CPUs must support the RAPL (running average power
+ * limit) MSRs. Just return 0, as we do not want to expose the host
+ * data here. Do not conditionalize this on CPUID, as KVM does not do
+ * so for existing CPU-specific MSRs.
+ */
+ case MSR_RAPL_POWER_UNIT:
+ case MSR_PP0_ENERGY_STATUS: /* Power plane 0 (core) */
+ case MSR_PP1_ENERGY_STATUS: /* Power plane 1 (graphics uncore) */
+ case MSR_PKG_ENERGY_STATUS: /* Total package */
+ case MSR_DRAM_ENERGY_STATUS: /* DRAM controller */
+ msr_info->data = 0;
+ break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+ case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+ case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
+ if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info);
+ msr_info->data = 0;
+ break;
+ case MSR_IA32_UCODE_REV:
+ msr_info->data = vcpu->arch.microcode_version;
+ break;
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+ return KVM_MSR_RET_UNSUPPORTED;
+ msr_info->data = vcpu->arch.arch_capabilities;
+ break;
+ case MSR_IA32_PERF_CAPABILITIES:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ msr_info->data = vcpu->arch.perf_capabilities;
+ break;
+ case MSR_IA32_POWER_CTL:
+ msr_info->data = vcpu->arch.msr_ia32_power_ctl;
+ break;
+ case MSR_IA32_TSC: {
+ /*
+ * Intel SDM states that MSR_IA32_TSC read adds the TSC offset
+ * even when not intercepted. AMD manual doesn't explicitly
+ * state this but appears to behave the same.
+ *
+ * On userspace reads and writes, however, we unconditionally
+ * return L1's TSC value to ensure backwards-compatible
+ * behavior for migration.
+ */
+ u64 offset, ratio;
+
+ if (msr_info->host_initiated) {
+ offset = vcpu->arch.l1_tsc_offset;
+ ratio = vcpu->arch.l1_tsc_scaling_ratio;
+ } else {
+ offset = vcpu->arch.tsc_offset;
+ ratio = vcpu->arch.tsc_scaling_ratio;
+ }
+
+ msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
+ break;
+ }
+ case MSR_IA32_CR_PAT:
+ msr_info->data = vcpu->arch.pat;
+ break;
+ case MSR_MTRRcap:
+ case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
+ case MSR_MTRRdefType:
+ return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+ case 0xcd: /* fsb frequency */
+ msr_info->data = 3;
+ break;
+ /*
+ * MSR_EBC_FREQUENCY_ID
+ * Conservative value valid for even the basic CPU models.
+ * Models 0,1: 000 in bits 23:21 indicating a bus speed of
+ * 100MHz, model 2 000 in bits 18:16 indicating 100MHz,
+ * and 266MHz for model 3, or 4. Set Core Clock
+ * Frequency to System Bus Frequency Ratio to 1 (bits
+ * 31:24) even though these are only valid for CPU
+ * models > 2, however guests may end up dividing or
+ * multiplying by zero otherwise.
+ */
+ case MSR_EBC_FREQUENCY_ID:
+ msr_info->data = 1 << 24;
+ break;
+ case MSR_IA32_APICBASE:
+ msr_info->data = vcpu->arch.apic_base;
+ break;
+ case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
+ return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+ case MSR_IA32_TSC_DEADLINE:
+ msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
+ break;
+ case MSR_IA32_TSC_ADJUST:
+ msr_info->data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
+ break;
+ case MSR_IA32_MISC_ENABLE:
+ msr_info->data = vcpu->arch.ia32_misc_enable_msr;
+ break;
+ case MSR_IA32_SMBASE:
+ if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
+ return 1;
+ msr_info->data = vcpu->arch.smbase;
+ break;
+ case MSR_SMI_COUNT:
+ msr_info->data = vcpu->arch.smi_count;
+ break;
+ case MSR_IA32_PERF_STATUS:
+ /* TSC increment by tick */
+ msr_info->data = 1000ULL;
+ /* CPU multiplier */
+ msr_info->data |= (((uint64_t)4ULL) << 40);
+ break;
+ case MSR_EFER:
+ msr_info->data = vcpu->arch.efer;
+ break;
+ case MSR_KVM_WALL_CLOCK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->kvm->arch.wall_clock;
+ break;
+ case MSR_KVM_WALL_CLOCK_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->kvm->arch.wall_clock;
+ break;
+ case MSR_KVM_SYSTEM_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.time;
+ break;
+ case MSR_KVM_SYSTEM_TIME_NEW:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.time;
+ break;
+ case MSR_KVM_ASYNC_PF_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.apf.msr_en_val;
+ break;
+ case MSR_KVM_ASYNC_PF_INT:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.apf.msr_int_val;
+ break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = 0;
+ break;
+ case MSR_KVM_STEAL_TIME:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.st.msr_val;
+ break;
+ case MSR_KVM_PV_EOI_EN:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.pv_eoi.msr_val;
+ break;
+ case MSR_KVM_POLL_CONTROL:
+ if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
+ return KVM_MSR_RET_UNSUPPORTED;
+
+ msr_info->data = vcpu->arch.msr_kvm_poll_control;
+ break;
+ case MSR_IA32_P5_MC_ADDR:
+ case MSR_IA32_P5_MC_TYPE:
+ case MSR_IA32_MCG_CAP:
+ case MSR_IA32_MCG_CTL:
+ case MSR_IA32_MCG_STATUS:
+ case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
+ case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
+ return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
+ case MSR_IA32_XSS:
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return 1;
+ msr_info->data = vcpu->arch.ia32_xss;
+ break;
+ case MSR_K7_CLK_CTL:
+ /*
+ * Provide expected ramp-up count for K7. All other
+ * are set to zero, indicating minimum divisors for
+ * every field.
+ *
+ * This prevents guest kernels on AMD host with CPU
+ * type 6, model 8 and higher from exploding due to
+ * the rdmsr failing.
+ */
+ msr_info->data = 0x20000000;
+ break;
+#ifdef CONFIG_KVM_HYPERV
+ case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
+ case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
+ case HV_X64_MSR_SYNDBG_OPTIONS:
+ case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
+ case HV_X64_MSR_CRASH_CTL:
+ case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
+ case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_CONTROL:
+ case HV_X64_MSR_TSC_EMULATION_STATUS:
+ case HV_X64_MSR_TSC_INVARIANT_CONTROL:
+ return kvm_hv_get_msr_common(vcpu,
+ msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
+#endif
+ case MSR_IA32_BBL_CR_CTL3:
+ /* This legacy MSR exists but isn't fully documented in current
+ * silicon. It is however accessed by winxp in very narrow
+ * scenarios where it sets bit #19, itself documented as
+ * a "reserved" bit. Best effort attempt to source coherent
+ * read data here should the balance of the register be
+ * interpreted by the guest:
+ *
+ * L2 cache control register 3: 64GB range, 256KB size,
+ * enabled, latency 0x1, configured
+ */
+ msr_info->data = 0xbe702111;
+ break;
+ case MSR_AMD64_OSVW_ID_LENGTH:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ msr_info->data = vcpu->arch.osvw.length;
+ break;
+ case MSR_AMD64_OSVW_STATUS:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
+ return 1;
+ msr_info->data = vcpu->arch.osvw.status;
+ break;
+ case MSR_PLATFORM_INFO:
+ if (!msr_info->host_initiated &&
+ !vcpu->kvm->arch.guest_can_read_msr_platform_info)
+ return 1;
+ msr_info->data = vcpu->arch.msr_platform_info;
+ break;
+ case MSR_MISC_FEATURES_ENABLES:
+ msr_info->data = vcpu->arch.msr_misc_features_enables;
+ break;
+ case MSR_K7_HWCR:
+ msr_info->data = vcpu->arch.msr_hwcr;
+ break;
+#ifdef CONFIG_X86_64
+ case MSR_IA32_XFD:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
+ break;
+ case MSR_IA32_XFD_ERR:
+ if (!msr_info->host_initiated &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
+ return 1;
+
+ msr_info->data = vcpu->arch.guest_fpu.xfd_err;
+ break;
+#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_get_xstate_msr(vcpu, msr_info);
+ break;
+ default:
+ if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info);
+
+ return KVM_MSR_RET_UNSUPPORTED;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common);
+
+static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ return kvm_get_msr_ignored_check(vcpu, index, data, true);
+}
+
+static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ u64 val;
+
+ /*
+ * Reject writes to immutable feature MSRs if the vCPU model is frozen,
+ * as KVM doesn't support modifying the guest vCPU model on the fly,
+ * e.g. changing the VMX capabilities MSRs while L2 is active is
+ * nonsensical. Allow writes of the same value, e.g. so that userspace
+ * can blindly stuff all MSRs when emulating RESET.
+ */
+ if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
+ kvm_is_immutable_feature_msr(index) &&
+ (do_get_msr(vcpu, index, &val) || *data != val))
+ return -EINVAL;
+
+ return kvm_set_msr_ignored_check(vcpu, index, *data, true);
+}
+
+/*
+ * Read or write a bunch of msrs. All parameters are kernel addresses.
+ *
+ * @return number of msrs set successfully.
+ */
+static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
+ struct kvm_msr_entry *entries,
+ int (*do_msr)(struct kvm_vcpu *vcpu,
+ unsigned index, u64 *data))
+{
+ bool fpu_loaded = false;
+ int i;
+
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ /*
+ * If userspace is accessing one or more XSTATE-managed MSRs,
+ * temporarily load the guest's FPU state so that the guest's
+ * MSR value(s) is resident in hardware and thus can be accessed
+ * via RDMSR/WRMSR.
+ */
+ if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
+ kvm_load_guest_fpu(vcpu);
+ fpu_loaded = true;
+ }
+ if (do_msr(vcpu, entries[i].index, &entries[i].data))
+ break;
+ }
+ if (fpu_loaded)
+ kvm_put_guest_fpu(vcpu);
+
+ return i;
+}
+
+/*
+ * Read or write a bunch of msrs. Parameters are user addresses.
+ *
+ * @return number of msrs set successfully.
+ */
+static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
+ int (*do_msr)(struct kvm_vcpu *vcpu,
+ unsigned index, u64 *data),
+ int writeback)
+{
+ struct kvm_msrs msrs;
+ struct kvm_msr_entry *entries;
+ unsigned size;
+ int r;
+
+ r = -EFAULT;
+ if (copy_from_user(&msrs, user_msrs, sizeof(msrs)))
+ goto out;
+
+ r = -E2BIG;
+ if (msrs.nmsrs >= MAX_IO_MSRS)
+ goto out;
+
+ size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
+ entries = memdup_user(user_msrs->entries, size);
+ if (IS_ERR(entries)) {
+ r = PTR_ERR(entries);
+ goto out;
+ }
+
+ r = __msr_io(vcpu, &msrs, entries, do_msr);
+
+ if (writeback && copy_to_user(user_msrs->entries, entries, size))
+ r = -EFAULT;
+
+ kfree(entries);
+out:
+ return r;
+}
+
+int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
+{
+ return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
+}
+
+int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_get_msr, 1);
+}
+
+int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
+{
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ return msr_io(vcpu, user_msrs, do_set_msr, 0);
+}
+
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (do_get_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ if (put_user(val, user_val))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (get_user(val, user_val))
+ return -EFAULT;
+
+ if (do_set_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ return 0;
+}
+
+struct kvm_x86_reg_id {
+ __u32 index;
+ __u8 type;
+ __u8 rsvd1;
+ __u8 rsvd2:4;
+ __u8 size:4;
+ __u8 x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
+ struct kvm_x86_reg_id *reg)
+{
+ switch (reg->index) {
+ case KVM_REG_GUEST_SSP:
+ /*
+ * FIXME: If host-initiated accesses are ever exempted from
+ * ignore_msrs (in kvm_do_msr_access()), drop this manual check
+ * and rely on KVM's standard checks to reject accesses to regs
+ * that don't exist.
+ */
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return -EINVAL;
+
+ reg->type = KVM_X86_REG_TYPE_MSR;
+ reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+ void __user *argp)
+{
+ struct kvm_one_reg one_reg;
+ struct kvm_x86_reg_id *reg;
+ u64 __user *user_val;
+ bool load_fpu;
+ int r;
+
+ if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
+ return -EFAULT;
+
+ if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+ return -EINVAL;
+
+ reg = (struct kvm_x86_reg_id *)&one_reg.id;
+ if (reg->rsvd1 || reg->rsvd2)
+ return -EINVAL;
+
+ if (reg->type == KVM_X86_REG_TYPE_KVM) {
+ r = kvm_translate_kvm_reg(vcpu, reg);
+ if (r)
+ return r;
+ }
+
+ if (reg->type != KVM_X86_REG_TYPE_MSR)
+ return -EINVAL;
+
+ if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+ return -EINVAL;
+
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ load_fpu = is_xstate_managed_msr(vcpu, reg->index);
+ if (load_fpu)
+ kvm_load_guest_fpu(vcpu);
+
+ user_val = u64_to_user_ptr(one_reg.addr);
+ if (ioctl == KVM_GET_ONE_REG)
+ r = kvm_get_one_msr(vcpu, reg->index, user_val);
+ else
+ r = kvm_set_one_msr(vcpu, reg->index, user_val);
+
+ if (load_fpu)
+ kvm_put_guest_fpu(vcpu);
+ return r;
+}
+
+int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+ struct kvm_reg_list __user *user_list)
+{
+ u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
+ u64 user_nr_regs;
+
+ if (get_user(user_nr_regs, &user_list->n))
+ return -EFAULT;
+
+ if (put_user(nr_regs, &user_list->n))
+ return -EFAULT;
+
+ if (user_nr_regs < nr_regs)
+ return -E2BIG;
+
+ if (nr_regs &&
+ put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
+ return -EFAULT;
+
+ return 0;
+}
+
+static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
+{
+ struct kvm_x86_msr_filter *msr_filter;
+
+ msr_filter = kzalloc_obj(*msr_filter, GFP_KERNEL_ACCOUNT);
+ if (!msr_filter)
+ return NULL;
+
+ msr_filter->default_allow = default_allow;
+ return msr_filter;
+}
+
+void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
+{
+ u32 i;
+
+ if (!msr_filter)
+ return;
+
+ for (i = 0; i < msr_filter->count; i++)
+ kfree(msr_filter->ranges[i].bitmap);
+
+ kfree(msr_filter);
+}
+
+static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
+ struct kvm_msr_filter_range *user_range)
+{
+ unsigned long *bitmap;
+ size_t bitmap_size;
+
+ if (!user_range->nmsrs)
+ return 0;
+
+ if (user_range->flags & ~KVM_MSR_FILTER_RANGE_VALID_MASK)
+ return -EINVAL;
+
+ if (!user_range->flags)
+ return -EINVAL;
+
+ bitmap_size = BITS_TO_LONGS(user_range->nmsrs) * sizeof(long);
+ if (!bitmap_size || bitmap_size > KVM_MSR_FILTER_MAX_BITMAP_SIZE)
+ return -EINVAL;
+
+ bitmap = memdup_user((__user u8*)user_range->bitmap, bitmap_size);
+ if (IS_ERR(bitmap))
+ return PTR_ERR(bitmap);
+
+ msr_filter->ranges[msr_filter->count] = (struct msr_bitmap_range) {
+ .flags = user_range->flags,
+ .base = user_range->base,
+ .nmsrs = user_range->nmsrs,
+ .bitmap = bitmap,
+ };
+
+ msr_filter->count++;
+ return 0;
+}
+
+int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, struct kvm_msr_filter *filter)
+{
+ struct kvm_x86_msr_filter *new_filter, *old_filter;
+ bool default_allow;
+ bool empty = true;
+ int r;
+ u32 i;
+
+ if (filter->flags & ~KVM_MSR_FILTER_VALID_MASK)
+ return -EINVAL;
+
+ for (i = 0; i < ARRAY_SIZE(filter->ranges); i++)
+ empty &= !filter->ranges[i].nmsrs;
+
+ default_allow = !(filter->flags & KVM_MSR_FILTER_DEFAULT_DENY);
+ if (empty && !default_allow)
+ return -EINVAL;
+
+ new_filter = kvm_alloc_msr_filter(default_allow);
+ if (!new_filter)
+ return -ENOMEM;
+
+ for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) {
+ r = kvm_add_msr_filter(new_filter, &filter->ranges[i]);
+ if (r) {
+ kvm_free_msr_filter(new_filter);
+ return r;
+ }
+ }
+
+ mutex_lock(&kvm->lock);
+ old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
+ mutex_is_locked(&kvm->lock));
+ mutex_unlock(&kvm->lock);
+ synchronize_srcu(&kvm->srcu);
+
+ kvm_free_msr_filter(old_filter);
+
+ /*
+ * Recalc MSR intercepts as userspace may want to intercept accesses to
+ * MSRs that KVM would otherwise pass through to the guest.
+ */
+ kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
+
+ return 0;
+}
+
+
+static void kvm_probe_feature_msr(u32 msr_index)
+{
+ u64 data;
+
+ if (kvm_get_feature_msr(NULL, msr_index, &data, true))
+ return;
+
+ msr_based_features[num_msr_based_features++] = msr_index;
+}
+
+static void kvm_probe_msr_to_save(u32 msr_index)
+{
+ u32 dummy[2];
+
+ if (rdmsr_safe(msr_index, &dummy[0], &dummy[1]))
+ return;
+
+ /*
+ * Even MSRs that are valid in the host may not be exposed to guests in
+ * some cases.
+ */
+ switch (msr_index) {
+ case MSR_IA32_BNDCFGS:
+ if (!kvm_mpx_supported())
+ return;
+ break;
+ case MSR_TSC_AUX:
+ if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
+ !kvm_cpu_cap_has(X86_FEATURE_RDPID))
+ return;
+ break;
+ case MSR_IA32_UMWAIT_CONTROL:
+ if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
+ return;
+ break;
+ case MSR_IA32_RTIT_CTL:
+ case MSR_IA32_RTIT_STATUS:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT))
+ return;
+ break;
+ case MSR_IA32_RTIT_CR3_MATCH:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ !intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
+ return;
+ break;
+ case MSR_IA32_RTIT_OUTPUT_BASE:
+ case MSR_IA32_RTIT_OUTPUT_MASK:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ (!intel_pt_validate_hw_cap(PT_CAP_topa_output) &&
+ !intel_pt_validate_hw_cap(PT_CAP_single_range_output)))
+ return;
+ break;
+ case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
+ if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+ (msr_index - MSR_IA32_RTIT_ADDR0_A >=
+ intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
+ return;
+ break;
+ case MSR_ARCH_PERFMON_PERFCTR0 ...
+ MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
+ kvm_pmu_cap.num_counters_gp)
+ return;
+ break;
+ case MSR_ARCH_PERFMON_EVENTSEL0 ...
+ MSR_ARCH_PERFMON_EVENTSEL0 + KVM_MAX_NR_GP_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_EVENTSEL0 >=
+ kvm_pmu_cap.num_counters_gp)
+ return;
+ break;
+ case MSR_ARCH_PERFMON_FIXED_CTR0 ...
+ MSR_ARCH_PERFMON_FIXED_CTR0 + KVM_MAX_NR_FIXED_COUNTERS - 1:
+ if (msr_index - MSR_ARCH_PERFMON_FIXED_CTR0 >=
+ kvm_pmu_cap.num_counters_fixed)
+ return;
+ break;
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+ return;
+ break;
+ case MSR_IA32_XFD:
+ case MSR_IA32_XFD_ERR:
+ if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
+ return;
+ break;
+ case MSR_IA32_TSX_CTRL:
+ if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+ return;
+ break;
+ case MSR_IA32_XSS:
+ if (!kvm_caps.supported_xss)
+ return;
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+ return;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return;
+ break;
+ default:
+ break;
+ }
+
+ msrs_to_save[num_msrs_to_save++] = msr_index;
+}
+
+void kvm_init_msr_lists(void)
+{
+ unsigned i;
+
+ BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
+ "Please update the fixed PMCs in msrs_to_save_pmu[]");
+
+ num_msrs_to_save = 0;
+ num_emulated_msrs = 0;
+ num_msr_based_features = 0;
+
+ for (i = 0; i < ARRAY_SIZE(msrs_to_save_base); i++)
+ kvm_probe_msr_to_save(msrs_to_save_base[i]);
+
+ if (enable_pmu) {
+ for (i = 0; i < ARRAY_SIZE(msrs_to_save_pmu); i++)
+ kvm_probe_msr_to_save(msrs_to_save_pmu[i]);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
+ if (!kvm_x86_call(has_emulated_msr)(NULL,
+ emulated_msrs_all[i]))
+ continue;
+
+ emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
+ }
+
+ for (i = KVM_FIRST_EMULATED_VMX_MSR; i <= KVM_LAST_EMULATED_VMX_MSR; i++)
+ kvm_probe_feature_msr(i);
+
+ for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++)
+ kvm_probe_feature_msr(msr_based_features_all_except_vmx[i]);
+}
+
+int kvm_spec_ctrl_test_value(u64 value)
+{
+ /*
+ * test that setting IA32_SPEC_CTRL to given value
+ * is allowed by the host processor
+ */
+
+ u64 saved_value;
+ unsigned long flags;
+ int ret = 0;
+
+ local_irq_save(flags);
+
+ if (rdmsrq_safe(MSR_IA32_SPEC_CTRL, &saved_value))
+ ret = 1;
+ else if (wrmsrq_safe(MSR_IA32_SPEC_CTRL, value))
+ ret = 1;
+ else
+ wrmsrq(MSR_IA32_SPEC_CTRL, saved_value);
+
+ local_irq_restore(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value);
diff --git a/arch/x86/kvm/msrs.h b/arch/x86/kvm/msrs.h
new file mode 100644
index 000000000000..25b75fe39b80
--- /dev/null
+++ b/arch/x86/kvm/msrs.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_KVM_MSR_H
+#define ARCH_X86_KVM_MSR_H
+
+#include <linux/kvm_host.h>
+#include <linux/user-return-notifier.h>
+
+#include "cpuid.h"
+#include "regs.h"
+
+extern bool report_ignored_msrs;
+extern bool ignore_msrs;
+
+static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
+{
+ if (report_ignored_msrs)
+ vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
+}
+
+static inline void kvm_pr_unimpl_rdmsr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ if (report_ignored_msrs)
+ vcpu_unimpl(vcpu, "Unhandled RDMSR(0x%x)\n", msr);
+}
+
+/*
+ * The first...last VMX feature MSRs that are emulated by KVM. This may or may
+ * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
+ * associated feature that KVM supports for nested virtualization.
+ */
+#define KVM_FIRST_EMULATED_VMX_MSR MSR_IA32_VMX_BASIC
+#define KVM_LAST_EMULATED_VMX_MSR MSR_IA32_VMX_VMFUNC
+
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
+
+#define MSR_IA32_CR_PAT_DEFAULT \
+ PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC)
+
+void kvm_init_msr_lists(void);
+int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list);
+int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list);
+int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs);
+
+int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs);
+int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs);
+
+int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+ void __user *argp);
+int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+ struct kvm_reg_list __user *user_list);
+
+void kvm_user_return_msr_cpu_online(void);
+void drop_user_return_notifiers(void);
+void kvm_destroy_user_return_msrs(void);
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata);
+int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 data);
+int kvm_emulator_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+
+bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
+
+enum kvm_msr_access {
+ MSR_TYPE_R = BIT(0),
+ MSR_TYPE_W = BIT(1),
+ MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
+};
+
+/*
+ * Internal error codes that are used to indicate that MSR emulation encountered
+ * an error that should result in #GP in the guest, unless userspace handles it.
+ * Note, '1', '0', and negative numbers are off limits, as they are used by KVM
+ * as part of KVM's lightly documented internal KVM_RUN return codes.
+ *
+ * UNSUPPORTED - The MSR isn't supported, either because it is completely
+ * unknown to KVM, or because the MSR should not exist according
+ * to the vCPU model.
+ *
+ * FILTERED - Access to the MSR is denied by a userspace MSR filter.
+ */
+#define KVM_MSR_RET_UNSUPPORTED 2
+#define KVM_MSR_RET_FILTERED 3
+
+int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, struct kvm_msr_filter *filter);
+void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter);
+
+int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+
+u64 kvm_get_arch_capabilities(void);
+int kvm_spec_ctrl_test_value(u64 value);
+
+#define CET_US_RESERVED_BITS GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+ if (data & CET_US_RESERVED_BITS)
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_US_SHSTK_MASK_BITS))
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_US_IBT_MASK_BITS))
+ return false;
+ if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+ return false;
+ /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+ if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+ return false;
+
+ return true;
+}
+
+#endif
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 6f74e2b27c1e..c4ec024943bb 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -19,7 +19,7 @@
#include <asm/mtrr.h>
#include "cpuid.h"
-#include "x86.h"
+#include "msrs.h"
static u64 *find_mtrr(struct kvm_vcpu *vcpu, unsigned int msr)
{
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 316ec7a57f7d..caf01afe13f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -80,7 +80,6 @@
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
#include <asm/tlbflush.h>
-#include <asm/intel_pt.h>
#include <asm/emulate_prefix.h>
#include <asm/sgx.h>
#include <asm/virt.h>
@@ -90,8 +89,6 @@
#define CREATE_TRACE_POINTS
#include "trace.h"
-#define MAX_IO_MSRS 256
-
/*
* Note, kvm_caps fields should *never* have default values, all fields must be
* recomputed from scratch during vendor module load, e.g. to account for a
@@ -108,17 +105,6 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
#define emul_to_vcpu(ctxt) \
((struct kvm_vcpu *)(ctxt)->vcpu)
-/* EFER defaults:
- * - enable syscall per default because its emulated by KVM
- * - enable LME and LMA per default on 64 bit KVM
- */
-#ifdef CONFIG_X86_64
-static
-u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
-#else
-static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
-#endif
-
#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -146,13 +132,6 @@ EXPORT_STATIC_CALL_GPL(kvm_x86_get_cs_db_l_bits);
EXPORT_STATIC_CALL_GPL(kvm_x86_cache_reg);
EXPORT_STATIC_CALL_GPL(kvm_x86_get_cpl);
-static bool __read_mostly ignore_msrs = 0;
-module_param(ignore_msrs, bool, 0644);
-
-bool __read_mostly report_ignored_msrs = true;
-module_param(report_ignored_msrs, bool, 0644);
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(report_ignored_msrs);
-
unsigned int min_timer_period_us = 200;
module_param(min_timer_period_us, uint, 0644);
@@ -179,27 +158,6 @@ module_param(pi_inject_timer, bint, 0644);
static bool __read_mostly mitigate_smt_rsb;
module_param(mitigate_smt_rsb, bool, 0444);
-/*
- * Restoring the host value for MSRs that are only consumed when running in
- * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
- * returns to userspace, i.e. the kernel can run with the guest's value.
- */
-#define KVM_MAX_NR_USER_RETURN_MSRS 16
-
-struct kvm_user_return_msrs {
- struct user_return_notifier urn;
- bool registered;
- struct kvm_user_return_msr_values {
- u64 host;
- u64 curr;
- } values[KVM_MAX_NR_USER_RETURN_MSRS];
-};
-
-u32 __read_mostly kvm_nr_uret_msrs;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_nr_uret_msrs);
-static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
-static DEFINE_PER_CPU(struct kvm_user_return_msrs, user_return_msrs);
-
#define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
@@ -301,249 +259,6 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
static struct kmem_cache *x86_emulator_cache;
-/*
- * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
- * the set of MSRs that KVM exposes to userspace through KVM_GET_MSRS,
- * KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST. msrs_to_save holds MSRs that
- * require host support, i.e. should be probed via RDMSR. emulated_msrs holds
- * MSRs that KVM emulates without strictly requiring host support.
- * msr_based_features holds MSRs that enumerate features, i.e. are effectively
- * CPUID leafs. Note, msr_based_features isn't mutually exclusive with
- * msrs_to_save and emulated_msrs.
- */
-
-static const u32 msrs_to_save_base[] = {
- MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
- MSR_STAR,
-#ifdef CONFIG_X86_64
- MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
-#endif
- MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
- MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
- MSR_IA32_SPEC_CTRL, MSR_IA32_TSX_CTRL,
- MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
- MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
- MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
- MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
- MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
- MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
- MSR_IA32_UMWAIT_CONTROL,
-
- MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
-
- MSR_IA32_U_CET, MSR_IA32_S_CET,
- MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
- MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
- MSR_IA32_DEBUGCTLMSR,
- MSR_IA32_LASTBRANCHFROMIP, MSR_IA32_LASTBRANCHTOIP,
- MSR_IA32_LASTINTFROMIP, MSR_IA32_LASTINTTOIP,
-};
-
-static const u32 msrs_to_save_pmu[] = {
- MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
- MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
- MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
- MSR_CORE_PERF_GLOBAL_CTRL,
- MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
-
- /* This part of MSRs should match KVM_MAX_NR_INTEL_GP_COUNTERS. */
- MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
- MSR_ARCH_PERFMON_PERFCTR0 + 2, MSR_ARCH_PERFMON_PERFCTR0 + 3,
- MSR_ARCH_PERFMON_PERFCTR0 + 4, MSR_ARCH_PERFMON_PERFCTR0 + 5,
- MSR_ARCH_PERFMON_PERFCTR0 + 6, MSR_ARCH_PERFMON_PERFCTR0 + 7,
- MSR_ARCH_PERFMON_EVENTSEL0, MSR_ARCH_PERFMON_EVENTSEL1,
- MSR_ARCH_PERFMON_EVENTSEL0 + 2, MSR_ARCH_PERFMON_EVENTSEL0 + 3,
- MSR_ARCH_PERFMON_EVENTSEL0 + 4, MSR_ARCH_PERFMON_EVENTSEL0 + 5,
- MSR_ARCH_PERFMON_EVENTSEL0 + 6, MSR_ARCH_PERFMON_EVENTSEL0 + 7,
-
- MSR_K7_EVNTSEL0, MSR_K7_EVNTSEL1, MSR_K7_EVNTSEL2, MSR_K7_EVNTSEL3,
- MSR_K7_PERFCTR0, MSR_K7_PERFCTR1, MSR_K7_PERFCTR2, MSR_K7_PERFCTR3,
-
- /* This part of MSRs should match KVM_MAX_NR_AMD_GP_COUNTERS. */
- MSR_F15H_PERF_CTL0, MSR_F15H_PERF_CTL1, MSR_F15H_PERF_CTL2,
- MSR_F15H_PERF_CTL3, MSR_F15H_PERF_CTL4, MSR_F15H_PERF_CTL5,
- MSR_F15H_PERF_CTR0, MSR_F15H_PERF_CTR1, MSR_F15H_PERF_CTR2,
- MSR_F15H_PERF_CTR3, MSR_F15H_PERF_CTR4, MSR_F15H_PERF_CTR5,
-
- MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
- MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET,
-};
-
-static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_base) +
- ARRAY_SIZE(msrs_to_save_pmu)];
-static unsigned num_msrs_to_save;
-
-static const u32 emulated_msrs_all[] = {
- MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
- MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-
-#ifdef CONFIG_KVM_HYPERV
- HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
- HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC,
- HV_X64_MSR_TSC_FREQUENCY, HV_X64_MSR_APIC_FREQUENCY,
- HV_X64_MSR_CRASH_P0, HV_X64_MSR_CRASH_P1, HV_X64_MSR_CRASH_P2,
- HV_X64_MSR_CRASH_P3, HV_X64_MSR_CRASH_P4, HV_X64_MSR_CRASH_CTL,
- HV_X64_MSR_RESET,
- HV_X64_MSR_VP_INDEX,
- HV_X64_MSR_VP_RUNTIME,
- HV_X64_MSR_SCONTROL,
- HV_X64_MSR_STIMER0_CONFIG,
- HV_X64_MSR_VP_ASSIST_PAGE,
- HV_X64_MSR_REENLIGHTENMENT_CONTROL, HV_X64_MSR_TSC_EMULATION_CONTROL,
- HV_X64_MSR_TSC_EMULATION_STATUS, HV_X64_MSR_TSC_INVARIANT_CONTROL,
- HV_X64_MSR_SYNDBG_OPTIONS,
- HV_X64_MSR_SYNDBG_CONTROL, HV_X64_MSR_SYNDBG_STATUS,
- HV_X64_MSR_SYNDBG_SEND_BUFFER, HV_X64_MSR_SYNDBG_RECV_BUFFER,
- HV_X64_MSR_SYNDBG_PENDING_BUFFER,
-#endif
-
- MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
-
- MSR_IA32_TSC_ADJUST,
- MSR_IA32_TSC_DEADLINE,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_IA32_MISC_ENABLE,
- MSR_IA32_MCG_STATUS,
- MSR_IA32_MCG_CTL,
- MSR_IA32_MCG_EXT_CTL,
- MSR_IA32_SMBASE,
- MSR_SMI_COUNT,
- MSR_PLATFORM_INFO,
- MSR_MISC_FEATURES_ENABLES,
- MSR_AMD64_VIRT_SPEC_CTRL,
- MSR_AMD64_TSC_RATIO,
- MSR_IA32_POWER_CTL,
- MSR_IA32_UCODE_REV,
-
- /*
- * KVM always supports the "true" VMX control MSRs, even if the host
- * does not. The VMX MSRs as a whole are considered "emulated" as KVM
- * doesn't strictly require them to exist in the host (ignoring that
- * KVM would refuse to load in the first place if the core set of MSRs
- * aren't supported).
- */
- MSR_IA32_VMX_BASIC,
- MSR_IA32_VMX_TRUE_PINBASED_CTLS,
- MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
- MSR_IA32_VMX_TRUE_EXIT_CTLS,
- MSR_IA32_VMX_TRUE_ENTRY_CTLS,
- MSR_IA32_VMX_MISC,
- MSR_IA32_VMX_CR0_FIXED0,
- MSR_IA32_VMX_CR4_FIXED0,
- MSR_IA32_VMX_VMCS_ENUM,
- MSR_IA32_VMX_PROCBASED_CTLS2,
- MSR_IA32_VMX_EPT_VPID_CAP,
- MSR_IA32_VMX_VMFUNC,
-
- MSR_K7_HWCR,
- MSR_KVM_POLL_CONTROL,
-};
-
-static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
-static unsigned num_emulated_msrs;
-
-/*
- * List of MSRs that control the existence of MSR-based features, i.e. MSRs
- * that are effectively CPUID leafs. VMX MSRs are also included in the set of
- * feature MSRs, but are handled separately to allow expedited lookups.
- */
-static const u32 msr_based_features_all_except_vmx[] = {
- MSR_AMD64_DE_CFG,
- MSR_IA32_UCODE_REV,
- MSR_IA32_ARCH_CAPABILITIES,
- MSR_IA32_PERF_CAPABILITIES,
- MSR_PLATFORM_INFO,
-};
-
-static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all_except_vmx) +
- (KVM_LAST_EMULATED_VMX_MSR - KVM_FIRST_EMULATED_VMX_MSR + 1)];
-static unsigned int num_msr_based_features;
-
-/*
- * All feature MSRs except uCode revID, which tracks the currently loaded uCode
- * patch, are immutable once the vCPU model is defined.
- */
-static bool kvm_is_immutable_feature_msr(u32 msr)
-{
- int i;
-
- if (msr >= KVM_FIRST_EMULATED_VMX_MSR && msr <= KVM_LAST_EMULATED_VMX_MSR)
- return true;
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++) {
- if (msr == msr_based_features_all_except_vmx[i])
- return msr != MSR_IA32_UCODE_REV;
- }
-
- return false;
-}
-
-static bool kvm_is_advertised_msr(u32 msr_index)
-{
- unsigned int i;
-
- for (i = 0; i < num_msrs_to_save; i++) {
- if (msrs_to_save[i] == msr_index)
- return true;
- }
-
- for (i = 0; i < num_emulated_msrs; i++) {
- if (emulated_msrs[i] == msr_index)
- return true;
- }
-
- return false;
-}
-
-typedef int (*msr_access_t)(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated);
-
-static __always_inline int kvm_do_msr_access(struct kvm_vcpu *vcpu, u32 msr,
- u64 *data, bool host_initiated,
- enum kvm_msr_access rw,
- msr_access_t msr_access_fn)
-{
- const char *op = rw == MSR_TYPE_W ? "wrmsr" : "rdmsr";
- int ret;
-
- BUILD_BUG_ON(rw != MSR_TYPE_R && rw != MSR_TYPE_W);
-
- /*
- * Zero the data on read failures to avoid leaking stack data to the
- * guest and/or userspace, e.g. if the failure is ignored below.
- */
- ret = msr_access_fn(vcpu, msr, data, host_initiated);
- if (ret && rw == MSR_TYPE_R)
- *data = 0;
-
- if (ret != KVM_MSR_RET_UNSUPPORTED)
- return ret;
-
- /*
- * Userspace is allowed to read MSRs, and write '0' to MSRs, that KVM
- * advertises to userspace, even if an MSR isn't fully supported.
- * Simply check that @data is '0', which covers both the write '0' case
- * and all reads (in which case @data is zeroed on failure; see above).
- */
- if (host_initiated && !*data && kvm_is_advertised_msr(msr))
- return 0;
-
- if (!ignore_msrs) {
- kvm_debug_ratelimited("unhandled %s: 0x%x data 0x%llx\n",
- op, msr, *data);
- return ret;
- }
-
- if (report_ignored_msrs)
- kvm_pr_unimpl("ignored %s: 0x%x data 0x%llx\n", op, msr, *data);
-
- return 0;
-}
-
static struct kmem_cache *kvm_alloc_emulator_cache(void)
{
unsigned int useroffset = offsetof(struct x86_emulate_ctxt, src);
@@ -557,128 +272,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
-static void kvm_destroy_user_return_msrs(void)
-{
- int cpu;
-
- for_each_possible_cpu(cpu)
- WARN_ON_ONCE(per_cpu(user_return_msrs, cpu).registered);
-
- kvm_nr_uret_msrs = 0;
-}
-
-static void kvm_on_user_return(struct user_return_notifier *urn)
-{
- unsigned slot;
- struct kvm_user_return_msrs *msrs
- = container_of(urn, struct kvm_user_return_msrs, urn);
- struct kvm_user_return_msr_values *values;
-
- msrs->registered = false;
- user_return_notifier_unregister(urn);
-
- for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
- values = &msrs->values[slot];
- if (values->host != values->curr) {
- wrmsrq(kvm_uret_msrs_list[slot], values->host);
- values->curr = values->host;
- }
- }
-}
-
-static int kvm_probe_user_return_msr(u32 msr)
-{
- u64 val;
- int ret;
-
- preempt_disable();
- ret = rdmsrq_safe(msr, &val);
- if (ret)
- goto out;
- ret = wrmsrq_safe(msr, val);
-out:
- preempt_enable();
- return ret;
-}
-
-int kvm_add_user_return_msr(u32 msr)
-{
- BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
-
- if (kvm_probe_user_return_msr(msr))
- return -1;
-
- kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
- return kvm_nr_uret_msrs++;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_add_user_return_msr);
-
-int kvm_find_user_return_msr(u32 msr)
-{
- int i;
-
- for (i = 0; i < kvm_nr_uret_msrs; ++i) {
- if (kvm_uret_msrs_list[i] == msr)
- return i;
- }
- return -1;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_find_user_return_msr);
-
-static void kvm_user_return_msr_cpu_online(void)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
- u64 value;
- int i;
-
- for (i = 0; i < kvm_nr_uret_msrs; ++i) {
- rdmsrq_safe(kvm_uret_msrs_list[i], &value);
- msrs->values[i].host = value;
- msrs->values[i].curr = value;
- }
-}
-
-static void kvm_user_return_register_notifier(struct kvm_user_return_msrs *msrs)
-{
- if (!msrs->registered) {
- msrs->urn.on_user_return = kvm_on_user_return;
- user_return_notifier_register(&msrs->urn);
- msrs->registered = true;
- }
-}
-
-int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
- int err;
-
- value = (value & mask) | (msrs->values[slot].host & ~mask);
- if (value == msrs->values[slot].curr)
- return 0;
- err = wrmsrq_safe(kvm_uret_msrs_list[slot], value);
- if (err)
- return 1;
-
- msrs->values[slot].curr = value;
- kvm_user_return_register_notifier(msrs);
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
-
-u64 kvm_get_user_return_msr(unsigned int slot)
-{
- return this_cpu_ptr(&user_return_msrs)->values[slot].curr;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_user_return_msr);
-
-static void drop_user_return_notifiers(void)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(&user_return_msrs);
-
- if (msrs->registered)
- kvm_on_user_return(&msrs->urn);
-}
-
/*
* Handle a fault on a hardware virtualization (VMX or SVM) instruction.
*
@@ -933,17 +526,6 @@ int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_complete_insn_gp);
-static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
-{
- if (err) {
- kvm_inject_gp(vcpu, 0);
- return 1;
- }
-
- return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
- EMULTYPE_COMPLETE_USER_EXIT);
-}
-
void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
bool from_hardware)
{
@@ -1050,13 +632,6 @@ static void kvm_load_host_pkru(struct kvm_vcpu *vcpu)
}
}
-#ifdef CONFIG_X86_64
-static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
-{
- return vcpu->arch.guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC;
-}
-#endif
-
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
u64 xcr0 = xcr;
@@ -1175,595 +750,6 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdpmc);
-/*
- * Some IA32_ARCH_CAPABILITIES bits have dependencies on MSRs that KVM
- * does not yet virtualize. These include:
- * 10 - MISC_PACKAGE_CTRLS
- * 11 - ENERGY_FILTERING_CTL
- * 12 - DOITM
- * 18 - FB_CLEAR_CTRL
- * 21 - XAPIC_DISABLE_STATUS
- * 23 - OVERCLOCKING_STATUS
- */
-
-#define KVM_SUPPORTED_ARCH_CAP \
- (ARCH_CAP_RDCL_NO | ARCH_CAP_IBRS_ALL | ARCH_CAP_RSBA | \
- ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
- ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
- ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
- ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
- ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO | ARCH_CAP_ITS_NO)
-
-static u64 kvm_get_arch_capabilities(void)
-{
- u64 data = kvm_host.arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
-
- /*
- * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
- * the nested hypervisor runs with NX huge pages. If it is not,
- * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
- * L1 guests, so it need not worry about its own (L2) guests.
- */
- data |= ARCH_CAP_PSCHANGE_MC_NO;
-
- /*
- * If we're doing cache flushes (either "always" or "cond")
- * we will do one whenever the guest does a vmlaunch/vmresume.
- * If an outer hypervisor is doing the cache flush for us
- * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
- * capability to the guest too, and if EPT is disabled we're not
- * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
- * require a nested hypervisor to do a flush of its own.
- */
- if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
- data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
-
- if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
- data |= ARCH_CAP_RDCL_NO;
- if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
- data |= ARCH_CAP_SSB_NO;
- if (!boot_cpu_has_bug(X86_BUG_MDS))
- data |= ARCH_CAP_MDS_NO;
- if (!boot_cpu_has_bug(X86_BUG_RFDS))
- data |= ARCH_CAP_RFDS_NO;
- if (!boot_cpu_has_bug(X86_BUG_ITS))
- data |= ARCH_CAP_ITS_NO;
-
- if (!boot_cpu_has(X86_FEATURE_RTM)) {
- /*
- * If RTM=0 because the kernel has disabled TSX, the host might
- * have TAA_NO or TSX_CTRL. Clear TAA_NO (the guest sees RTM=0
- * and therefore knows that there cannot be TAA) but keep
- * TSX_CTRL: some buggy userspaces leave it set on tsx=on hosts,
- * and we want to allow migrating those guests to tsx=off hosts.
- */
- data &= ~ARCH_CAP_TAA_NO;
- } else if (!boot_cpu_has_bug(X86_BUG_TAA)) {
- data |= ARCH_CAP_TAA_NO;
- } else {
- /*
- * Nothing to do here; we emulate TSX_CTRL if present on the
- * host so the guest can choose between disabling TSX or
- * using VERW to clear CPU buffers.
- */
- }
-
- if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
- data |= ARCH_CAP_GDS_NO;
-
- return data;
-}
-
-static int kvm_get_feature_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- WARN_ON_ONCE(!host_initiated);
-
- switch (index) {
- case MSR_IA32_ARCH_CAPABILITIES:
- *data = kvm_get_arch_capabilities();
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- *data = kvm_caps.supported_perf_cap;
- break;
- case MSR_PLATFORM_INFO:
- *data = MSR_PLATFORM_INFO_CPUID_FAULT;
- break;
- case MSR_IA32_UCODE_REV:
- rdmsrq_safe(index, data);
- break;
- default:
- return kvm_x86_call(get_feature_msr)(index, data);
- }
- return 0;
-}
-
-static int do_get_feature_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- return kvm_do_msr_access(vcpu, index, data, true, MSR_TYPE_R,
- kvm_get_feature_msr);
-}
-
-static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
-{
- if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
- return false;
-
- if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
- return false;
-
- if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
- return false;
-
- if (efer & (EFER_LME | EFER_LMA) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
- return false;
-
- if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
- return false;
-
- return true;
-
-}
-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
-{
- if (efer & efer_reserved_bits)
- return false;
-
- return __kvm_valid_efer(vcpu, efer);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_valid_efer);
-
-static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u64 old_efer = vcpu->arch.efer;
- u64 efer = msr_info->data;
- int r;
-
- if (efer & efer_reserved_bits)
- return 1;
-
- if (!msr_info->host_initiated) {
- if (!__kvm_valid_efer(vcpu, efer))
- return 1;
-
- if (is_paging(vcpu) &&
- (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
- return 1;
- }
-
- efer &= ~EFER_LMA;
- efer |= vcpu->arch.efer & EFER_LMA;
-
- r = kvm_x86_call(set_efer)(vcpu, efer);
- if (r) {
- WARN_ON(r > 0);
- return r;
- }
-
- if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
- kvm_mmu_reset_context(vcpu);
-
- if (!static_cpu_has(X86_FEATURE_XSAVES) &&
- (efer & EFER_SVME))
- kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
-
- return 0;
-}
-
-void kvm_enable_efer_bits(u64 mask)
-{
- efer_reserved_bits &= ~mask;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_efer_bits);
-
-bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
-{
- struct kvm_x86_msr_filter *msr_filter;
- struct msr_bitmap_range *ranges;
- struct kvm *kvm = vcpu->kvm;
- bool allowed;
- int idx;
- u32 i;
-
- /* x2APIC MSRs do not support filtering. */
- if (index >= 0x800 && index <= 0x8ff)
- return true;
-
- idx = srcu_read_lock(&kvm->srcu);
-
- msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
- if (!msr_filter) {
- allowed = true;
- goto out;
- }
-
- allowed = msr_filter->default_allow;
- ranges = msr_filter->ranges;
-
- for (i = 0; i < msr_filter->count; i++) {
- u32 start = ranges[i].base;
- u32 end = start + ranges[i].nmsrs;
- u32 flags = ranges[i].flags;
- unsigned long *bitmap = ranges[i].bitmap;
-
- if ((index >= start) && (index < end) && (flags & type)) {
- allowed = test_bit(index - start, bitmap);
- break;
- }
- }
-
-out:
- srcu_read_unlock(&kvm->srcu, idx);
-
- return allowed;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_msr_allowed);
-
-/*
- * Write @data into the MSR specified by @index. Select MSR specific fault
- * checks are bypassed if @host_initiated is %true.
- * Returns 0 on success, non-0 otherwise.
- * Assumes vcpu_load() was already called.
- */
-static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
- bool host_initiated)
-{
- struct msr_data msr;
-
- switch (index) {
- case MSR_FS_BASE:
- case MSR_GS_BASE:
- case MSR_KERNEL_GS_BASE:
- case MSR_CSTAR:
- case MSR_LSTAR:
- if (is_noncanonical_msr_address(data, vcpu))
- return 1;
- break;
- case MSR_IA32_SYSENTER_EIP:
- case MSR_IA32_SYSENTER_ESP:
- /*
- * IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
- * non-canonical address is written on Intel but not on
- * AMD (which ignores the top 32-bits, because it does
- * not implement 64-bit SYSENTER).
- *
- * 64-bit code should hence be able to write a non-canonical
- * value on AMD. Making the address canonical ensures that
- * vmentry does not fail on Intel after writing a non-canonical
- * value, and that something deterministic happens if the guest
- * invokes 64-bit SYSENTER.
- */
- data = __canonical_address(data, max_host_virt_addr_bits());
- break;
- case MSR_TSC_AUX:
- if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
- return 1;
-
- if (!host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
- return 1;
-
- /*
- * Per Intel's SDM, bits 63:32 are reserved, but AMD's APM has
- * incomplete and conflicting architectural behavior. Current
- * AMD CPUs completely ignore bits 63:32, i.e. they aren't
- * reserved and always read as zeros. Enforce Intel's reserved
- * bits check if the guest CPU is Intel compatible, otherwise
- * clear the bits. This ensures cross-vendor migration will
- * provide consistent behavior for the guest.
- */
- if (guest_cpuid_is_intel_compatible(vcpu) && (data >> 32) != 0)
- return 1;
-
- data = (u32)data;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
- return KVM_MSR_RET_UNSUPPORTED;
- if (!kvm_is_valid_u_s_cet(vcpu, data))
- return 1;
- break;
- case MSR_KVM_INTERNAL_GUEST_SSP:
- if (!host_initiated)
- return 1;
- fallthrough;
- /*
- * Note that the MSR emulation here is flawed when a vCPU
- * doesn't support the Intel 64 architecture. The expected
- * architectural behavior in this case is that the upper 32
- * bits do not exist and should always read '0'. However,
- * because the actual hardware on which the virtual CPU is
- * running does support Intel 64, XRSTORS/XSAVES in the
- * guest could observe behavior that violates the
- * architecture. Intercepting XRSTORS/XSAVES for this
- * special case isn't deemed worthwhile.
- */
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return KVM_MSR_RET_UNSUPPORTED;
- /*
- * MSR_IA32_INT_SSP_TAB is not present on processors that do
- * not support Intel 64 architecture.
- */
- if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
- return KVM_MSR_RET_UNSUPPORTED;
- if (is_noncanonical_msr_address(data, vcpu))
- return 1;
- /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
- if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
- return 1;
- break;
- }
-
- msr.data = data;
- msr.index = index;
- msr.host_initiated = host_initiated;
-
- return kvm_x86_call(set_msr)(vcpu, &msr);
-}
-
-static int _kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- return __kvm_set_msr(vcpu, index, *data, host_initiated);
-}
-
-static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
- u32 index, u64 data, bool host_initiated)
-{
- return kvm_do_msr_access(vcpu, index, &data, host_initiated, MSR_TYPE_W,
- _kvm_set_msr);
-}
-
-/*
- * Read the MSR specified by @index into @data. Select MSR specific fault
- * checks are bypassed if @host_initiated is %true.
- * Returns 0 on success, non-0 otherwise.
- * Assumes vcpu_load() was already called.
- */
-static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
- bool host_initiated)
-{
- struct msr_data msr;
- int ret;
-
- switch (index) {
- case MSR_TSC_AUX:
- if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
- return 1;
-
- if (!host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
- return 1;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
- return KVM_MSR_RET_UNSUPPORTED;
- break;
- case MSR_KVM_INTERNAL_GUEST_SSP:
- if (!host_initiated)
- return 1;
- fallthrough;
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return KVM_MSR_RET_UNSUPPORTED;
- break;
- }
-
- msr.index = index;
- msr.host_initiated = host_initiated;
-
- ret = kvm_x86_call(get_msr)(vcpu, &msr);
- if (!ret)
- *data = msr.data;
- return ret;
-}
-
-int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- return __kvm_set_msr(vcpu, index, data, true);
-}
-
-int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- return __kvm_get_msr(vcpu, index, data, true);
-}
-
-static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
- u32 index, u64 *data, bool host_initiated)
-{
- return kvm_do_msr_access(vcpu, index, data, host_initiated, MSR_TYPE_R,
- __kvm_get_msr);
-}
-
-int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- return kvm_get_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_read);
-
-int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- return kvm_set_msr_ignored_check(vcpu, index, data, false);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_emulate_msr_write);
-
-int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
-{
- if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ))
- return KVM_MSR_RET_FILTERED;
-
- return __kvm_emulate_msr_read(vcpu, index, data);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_read);
-
-int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
-{
- if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE))
- return KVM_MSR_RET_FILTERED;
-
- return __kvm_emulate_msr_write(vcpu, index, data);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_msr_write);
-
-
-static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
-{
- if (!vcpu->run->msr.error) {
- kvm_eax_write(vcpu, vcpu->run->msr.data);
- kvm_edx_write(vcpu, vcpu->run->msr.data >> 32);
- }
-}
-
-static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
-{
- return complete_emulated_insn_gp(vcpu, vcpu->run->msr.error);
-}
-
-static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
-{
- complete_userspace_rdmsr(vcpu);
- return complete_emulated_msr_access(vcpu);
-}
-
-static int complete_fast_msr_access(struct kvm_vcpu *vcpu)
-{
- return kvm_x86_call(complete_emulated_msr)(vcpu, vcpu->run->msr.error);
-}
-
-static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
-{
- complete_userspace_rdmsr(vcpu);
- return complete_fast_msr_access(vcpu);
-}
-
-static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
-{
- if (!vcpu->run->msr.error)
- kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
- vcpu->run->msr.data);
-
- return complete_fast_msr_access(vcpu);
-}
-
-static u64 kvm_msr_reason(int r)
-{
- switch (r) {
- case KVM_MSR_RET_UNSUPPORTED:
- return KVM_MSR_EXIT_REASON_UNKNOWN;
- case KVM_MSR_RET_FILTERED:
- return KVM_MSR_EXIT_REASON_FILTER;
- default:
- return KVM_MSR_EXIT_REASON_INVAL;
- }
-}
-
-static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
- u32 exit_reason, u64 data,
- int (*completion)(struct kvm_vcpu *vcpu),
- int r)
-{
- u64 msr_reason = kvm_msr_reason(r);
-
- /* Check if the user wanted to know about this MSR fault */
- if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
- return 0;
-
- vcpu->run->exit_reason = exit_reason;
- vcpu->run->msr.error = 0;
- memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
- vcpu->run->msr.reason = msr_reason;
- vcpu->run->msr.index = index;
- vcpu->run->msr.data = data;
- vcpu->arch.complete_userspace_io = completion;
-
- return 1;
-}
-
-static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
- int (*complete_rdmsr)(struct kvm_vcpu *))
-{
- u64 data;
- int r;
-
- r = kvm_emulate_msr_read(vcpu, msr, &data);
-
- if (!r) {
- trace_kvm_msr_read(msr, data);
-
- if (reg < 0) {
- kvm_eax_write(vcpu, data);
- kvm_edx_write(vcpu, data >> 32);
- } else {
- kvm_register_write(vcpu, reg, data);
- }
- } else {
- /* MSR read failed? See if we should ask user space */
- if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
- complete_rdmsr, r))
- return 0;
- trace_kvm_msr_read_ex(msr);
- }
-
- return kvm_x86_call(complete_emulated_msr)(vcpu, r);
-}
-
-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
-{
- return __kvm_emulate_rdmsr(vcpu, kvm_ecx_read(vcpu), -1,
- complete_fast_rdmsr);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr);
-
-int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- vcpu->arch.cui_rdmsr_imm_reg = reg;
-
- return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_rdmsr_imm);
-
-static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- int r;
-
- r = kvm_emulate_msr_write(vcpu, msr, data);
- if (!r) {
- trace_kvm_msr_write(msr, data);
- } else {
- /* MSR write failed? See if we should ask user space */
- if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_WRMSR, data,
- complete_fast_msr_access, r))
- return 0;
- /* Signal all other negative errors to userspace */
- if (r < 0)
- return r;
- trace_kvm_msr_write_ex(msr, data);
- }
-
- return kvm_x86_call(complete_emulated_msr)(vcpu, r);
-}
-
-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
-{
- return __kvm_emulate_wrmsr(vcpu, kvm_ecx_read(vcpu),
- kvm_read_edx_eax(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr);
-
-int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_wrmsr_imm);
-
int kvm_emulate_as_nop(struct kvm_vcpu *vcpu)
{
return kvm_skip_emulated_instruction(vcpu);
@@ -1835,72 +821,6 @@ static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
kvm_request_pending(vcpu) || xfer_to_guest_mode_work_pending();
}
-static fastpath_t __handle_fastpath_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- if (!kvm_pmu_is_fastpath_emulation_allowed(vcpu))
- return EXIT_FASTPATH_NONE;
-
- switch (msr) {
- case APIC_BASE_MSR + (APIC_ICR >> 4):
- if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(vcpu->arch.apic) ||
- kvm_x2apic_icr_write_fast(vcpu->arch.apic, data))
- return EXIT_FASTPATH_NONE;
- break;
- case MSR_IA32_TSC_DEADLINE:
- kvm_set_lapic_tscdeadline_msr(vcpu, data);
- break;
- default:
- return EXIT_FASTPATH_NONE;
- }
-
- trace_kvm_msr_write(msr, data);
-
- if (!kvm_skip_emulated_instruction(vcpu))
- return EXIT_FASTPATH_EXIT_USERSPACE;
-
- return EXIT_FASTPATH_REENTER_GUEST;
-}
-
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu)
-{
- return __handle_fastpath_wrmsr(vcpu, kvm_ecx_read(vcpu),
- kvm_read_edx_eax(vcpu));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr);
-
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
-{
- return __handle_fastpath_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(handle_fastpath_wrmsr_imm);
-
-/*
- * Adapt set_msr() to msr_io()'s calling convention
- */
-static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- return kvm_get_msr_ignored_check(vcpu, index, data, true);
-}
-
-static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
-{
- u64 val;
-
- /*
- * Reject writes to immutable feature MSRs if the vCPU model is frozen,
- * as KVM doesn't support modifying the guest vCPU model on the fly,
- * e.g. changing the VMX capabilities MSRs while L2 is active is
- * nonsensical. Allow writes of the same value, e.g. so that userspace
- * can blindly stuff all MSRs when emulating RESET.
- */
- if (!kvm_can_set_cpuid_and_feature_msrs(vcpu) &&
- kvm_is_immutable_feature_msr(index) &&
- (do_get_msr(vcpu, index, &val) || *data != val))
- return -EINVAL;
-
- return kvm_set_msr_ignored_check(vcpu, index, *data, true);
-}
-
#ifdef CONFIG_X86_64
struct pvclock_clock {
int vclock_mode;
@@ -1967,72 +887,6 @@ static s64 get_kvmclock_base_ns(void)
}
#endif
-static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
-{
- int version;
- int r;
- struct pvclock_wall_clock wc;
- u32 wc_sec_hi;
- u64 wall_nsec;
-
- if (!wall_clock)
- return;
-
- r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
- if (r)
- return;
-
- if (version & 1)
- ++version; /* first time write, random junk */
-
- ++version;
-
- if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
- return;
-
- wall_nsec = kvm_get_wall_clock_epoch(kvm);
-
- wc.nsec = do_div(wall_nsec, NSEC_PER_SEC);
- wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
- wc.version = version;
-
- kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
-
- if (sec_hi_ofs) {
- wc_sec_hi = wall_nsec >> 32;
- kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
- &wc_sec_hi, sizeof(wc_sec_hi));
- }
-
- version++;
- kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
-}
-
-static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
- bool old_msr, bool host_initiated)
-{
- struct kvm_arch *ka = &vcpu->kvm->arch;
-
- if (vcpu->vcpu_id == 0 && !host_initiated) {
- if (ka->boot_vcpu_runs_old_kvmclock != old_msr)
- kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-
- ka->boot_vcpu_runs_old_kvmclock = old_msr;
- }
-
- vcpu->arch.time = system_time;
- kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
-
- /* we verify if the enable bit is set... */
- if (system_time & 1)
- kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
- sizeof(struct pvclock_vcpu_time_info));
- else
- kvm_gpc_deactivate(&vcpu->arch.pv_time);
-
- return;
-}
-
static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
{
do_shl32_div32(dividend, divisor);
@@ -3077,151 +1931,6 @@ static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
}
}
-/* These helpers are safe iff @msr is known to be an MCx bank MSR. */
-static bool is_mci_control_msr(u32 msr)
-{
- return (msr & 3) == 0;
-}
-static bool is_mci_status_msr(u32 msr)
-{
- return (msr & 3) == 1;
-}
-
-/*
- * On AMD, HWCR[McStatusWrEn] controls whether setting MCi_STATUS results in #GP.
- */
-static bool can_set_mci_status(struct kvm_vcpu *vcpu)
-{
- /* McStatusWrEn enabled? */
- if (guest_cpuid_is_amd_compatible(vcpu))
- return !!(vcpu->arch.msr_hwcr & BIT_ULL(18));
-
- return false;
-}
-
-static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u64 mcg_cap = vcpu->arch.mcg_cap;
- unsigned bank_num = mcg_cap & 0xff;
- u32 msr = msr_info->index;
- u64 data = msr_info->data;
- u32 offset, last_msr;
-
- switch (msr) {
- case MSR_IA32_MCG_STATUS:
- vcpu->arch.mcg_status = data;
- break;
- case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P) &&
- (data || !msr_info->host_initiated))
- return 1;
- if (data != 0 && data != ~(u64)0)
- return 1;
- vcpu->arch.mcg_ctl = data;
- break;
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- if (!(mcg_cap & MCG_CMCI_P) && (data || !msr_info->host_initiated))
- return 1;
- /* An attempt to write a 1 to a reserved bit raises #GP */
- if (data & ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK))
- return 1;
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
- last_msr + 1 - MSR_IA32_MC0_CTL2);
- vcpu->arch.mci_ctl2_banks[offset] = data;
- break;
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- /*
- * Only 0 or all 1s can be written to IA32_MCi_CTL, all other
- * values are architecturally undefined. But, some Linux
- * kernels clear bit 10 in bank 4 to workaround a BIOS/GART TLB
- * issue on AMD K8s, allow bit 10 to be clear when setting all
- * other bits in order to avoid an uncaught #GP in the guest.
- *
- * UNIXWARE clears bit 0 of MC1_CTL to ignore correctable,
- * single-bit ECC data errors.
- */
- if (is_mci_control_msr(msr) &&
- data != 0 && (data | (1 << 10) | 1) != ~(u64)0)
- return 1;
-
- /*
- * All CPUs allow writing 0 to MCi_STATUS MSRs to clear the MSR.
- * AMD-based CPUs allow non-zero values, but if and only if
- * HWCR[McStatusWrEn] is set.
- */
- if (!msr_info->host_initiated && is_mci_status_msr(msr) &&
- data != 0 && !can_set_mci_status(vcpu))
- return 1;
-
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
- last_msr + 1 - MSR_IA32_MC0_CTL);
- vcpu->arch.mce_banks[offset] = data;
- break;
- default:
- return 1;
- }
- return 0;
-}
-
-static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
-{
- gpa_t gpa = data & ~0x3f;
-
- /* Bits 4:5 are reserved, Should be zero */
- if (data & 0x30)
- return 1;
-
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_VMEXIT) &&
- (data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT))
- return 1;
-
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT) &&
- (data & KVM_ASYNC_PF_DELIVERY_AS_INT))
- return 1;
-
- if (!lapic_in_kernel(vcpu))
- return data ? 1 : 0;
-
- if (__kvm_pv_async_pf_enabled(data) &&
- kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
- sizeof(u64)))
- return 1;
-
- vcpu->arch.apf.msr_en_val = data;
-
- if (__kvm_pv_async_pf_enabled(data)) {
- kvm_async_pf_wakeup_all(vcpu);
- } else {
- kvm_clear_async_pf_completion_queue(vcpu);
- kvm_async_pf_hash_reset(vcpu);
- }
- return 0;
-}
-
-static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)
-{
- /* Bits 8-63 are reserved */
- if (data >> 8)
- return 1;
-
- if (!lapic_in_kernel(vcpu))
- return 1;
-
- vcpu->arch.apf.msr_int_val = data;
-
- vcpu->arch.apf.vec = data & KVM_ASYNC_PF_VEC_MASK;
-
- return 0;
-}
-
static void kvmclock_reset(struct kvm_vcpu *vcpu)
{
kvm_gpc_deactivate(&vcpu->arch.pv_time);
@@ -3382,899 +2091,6 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
-/*
- * Returns true if the MSR in question is managed via XSTATE, i.e. is context
- * switched with the rest of guest FPU state.
- *
- * Note, S_CET is _not_ saved/restored via XSAVES/XRSTORS.
- */
-static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
-{
- if (!vcpu)
- return false;
-
- switch (msr) {
- case MSR_IA32_U_CET:
- return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
- guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
- default:
- return false;
- }
-}
-
-/*
- * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
- * MSR that is managed via XSTATE. Note, the caller is responsible for doing
- * the initial FPU load, this helper only ensures that guest state is resident
- * in hardware (the kernel can load its FPU state in IRQ context).
- *
- * Note, loading guest values for U_CET and PL[0-3]_SSP while executing in the
- * kernel is safe, as U_CET is specific to userspace, and PL[0-3]_SSP are only
- * consumed when transitioning to lower privilege levels, i.e. are effectively
- * only consumed by userspace as well.
- */
-static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
- struct msr_data *msr_info,
- int access)
-{
- BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
-
- KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
- KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
-
- kvm_fpu_get();
- if (access == MSR_TYPE_R)
- rdmsrq(msr_info->index, msr_info->data);
- else
- wrmsrq(msr_info->index, msr_info->data);
- kvm_fpu_put();
-}
-
-static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
-}
-
-static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
-}
-
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- u32 msr = msr_info->index;
- u64 data = msr_info->data;
-
- /*
- * Do not allow host-initiated writes to trigger the Xen hypercall
- * page setup; it could incur locking paths which are not expected
- * if userspace sets the MSR in an unusual location.
- */
- if (kvm_xen_is_hypercall_page_msr(vcpu->kvm, msr) &&
- !msr_info->host_initiated)
- return kvm_xen_write_hypercall_page(vcpu, data);
-
- switch (msr) {
- case MSR_AMD64_NB_CFG:
- case MSR_IA32_UCODE_WRITE:
- case MSR_VM_HSAVE_PA:
- case MSR_AMD64_PATCH_LOADER:
- case MSR_AMD64_BU_CFG2:
- case MSR_AMD64_DC_CFG:
- case MSR_AMD64_TW_CFG:
- case MSR_F15H_EX_CFG:
- break;
-
- case MSR_IA32_UCODE_REV:
- if (msr_info->host_initiated)
- vcpu->arch.microcode_version = data;
- break;
- case MSR_IA32_ARCH_CAPABILITIES:
- if (!msr_info->host_initiated ||
- !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
- return KVM_MSR_RET_UNSUPPORTED;
- vcpu->arch.arch_capabilities = data;
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- if (!msr_info->host_initiated ||
- !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (data & ~kvm_caps.supported_perf_cap)
- return 1;
-
- /*
- * Note, this is not just a performance optimization! KVM
- * disallows changing feature MSRs after the vCPU has run; PMU
- * refresh will bug the VM if called after the vCPU has run.
- */
- if (vcpu->arch.perf_capabilities == data)
- break;
-
- vcpu->arch.perf_capabilities = data;
- kvm_pmu_refresh(vcpu);
- kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
- break;
- case MSR_IA32_PRED_CMD: {
- u64 reserved_bits = ~(PRED_CMD_IBPB | PRED_CMD_SBPB);
-
- if (!msr_info->host_initiated) {
- if ((!guest_has_pred_cmd_msr(vcpu)))
- return 1;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
- reserved_bits |= PRED_CMD_IBPB;
-
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
- reserved_bits |= PRED_CMD_SBPB;
- }
-
- if (!boot_cpu_has(X86_FEATURE_IBPB))
- reserved_bits |= PRED_CMD_IBPB;
-
- if (!boot_cpu_has(X86_FEATURE_SBPB))
- reserved_bits |= PRED_CMD_SBPB;
-
- if (data & reserved_bits)
- return 1;
-
- if (!data)
- break;
-
- wrmsrq(MSR_IA32_PRED_CMD, data);
- break;
- }
- case MSR_IA32_FLUSH_CMD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
- return 1;
-
- if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
- return 1;
- if (!data)
- break;
-
- wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- break;
- case MSR_EFER:
- return set_efer(vcpu, msr_info);
- case MSR_K7_HWCR: {
- /*
- * Allow McStatusWrEn and TscFreqSel. (Linux guests from v3.2
- * through at least v6.6 whine if TscFreqSel is clear,
- * depending on F/M/S.
- */
- u64 valid = BIT_ULL(18) | BIT_ULL(24);
-
- data &= ~(u64)0x40; /* ignore flush filter disable */
- data &= ~(u64)0x100; /* ignore ignne emulation enable */
- data &= ~(u64)0x8; /* ignore TLB cache disable */
-
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_GP_ON_USER_CPUID))
- valid |= MSR_K7_HWCR_CPUID_USER_DIS;
-
- if (data & ~valid) {
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- return 1;
- }
- vcpu->arch.msr_hwcr = data;
- break;
- }
- case MSR_FAM10H_MMIO_CONF_BASE:
- if (data != 0) {
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- return 1;
- }
- break;
- case MSR_IA32_CR_PAT:
- if (!kvm_pat_valid(data))
- return 1;
-
- vcpu->arch.pat = data;
- break;
- case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
- case MSR_MTRRdefType:
- return kvm_mtrr_set_msr(vcpu, msr, data);
- case MSR_IA32_APICBASE:
- return kvm_apic_set_base(vcpu, data, msr_info->host_initiated);
- case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_write(vcpu, msr, data);
- case MSR_IA32_TSC_DEADLINE:
- kvm_set_lapic_tscdeadline_msr(vcpu, data);
- break;
- case MSR_IA32_TSC_ADJUST:
- if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
- if (!msr_info->host_initiated) {
- s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
- adjust_tsc_offset_guest(vcpu, adj);
- /* Before back to guest, tsc_timestamp must be adjusted
- * as well, otherwise guest's percpu pvclock time could jump.
- */
- kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- }
- vcpu->arch.ia32_tsc_adjust_msr = data;
- }
- break;
- case MSR_IA32_MISC_ENABLE: {
- u64 old_val = vcpu->arch.ia32_misc_enable_msr;
-
- if (!msr_info->host_initiated) {
- /* RO bits */
- if ((old_val ^ data) & MSR_IA32_MISC_ENABLE_PMU_RO_MASK)
- return 1;
-
- /* R bits, i.e. writes are ignored, but don't fault. */
- data = data & ~MSR_IA32_MISC_ENABLE_EMON;
- data |= old_val & MSR_IA32_MISC_ENABLE_EMON;
- }
-
- if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
- ((old_val ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
- return 1;
- vcpu->arch.ia32_misc_enable_msr = data;
- vcpu->arch.cpuid_dynamic_bits_dirty = true;
- } else {
- vcpu->arch.ia32_misc_enable_msr = data;
- }
- break;
- }
- case MSR_IA32_SMBASE:
- if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
- return 1;
- vcpu->arch.smbase = data;
- break;
- case MSR_IA32_POWER_CTL:
- vcpu->arch.msr_ia32_power_ctl = data;
- break;
- case MSR_IA32_TSC:
- if (msr_info->host_initiated) {
- kvm_synchronize_tsc(vcpu, &data);
- } else if (!vcpu->arch.guest_tsc_protected) {
- u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
- adjust_tsc_offset_guest(vcpu, adj);
- vcpu->arch.ia32_tsc_adjust_msr += adj;
- }
- break;
- case MSR_IA32_XSS:
- if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (data & ~vcpu->arch.guest_supported_xss)
- return 1;
- if (vcpu->arch.ia32_xss == data)
- break;
- vcpu->arch.ia32_xss = data;
- vcpu->arch.cpuid_dynamic_bits_dirty = true;
- break;
- case MSR_SMI_COUNT:
- if (!msr_info->host_initiated)
- return 1;
- vcpu->arch.smi_count = data;
- break;
- case MSR_KVM_WALL_CLOCK_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
- break;
- case MSR_KVM_WALL_CLOCK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data, 0);
- break;
- case MSR_KVM_SYSTEM_TIME_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- kvm_write_system_time(vcpu, data, false, msr_info->host_initiated);
- break;
- case MSR_KVM_SYSTEM_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- kvm_write_system_time(vcpu, data, true, msr_info->host_initiated);
- break;
- case MSR_KVM_ASYNC_PF_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_pv_enable_async_pf(vcpu, data))
- return 1;
- break;
- case MSR_KVM_ASYNC_PF_INT:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_pv_enable_async_pf_int(vcpu, data))
- return 1;
- break;
- case MSR_KVM_ASYNC_PF_ACK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
- if (data & 0x1) {
- /*
- * Pairs with the smp_mb__after_atomic() in
- * kvm_arch_async_page_present_queued().
- */
- smp_store_mb(vcpu->arch.apf.pageready_pending, false);
-
- kvm_check_async_pf_completion(vcpu);
- }
- break;
- case MSR_KVM_STEAL_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (unlikely(!sched_info_on()))
- return 1;
-
- if (data & KVM_STEAL_RESERVED_MASK)
- return 1;
-
- vcpu->arch.st.msr_val = data;
-
- if (!(data & KVM_MSR_ENABLED))
- break;
-
- kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
-
- break;
- case MSR_KVM_PV_EOI_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
- return KVM_MSR_RET_UNSUPPORTED;
-
- if (kvm_lapic_set_pv_eoi(vcpu, data, sizeof(u8)))
- return 1;
- break;
-
- case MSR_KVM_POLL_CONTROL:
- if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
- return KVM_MSR_RET_UNSUPPORTED;
-
- /* only enable bit supported */
- if (data & (-1ULL << 1))
- return 1;
-
- vcpu->arch.msr_kvm_poll_control = data;
- break;
-
- case MSR_IA32_MCG_CTL:
- case MSR_IA32_MCG_STATUS:
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return set_msr_mce(vcpu, msr_info);
-
- case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
- case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
- case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
- case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr))
- return kvm_pmu_set_msr(vcpu, msr_info);
-
- if (data)
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- break;
- case MSR_K7_CLK_CTL:
- /*
- * Ignore all writes to this no longer documented MSR.
- * Writes are only relevant for old K7 processors,
- * all pre-dating SVM, but a recommended workaround from
- * AMD for these chips. It is possible to specify the
- * affected processor models on the command line, hence
- * the need to ignore the workaround.
- */
- break;
-#ifdef CONFIG_KVM_HYPERV
- case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
- case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
- case HV_X64_MSR_SYNDBG_OPTIONS:
- case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
- case HV_X64_MSR_CRASH_CTL:
- case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
- case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_STATUS:
- case HV_X64_MSR_TSC_INVARIANT_CONTROL:
- return kvm_hv_set_msr_common(vcpu, msr, data,
- msr_info->host_initiated);
-#endif
- case MSR_IA32_BBL_CR_CTL3:
- /* Drop writes to this legacy MSR -- see rdmsr
- * counterpart for further detail.
- */
- kvm_pr_unimpl_wrmsr(vcpu, msr, data);
- break;
- case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- vcpu->arch.osvw.length = data;
- break;
- case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- vcpu->arch.osvw.status = data;
- break;
- case MSR_PLATFORM_INFO:
- if (!msr_info->host_initiated)
- return 1;
- vcpu->arch.msr_platform_info = data;
- break;
- case MSR_MISC_FEATURES_ENABLES:
- if (data & ~MSR_MISC_FEATURES_ENABLES_CPUID_FAULT ||
- (data & MSR_MISC_FEATURES_ENABLES_CPUID_FAULT &&
- !(vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT)))
- return 1;
- vcpu->arch.msr_misc_features_enables = data;
- break;
-#ifdef CONFIG_X86_64
- case MSR_IA32_XFD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- if (data & ~kvm_guest_supported_xfd(vcpu))
- return 1;
-
- fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
- break;
- case MSR_IA32_XFD_ERR:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- if (data & ~kvm_guest_supported_xfd(vcpu))
- return 1;
-
- vcpu->arch.guest_fpu.xfd_err = data;
- break;
-#endif
- case MSR_IA32_U_CET:
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- kvm_set_xstate_msr(vcpu, msr_info);
- break;
- default:
- if (kvm_pmu_is_valid_msr(vcpu, msr))
- return kvm_pmu_set_msr(vcpu, msr_info);
-
- return KVM_MSR_RET_UNSUPPORTED;
- }
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_msr_common);
-
-static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
-{
- u64 data;
- u64 mcg_cap = vcpu->arch.mcg_cap;
- unsigned bank_num = mcg_cap & 0xff;
- u32 offset, last_msr;
-
- switch (msr) {
- case MSR_IA32_P5_MC_ADDR:
- case MSR_IA32_P5_MC_TYPE:
- data = 0;
- break;
- case MSR_IA32_MCG_CAP:
- data = vcpu->arch.mcg_cap;
- break;
- case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P) && !host)
- return 1;
- data = vcpu->arch.mcg_ctl;
- break;
- case MSR_IA32_MCG_STATUS:
- data = vcpu->arch.mcg_status;
- break;
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL2(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- if (!(mcg_cap & MCG_CMCI_P) && !host)
- return 1;
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL2,
- last_msr + 1 - MSR_IA32_MC0_CTL2);
- data = vcpu->arch.mci_ctl2_banks[offset];
- break;
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- last_msr = MSR_IA32_MCx_CTL(bank_num) - 1;
- if (msr > last_msr)
- return 1;
-
- offset = array_index_nospec(msr - MSR_IA32_MC0_CTL,
- last_msr + 1 - MSR_IA32_MC0_CTL);
- data = vcpu->arch.mce_banks[offset];
- break;
- default:
- return 1;
- }
- *pdata = data;
- return 0;
-}
-
-int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
-{
- switch (msr_info->index) {
- case MSR_IA32_PLATFORM_ID:
- case MSR_IA32_EBL_CR_POWERON:
- case MSR_IA32_LASTBRANCHFROMIP:
- case MSR_IA32_LASTBRANCHTOIP:
- case MSR_IA32_LASTINTFROMIP:
- case MSR_IA32_LASTINTTOIP:
- case MSR_AMD64_SYSCFG:
- case MSR_K8_TSEG_ADDR:
- case MSR_K8_TSEG_MASK:
- case MSR_VM_HSAVE_PA:
- case MSR_K8_INT_PENDING_MSG:
- case MSR_AMD64_NB_CFG:
- case MSR_FAM10H_MMIO_CONF_BASE:
- case MSR_AMD64_BU_CFG2:
- case MSR_IA32_PERF_CTL:
- case MSR_AMD64_DC_CFG:
- case MSR_AMD64_TW_CFG:
- case MSR_F15H_EX_CFG:
- /*
- * Intel Sandy Bridge CPUs must support the RAPL (running average power
- * limit) MSRs. Just return 0, as we do not want to expose the host
- * data here. Do not conditionalize this on CPUID, as KVM does not do
- * so for existing CPU-specific MSRs.
- */
- case MSR_RAPL_POWER_UNIT:
- case MSR_PP0_ENERGY_STATUS: /* Power plane 0 (core) */
- case MSR_PP1_ENERGY_STATUS: /* Power plane 1 (graphics uncore) */
- case MSR_PKG_ENERGY_STATUS: /* Total package */
- case MSR_DRAM_ENERGY_STATUS: /* DRAM controller */
- msr_info->data = 0;
- break;
- case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
- case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
- case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
- case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info);
- msr_info->data = 0;
- break;
- case MSR_IA32_UCODE_REV:
- msr_info->data = vcpu->arch.microcode_version;
- break;
- case MSR_IA32_ARCH_CAPABILITIES:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
- return KVM_MSR_RET_UNSUPPORTED;
- msr_info->data = vcpu->arch.arch_capabilities;
- break;
- case MSR_IA32_PERF_CAPABILITIES:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
- return KVM_MSR_RET_UNSUPPORTED;
- msr_info->data = vcpu->arch.perf_capabilities;
- break;
- case MSR_IA32_POWER_CTL:
- msr_info->data = vcpu->arch.msr_ia32_power_ctl;
- break;
- case MSR_IA32_TSC: {
- /*
- * Intel SDM states that MSR_IA32_TSC read adds the TSC offset
- * even when not intercepted. AMD manual doesn't explicitly
- * state this but appears to behave the same.
- *
- * On userspace reads and writes, however, we unconditionally
- * return L1's TSC value to ensure backwards-compatible
- * behavior for migration.
- */
- u64 offset, ratio;
-
- if (msr_info->host_initiated) {
- offset = vcpu->arch.l1_tsc_offset;
- ratio = vcpu->arch.l1_tsc_scaling_ratio;
- } else {
- offset = vcpu->arch.tsc_offset;
- ratio = vcpu->arch.tsc_scaling_ratio;
- }
-
- msr_info->data = kvm_scale_tsc(rdtsc(), ratio) + offset;
- break;
- }
- case MSR_IA32_CR_PAT:
- msr_info->data = vcpu->arch.pat;
- break;
- case MSR_MTRRcap:
- case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
- case MSR_MTRRdefType:
- return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
- case 0xcd: /* fsb frequency */
- msr_info->data = 3;
- break;
- /*
- * MSR_EBC_FREQUENCY_ID
- * Conservative value valid for even the basic CPU models.
- * Models 0,1: 000 in bits 23:21 indicating a bus speed of
- * 100MHz, model 2 000 in bits 18:16 indicating 100MHz,
- * and 266MHz for model 3, or 4. Set Core Clock
- * Frequency to System Bus Frequency Ratio to 1 (bits
- * 31:24) even though these are only valid for CPU
- * models > 2, however guests may end up dividing or
- * multiplying by zero otherwise.
- */
- case MSR_EBC_FREQUENCY_ID:
- msr_info->data = 1 << 24;
- break;
- case MSR_IA32_APICBASE:
- msr_info->data = vcpu->arch.apic_base;
- break;
- case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
- case MSR_IA32_TSC_DEADLINE:
- msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
- break;
- case MSR_IA32_TSC_ADJUST:
- msr_info->data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
- break;
- case MSR_IA32_MISC_ENABLE:
- msr_info->data = vcpu->arch.ia32_misc_enable_msr;
- break;
- case MSR_IA32_SMBASE:
- if (!IS_ENABLED(CONFIG_KVM_SMM) || !msr_info->host_initiated)
- return 1;
- msr_info->data = vcpu->arch.smbase;
- break;
- case MSR_SMI_COUNT:
- msr_info->data = vcpu->arch.smi_count;
- break;
- case MSR_IA32_PERF_STATUS:
- /* TSC increment by tick */
- msr_info->data = 1000ULL;
- /* CPU multiplier */
- msr_info->data |= (((uint64_t)4ULL) << 40);
- break;
- case MSR_EFER:
- msr_info->data = vcpu->arch.efer;
- break;
- case MSR_KVM_WALL_CLOCK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->kvm->arch.wall_clock;
- break;
- case MSR_KVM_WALL_CLOCK_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->kvm->arch.wall_clock;
- break;
- case MSR_KVM_SYSTEM_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.time;
- break;
- case MSR_KVM_SYSTEM_TIME_NEW:
- if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.time;
- break;
- case MSR_KVM_ASYNC_PF_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.apf.msr_en_val;
- break;
- case MSR_KVM_ASYNC_PF_INT:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.apf.msr_int_val;
- break;
- case MSR_KVM_ASYNC_PF_ACK:
- if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = 0;
- break;
- case MSR_KVM_STEAL_TIME:
- if (!guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.st.msr_val;
- break;
- case MSR_KVM_PV_EOI_EN:
- if (!guest_pv_has(vcpu, KVM_FEATURE_PV_EOI))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.pv_eoi.msr_val;
- break;
- case MSR_KVM_POLL_CONTROL:
- if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
- return KVM_MSR_RET_UNSUPPORTED;
-
- msr_info->data = vcpu->arch.msr_kvm_poll_control;
- break;
- case MSR_IA32_P5_MC_ADDR:
- case MSR_IA32_P5_MC_TYPE:
- case MSR_IA32_MCG_CAP:
- case MSR_IA32_MCG_CTL:
- case MSR_IA32_MCG_STATUS:
- case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
- msr_info->host_initiated);
- case MSR_IA32_XSS:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return 1;
- msr_info->data = vcpu->arch.ia32_xss;
- break;
- case MSR_K7_CLK_CTL:
- /*
- * Provide expected ramp-up count for K7. All other
- * are set to zero, indicating minimum divisors for
- * every field.
- *
- * This prevents guest kernels on AMD host with CPU
- * type 6, model 8 and higher from exploding due to
- * the rdmsr failing.
- */
- msr_info->data = 0x20000000;
- break;
-#ifdef CONFIG_KVM_HYPERV
- case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
- case HV_X64_MSR_SYNDBG_CONTROL ... HV_X64_MSR_SYNDBG_PENDING_BUFFER:
- case HV_X64_MSR_SYNDBG_OPTIONS:
- case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4:
- case HV_X64_MSR_CRASH_CTL:
- case HV_X64_MSR_STIMER0_CONFIG ... HV_X64_MSR_STIMER3_COUNT:
- case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_CONTROL:
- case HV_X64_MSR_TSC_EMULATION_STATUS:
- case HV_X64_MSR_TSC_INVARIANT_CONTROL:
- return kvm_hv_get_msr_common(vcpu,
- msr_info->index, &msr_info->data,
- msr_info->host_initiated);
-#endif
- case MSR_IA32_BBL_CR_CTL3:
- /* This legacy MSR exists but isn't fully documented in current
- * silicon. It is however accessed by winxp in very narrow
- * scenarios where it sets bit #19, itself documented as
- * a "reserved" bit. Best effort attempt to source coherent
- * read data here should the balance of the register be
- * interpreted by the guest:
- *
- * L2 cache control register 3: 64GB range, 256KB size,
- * enabled, latency 0x1, configured
- */
- msr_info->data = 0xbe702111;
- break;
- case MSR_AMD64_OSVW_ID_LENGTH:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- msr_info->data = vcpu->arch.osvw.length;
- break;
- case MSR_AMD64_OSVW_STATUS:
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
- return 1;
- msr_info->data = vcpu->arch.osvw.status;
- break;
- case MSR_PLATFORM_INFO:
- if (!msr_info->host_initiated &&
- !vcpu->kvm->arch.guest_can_read_msr_platform_info)
- return 1;
- msr_info->data = vcpu->arch.msr_platform_info;
- break;
- case MSR_MISC_FEATURES_ENABLES:
- msr_info->data = vcpu->arch.msr_misc_features_enables;
- break;
- case MSR_K7_HWCR:
- msr_info->data = vcpu->arch.msr_hwcr;
- break;
-#ifdef CONFIG_X86_64
- case MSR_IA32_XFD:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
- break;
- case MSR_IA32_XFD_ERR:
- if (!msr_info->host_initiated &&
- !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
- return 1;
-
- msr_info->data = vcpu->arch.guest_fpu.xfd_err;
- break;
-#endif
- case MSR_IA32_U_CET:
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- kvm_get_xstate_msr(vcpu, msr_info);
- break;
- default:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
- return kvm_pmu_get_msr(vcpu, msr_info);
-
- return KVM_MSR_RET_UNSUPPORTED;
- }
- return 0;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_get_msr_common);
-
-/*
- * Read or write a bunch of msrs. All parameters are kernel addresses.
- *
- * @return number of msrs set successfully.
- */
-static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
- struct kvm_msr_entry *entries,
- int (*do_msr)(struct kvm_vcpu *vcpu,
- unsigned index, u64 *data))
-{
- bool fpu_loaded = false;
- int i;
-
- for (i = 0; i < msrs->nmsrs; ++i) {
- /*
- * If userspace is accessing one or more XSTATE-managed MSRs,
- * temporarily load the guest's FPU state so that the guest's
- * MSR value(s) is resident in hardware and thus can be accessed
- * via RDMSR/WRMSR.
- */
- if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
- kvm_load_guest_fpu(vcpu);
- fpu_loaded = true;
- }
- if (do_msr(vcpu, entries[i].index, &entries[i].data))
- break;
- }
- if (fpu_loaded)
- kvm_put_guest_fpu(vcpu);
-
- return i;
-}
-
-/*
- * Read or write a bunch of msrs. Parameters are user addresses.
- *
- * @return number of msrs set successfully.
- */
-static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
- int (*do_msr)(struct kvm_vcpu *vcpu,
- unsigned index, u64 *data),
- int writeback)
-{
- struct kvm_msrs msrs;
- struct kvm_msr_entry *entries;
- unsigned size;
- int r;
-
- r = -EFAULT;
- if (copy_from_user(&msrs, user_msrs, sizeof(msrs)))
- goto out;
-
- r = -E2BIG;
- if (msrs.nmsrs >= MAX_IO_MSRS)
- goto out;
-
- size = sizeof(struct kvm_msr_entry) * msrs.nmsrs;
- entries = memdup_user(user_msrs->entries, size);
- if (IS_ERR(entries)) {
- r = PTR_ERR(entries);
- goto out;
- }
-
- r = __msr_io(vcpu, &msrs, entries, do_msr);
-
- if (writeback && copy_to_user(user_msrs->entries, entries, size))
- r = -EFAULT;
-
- kfree(entries);
-out:
- return r;
-}
-
static inline bool kvm_can_mwait_in_guest(void)
{
return boot_cpu_has(X86_FEATURE_MWAIT) &&
@@ -4586,61 +2402,6 @@ static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
return __kvm_x86_dev_get_attr(attr, &val);
}
-static int kvm_get_msr_index_list(struct kvm_msr_list __user *user_msr_list)
-{
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- if (n < msr_list.nmsrs)
- return -E2BIG;
-
- if (copy_to_user(user_msr_list->indices, &msrs_to_save,
- num_msrs_to_save * sizeof(u32)))
- return -EFAULT;
-
- if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
- &emulated_msrs, num_emulated_msrs * sizeof(u32)))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_feature_msr_index_list(struct kvm_msr_list __user *user_msr_list)
-{
- struct kvm_msr_list msr_list;
- unsigned int n;
-
- if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- n = msr_list.nmsrs;
- msr_list.nmsrs = num_msr_based_features;
- if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list)))
- return -EFAULT;
-
- if (n < msr_list.nmsrs)
- return -E2BIG;
-
- if (copy_to_user(user_msr_list->indices, &msr_based_features,
- num_msr_based_features * sizeof(u32)))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_feature_msrs(struct kvm_msrs __user *user_msrs)
-{
- return msr_io(NULL, user_msrs, do_get_feature_msr, 1);
-}
-
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -5588,148 +3349,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}
-struct kvm_x86_reg_id {
- __u32 index;
- __u8 type;
- __u8 rsvd1;
- __u8 rsvd2:4;
- __u8 size:4;
- __u8 x86;
-};
-
-static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
- struct kvm_x86_reg_id *reg)
-{
- switch (reg->index) {
- case KVM_REG_GUEST_SSP:
- /*
- * FIXME: If host-initiated accesses are ever exempted from
- * ignore_msrs (in kvm_do_msr_access()), drop this manual check
- * and rely on KVM's standard checks to reject accesses to regs
- * that don't exist.
- */
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
- return -EINVAL;
-
- reg->type = KVM_X86_REG_TYPE_MSR;
- reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
- break;
- default:
- return -EINVAL;
- }
- return 0;
-}
-
-static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
-{
- u64 val;
-
- if (do_get_msr(vcpu, msr, &val))
- return -EINVAL;
-
- if (put_user(val, user_val))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
-{
- u64 val;
-
- if (get_user(val, user_val))
- return -EFAULT;
-
- if (do_set_msr(vcpu, msr, &val))
- return -EINVAL;
-
- return 0;
-}
-
-static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
- void __user *argp)
-{
- struct kvm_one_reg one_reg;
- struct kvm_x86_reg_id *reg;
- u64 __user *user_val;
- bool load_fpu;
- int r;
-
- if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
- return -EFAULT;
-
- if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
- return -EINVAL;
-
- reg = (struct kvm_x86_reg_id *)&one_reg.id;
- if (reg->rsvd1 || reg->rsvd2)
- return -EINVAL;
-
- if (reg->type == KVM_X86_REG_TYPE_KVM) {
- r = kvm_translate_kvm_reg(vcpu, reg);
- if (r)
- return r;
- }
-
- if (reg->type != KVM_X86_REG_TYPE_MSR)
- return -EINVAL;
-
- if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
- return -EINVAL;
-
- guard(srcu)(&vcpu->kvm->srcu);
-
- load_fpu = is_xstate_managed_msr(vcpu, reg->index);
- if (load_fpu)
- kvm_load_guest_fpu(vcpu);
-
- user_val = u64_to_user_ptr(one_reg.addr);
- if (ioctl == KVM_GET_ONE_REG)
- r = kvm_get_one_msr(vcpu, reg->index, user_val);
- else
- r = kvm_set_one_msr(vcpu, reg->index, user_val);
-
- if (load_fpu)
- kvm_put_guest_fpu(vcpu);
- return r;
-}
-
-static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
- struct kvm_reg_list __user *user_list)
-{
- u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
- u64 user_nr_regs;
-
- if (get_user(user_nr_regs, &user_list->n))
- return -EFAULT;
-
- if (put_user(nr_regs, &user_list->n))
- return -EFAULT;
-
- if (user_nr_regs < nr_regs)
- return -E2BIG;
-
- if (nr_regs &&
- put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
- return -EFAULT;
-
- return 0;
-}
-
-static int kvm_get_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
-{
- guard(srcu)(&vcpu->kvm->srcu);
-
- return msr_io(vcpu, user_msrs, do_get_msr, 1);
-}
-
-static int kvm_set_msrs(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs)
-{
- guard(srcu)(&vcpu->kvm->srcu);
-
- return msr_io(vcpu, user_msrs, do_set_msr, 0);
-}
-
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -6532,113 +4151,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
}
-static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
-{
- struct kvm_x86_msr_filter *msr_filter;
-
- msr_filter = kzalloc_obj(*msr_filter, GFP_KERNEL_ACCOUNT);
- if (!msr_filter)
- return NULL;
-
- msr_filter->default_allow = default_allow;
- return msr_filter;
-}
-
-static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
-{
- u32 i;
-
- if (!msr_filter)
- return;
-
- for (i = 0; i < msr_filter->count; i++)
- kfree(msr_filter->ranges[i].bitmap);
-
- kfree(msr_filter);
-}
-
-static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
- struct kvm_msr_filter_range *user_range)
-{
- unsigned long *bitmap;
- size_t bitmap_size;
-
- if (!user_range->nmsrs)
- return 0;
-
- if (user_range->flags & ~KVM_MSR_FILTER_RANGE_VALID_MASK)
- return -EINVAL;
-
- if (!user_range->flags)
- return -EINVAL;
-
- bitmap_size = BITS_TO_LONGS(user_range->nmsrs) * sizeof(long);
- if (!bitmap_size || bitmap_size > KVM_MSR_FILTER_MAX_BITMAP_SIZE)
- return -EINVAL;
-
- bitmap = memdup_user((__user u8*)user_range->bitmap, bitmap_size);
- if (IS_ERR(bitmap))
- return PTR_ERR(bitmap);
-
- msr_filter->ranges[msr_filter->count] = (struct msr_bitmap_range) {
- .flags = user_range->flags,
- .base = user_range->base,
- .nmsrs = user_range->nmsrs,
- .bitmap = bitmap,
- };
-
- msr_filter->count++;
- return 0;
-}
-
-static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm,
- struct kvm_msr_filter *filter)
-{
- struct kvm_x86_msr_filter *new_filter, *old_filter;
- bool default_allow;
- bool empty = true;
- int r;
- u32 i;
-
- if (filter->flags & ~KVM_MSR_FILTER_VALID_MASK)
- return -EINVAL;
-
- for (i = 0; i < ARRAY_SIZE(filter->ranges); i++)
- empty &= !filter->ranges[i].nmsrs;
-
- default_allow = !(filter->flags & KVM_MSR_FILTER_DEFAULT_DENY);
- if (empty && !default_allow)
- return -EINVAL;
-
- new_filter = kvm_alloc_msr_filter(default_allow);
- if (!new_filter)
- return -ENOMEM;
-
- for (i = 0; i < ARRAY_SIZE(filter->ranges); i++) {
- r = kvm_add_msr_filter(new_filter, &filter->ranges[i]);
- if (r) {
- kvm_free_msr_filter(new_filter);
- return r;
- }
- }
-
- mutex_lock(&kvm->lock);
- old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
- mutex_is_locked(&kvm->lock));
- mutex_unlock(&kvm->lock);
- synchronize_srcu(&kvm->srcu);
-
- kvm_free_msr_filter(old_filter);
-
- /*
- * Recalc MSR intercepts as userspace may want to intercept accesses to
- * MSRs that KVM would otherwise pass through to the guest.
- */
- kvm_make_all_cpus_request(kvm, KVM_REQ_RECALC_INTERCEPTS);
-
- return 0;
-}
-
#ifdef CONFIG_KVM_COMPAT
/* for KVM_X86_SET_MSR_FILTER */
struct kvm_msr_filter_range_compat {
@@ -7159,157 +4671,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
return r;
}
-static void kvm_probe_feature_msr(u32 msr_index)
-{
- u64 data;
-
- if (kvm_get_feature_msr(NULL, msr_index, &data, true))
- return;
-
- msr_based_features[num_msr_based_features++] = msr_index;
-}
-
-static void kvm_probe_msr_to_save(u32 msr_index)
-{
- u32 dummy[2];
-
- if (rdmsr_safe(msr_index, &dummy[0], &dummy[1]))
- return;
-
- /*
- * Even MSRs that are valid in the host may not be exposed to guests in
- * some cases.
- */
- switch (msr_index) {
- case MSR_IA32_BNDCFGS:
- if (!kvm_mpx_supported())
- return;
- break;
- case MSR_TSC_AUX:
- if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
- !kvm_cpu_cap_has(X86_FEATURE_RDPID))
- return;
- break;
- case MSR_IA32_UMWAIT_CONTROL:
- if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
- return;
- break;
- case MSR_IA32_RTIT_CTL:
- case MSR_IA32_RTIT_STATUS:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT))
- return;
- break;
- case MSR_IA32_RTIT_CR3_MATCH:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- !intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
- return;
- break;
- case MSR_IA32_RTIT_OUTPUT_BASE:
- case MSR_IA32_RTIT_OUTPUT_MASK:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- (!intel_pt_validate_hw_cap(PT_CAP_topa_output) &&
- !intel_pt_validate_hw_cap(PT_CAP_single_range_output)))
- return;
- break;
- case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
- if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
- (msr_index - MSR_IA32_RTIT_ADDR0_A >=
- intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
- return;
- break;
- case MSR_ARCH_PERFMON_PERFCTR0 ...
- MSR_ARCH_PERFMON_PERFCTR0 + KVM_MAX_NR_GP_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_PERFCTR0 >=
- kvm_pmu_cap.num_counters_gp)
- return;
- break;
- case MSR_ARCH_PERFMON_EVENTSEL0 ...
- MSR_ARCH_PERFMON_EVENTSEL0 + KVM_MAX_NR_GP_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_EVENTSEL0 >=
- kvm_pmu_cap.num_counters_gp)
- return;
- break;
- case MSR_ARCH_PERFMON_FIXED_CTR0 ...
- MSR_ARCH_PERFMON_FIXED_CTR0 + KVM_MAX_NR_FIXED_COUNTERS - 1:
- if (msr_index - MSR_ARCH_PERFMON_FIXED_CTR0 >=
- kvm_pmu_cap.num_counters_fixed)
- return;
- break;
- case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
- case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
- if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
- return;
- break;
- case MSR_IA32_XFD:
- case MSR_IA32_XFD_ERR:
- if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
- return;
- break;
- case MSR_IA32_TSX_CTRL:
- if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
- return;
- break;
- case MSR_IA32_XSS:
- if (!kvm_caps.supported_xss)
- return;
- break;
- case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
- !kvm_cpu_cap_has(X86_FEATURE_IBT))
- return;
- break;
- case MSR_IA32_INT_SSP_TAB:
- if (!kvm_cpu_cap_has(X86_FEATURE_LM))
- return;
- fallthrough;
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
- return;
- break;
- default:
- break;
- }
-
- msrs_to_save[num_msrs_to_save++] = msr_index;
-}
-
-static void kvm_init_msr_lists(void)
-{
- unsigned i;
-
- BUILD_BUG_ON_MSG(KVM_MAX_NR_FIXED_COUNTERS != 3,
- "Please update the fixed PMCs in msrs_to_save_pmu[]");
-
- num_msrs_to_save = 0;
- num_emulated_msrs = 0;
- num_msr_based_features = 0;
-
- for (i = 0; i < ARRAY_SIZE(msrs_to_save_base); i++)
- kvm_probe_msr_to_save(msrs_to_save_base[i]);
-
- if (enable_pmu) {
- for (i = 0; i < ARRAY_SIZE(msrs_to_save_pmu); i++)
- kvm_probe_msr_to_save(msrs_to_save_pmu[i]);
- }
-
- for (i = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
- if (!kvm_x86_call(has_emulated_msr)(NULL,
- emulated_msrs_all[i]))
- continue;
-
- emulated_msrs[num_emulated_msrs++] = emulated_msrs_all[i];
- }
-
- for (i = KVM_FIRST_EMULATED_VMX_MSR; i <= KVM_LAST_EMULATED_VMX_MSR; i++)
- kvm_probe_feature_msr(i);
-
- for (i = 0; i < ARRAY_SIZE(msr_based_features_all_except_vmx); i++)
- kvm_probe_feature_msr(msr_based_features_all_except_vmx[i]);
-}
-
static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
void *__v)
{
@@ -8247,61 +5608,22 @@ static int emulator_get_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- int r;
- r = kvm_emulate_msr_read(vcpu, msr_index, pdata);
- if (r < 0)
- return X86EMUL_UNHANDLEABLE;
-
- if (r) {
- if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
- complete_emulated_rdmsr, r))
- return X86EMUL_IO_NEEDED;
-
- trace_kvm_msr_read_ex(msr_index);
- return X86EMUL_PROPAGATE_FAULT;
- }
-
- trace_kvm_msr_read(msr_index, *pdata);
- return X86EMUL_CONTINUE;
+ return kvm_emulator_get_msr_with_filter(vcpu, msr_index, pdata);
}
static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 data)
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
- int r;
- r = kvm_emulate_msr_write(vcpu, msr_index, data);
- if (r < 0)
- return X86EMUL_UNHANDLEABLE;
-
- if (r) {
- if (kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
- complete_emulated_msr_access, r))
- return X86EMUL_IO_NEEDED;
-
- trace_kvm_msr_write_ex(msr_index, data);
- return X86EMUL_PROPAGATE_FAULT;
- }
-
- trace_kvm_msr_write(msr_index, data);
- return X86EMUL_CONTINUE;
+ return kvm_emulator_set_msr_with_filter(vcpu, msr_index, data);
}
static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
- /*
- * Treat emulator accesses to the current shadow stack pointer as host-
- * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
- * and this API is used only for implicit accesses, i.e. not RDMSR, and
- * so the index is fully KVM-controlled.
- */
- if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
- return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
-
- return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
+ return kvm_emulator_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
}
static int emulator_check_rdpmc_early(struct x86_emulate_ctxt *ctxt, u32 pmc)
@@ -13248,32 +10570,6 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
#endif
#endif
-int kvm_spec_ctrl_test_value(u64 value)
-{
- /*
- * test that setting IA32_SPEC_CTRL to given value
- * is allowed by the host processor
- */
-
- u64 saved_value;
- unsigned long flags;
- int ret = 0;
-
- local_irq_save(flags);
-
- if (rdmsrq_safe(MSR_IA32_SPEC_CTRL, &saved_value))
- ret = 1;
- else if (wrmsrq_safe(MSR_IA32_SPEC_CTRL, value))
- ret = 1;
- else
- wrmsrq(MSR_IA32_SPEC_CTRL, saved_value);
-
- local_irq_restore(flags);
-
- return ret;
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spec_ctrl_test_value);
-
void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code)
{
struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 31e67b060148..fd3d0a196526 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -6,6 +6,7 @@
#include <asm/fpu/xstate.h>
#include <asm/mce.h>
#include <asm/pvclock.h>
+#include "msrs.h"
#include "mmu.h"
#include "regs.h"
#include "kvm_emulate.h"
@@ -45,14 +46,6 @@ do { \
failed; \
})
-/*
- * The first...last VMX feature MSRs that are emulated by KVM. This may or may
- * not cover all known VMX MSRs, as KVM doesn't emulate an MSR until there's an
- * associated feature that KVM supports for nested virtualization.
- */
-#define KVM_FIRST_EMULATED_VMX_MSR MSR_IA32_VMX_BASIC
-#define KVM_LAST_EMULATED_VMX_MSR MSR_IA32_VMX_VMFUNC
-
#define KVM_DEFAULT_PLE_GAP 128
#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
#define KVM_DEFAULT_PLE_WINDOW_GROW 2
@@ -61,16 +54,6 @@ do { \
#define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
#define KVM_SVM_DEFAULT_PLE_WINDOW 3000
-/*
- * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
- * are arbitrary and have no meaning, the only requirement is that they don't
- * conflict with "real" MSRs that KVM supports. Use values at the upper end
- * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
- * will be usable until KVM exhausts its supply of paravirtual MSR indices.
- */
-
-#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
-
static inline unsigned int __grow_ple_window(unsigned int val,
unsigned int base, unsigned int modifier, unsigned int max)
{
@@ -101,9 +84,6 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
return max(val, min);
}
-#define MSR_IA32_CR_PAT_DEFAULT \
- PAT_VALUE(WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC)
-
void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
int kvm_check_nested_events(struct kvm_vcpu *vcpu);
@@ -378,15 +358,12 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
struct kvm_queued_exception *ex);
void kvm_handle_exception_payload_quirk(struct kvm_vcpu *vcpu);
-int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
-int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
void *insn, int insn_len);
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
@@ -432,20 +409,6 @@ extern bool enable_vmware_backdoor;
extern int pi_inject_timer;
-extern bool report_ignored_msrs;
-
-static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
-{
- if (report_ignored_msrs)
- vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
-}
-
-static inline void kvm_pr_unimpl_rdmsr(struct kvm_vcpu *vcpu, u32 msr)
-{
- if (report_ignored_msrs)
- vcpu_unimpl(vcpu, "Unhandled RDMSR(0x%x)\n", msr);
-}
-
static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
{
return pvclock_scale_delta(nsec, vcpu->arch.virtual_tsc_mult,
@@ -563,33 +526,10 @@ static inline void kvm_machine_check(void)
#endif
}
-int kvm_spec_ctrl_test_value(u64 value);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid);
int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
-bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
-
-enum kvm_msr_access {
- MSR_TYPE_R = BIT(0),
- MSR_TYPE_W = BIT(1),
- MSR_TYPE_RW = MSR_TYPE_R | MSR_TYPE_W,
-};
-
-/*
- * Internal error codes that are used to indicate that MSR emulation encountered
- * an error that should result in #GP in the guest, unless userspace handles it.
- * Note, '1', '0', and negative numbers are off limits, as they are used by KVM
- * as part of KVM's lightly documented internal KVM_RUN return codes.
- *
- * UNSUPPORTED - The MSR isn't supported, either because it is completely
- * unknown to KVM, or because the MSR should not exist according
- * to the vCPU model.
- *
- * FILTERED - Access to the MSR is denied by a userspace MSR filter.
- */
-#define KVM_MSR_RET_UNSUPPORTED 2
-#define KVM_MSR_RET_FILTERED 3
int kvm_sev_es_mmio(struct kvm_vcpu *vcpu, bool is_write, gpa_t gpa,
unsigned int bytes, void *data);
@@ -649,27 +589,4 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
-#define CET_US_RESERVED_BITS GENMASK(9, 6)
-#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
-#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
-#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
-
-static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
-{
- if (data & CET_US_RESERVED_BITS)
- return false;
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
- (data & CET_US_SHSTK_MASK_BITS))
- return false;
- if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
- (data & CET_US_IBT_MASK_BITS))
- return false;
- if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
- return false;
- /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
- if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
- return false;
-
- return true;
-}
#endif
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 14/30] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h}
2026-06-13 0:03 ` [PATCH v4 14/30] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h} Sean Christopherson
@ 2026-06-15 9:30 ` Binbin Wu
0 siblings, 0 replies; 64+ messages in thread
From: Binbin Wu @ 2026-06-15 9:30 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Yosry Ahmed,
Kai Huang
On 6/13/2026 8:03 AM, Sean Christopherson wrote:
> Introduce msrs.{c,h}, and move the vast majority of MSR specific code out
> of x86.{c,h}. Use a plural "msrs" instead of just "msr" to be consistent
> with regs.{c,h}, and to make it easier to differentiate KVM's code from the
> other 5+ msr.c files in the kernel.
>
> Opportunistically drop the "x86.h" include from mtrr.c, mostly as proof
> that the bulk of the MSR code is indeed being relocated to msrs.c.
>
> No functional change intended.
>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
One nit below.
[...]> diff --git a/arch/x86/kvm/msrs.h b/arch/x86/kvm/msrs.h
> new file mode 100644
> index 000000000000..25b75fe39b80
> --- /dev/null
> +++ b/arch/x86/kvm/msrs.h
> @@ -0,0 +1,128 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef ARCH_X86_KVM_MSR_H
> +#define ARCH_X86_KVM_MSR_H
To align with the file name,
ARCH_X86_KVM_MSR_H -> ARCH_X86_KVM_MSRS_H ?
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 15/30] KVM: x86: Move register helper declarations from kvm_host.h => regs.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (13 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 14/30] KVM: x86: Move the bulk of MSR specific code from x86.c to msrs.{c,h} Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 16/30] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h Sean Christopherson
` (14 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Relocate declarations of Control/Debug Register, EFLAGS and RIP helpers
from x86's kvm_host.h to regs.h, to continue trimming down kvm_host.h.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 19 -------------------
arch/x86/kvm/regs.h | 18 ++++++++++++++++++
2 files changed, 18 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 153ed5611c2e..cc96a9acaf88 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2186,8 +2186,6 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
@@ -2312,24 +2310,12 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);
-void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
-void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
-int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
-int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
-unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
-unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
-void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
-void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
@@ -2366,8 +2352,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-void kvm_update_dr7(struct kvm_vcpu *vcpu);
-
bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
bool always_retry);
@@ -2521,9 +2505,6 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
-unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
-bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
-
void kvm_make_scan_ioapic_request(struct kvm *kvm);
void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
unsigned long *vcpu_bitmap);
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 1fd12388250b..30ef08d60a74 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -16,6 +16,18 @@
static_assert(!(KVM_POSSIBLE_CR0_GUEST_BITS & X86_CR0_PDPTR_BITS));
+void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
+void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
+int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
+int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
+unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
+unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
+void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
+
static inline bool is_long_mode(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_X86_64
@@ -425,7 +437,12 @@ static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_call(get_segment_base)(vcpu, seg);
}
+unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
+bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
+
+unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
+void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
void kvm_x86_vcpu_ioctl_get_sregs2(struct kvm_vcpu *vcpu,
struct kvm_sregs2 *sregs2);
@@ -436,6 +453,7 @@ void kvm_run_sync_regs_to_user(struct kvm_vcpu *vcpu);
int kvm_run_sync_regs_from_user(struct kvm_vcpu *vcpu);
void kvm_update_dr0123(struct kvm_vcpu *vcpu);
+void kvm_update_dr7(struct kvm_vcpu *vcpu);
int kvm_vcpu_ioctl_x86_get_debugregs(struct kvm_vcpu *vcpu,
struct kvm_debugregs *dbgregs);
int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 16/30] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (14 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 15/30] KVM: x86: Move register helper declarations from kvm_host.h => regs.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 17/30] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h Sean Christopherson
` (13 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Define kvm_{g,s}et_segment() as inline functions in regs.h, as they are
literally one-line wrappers to invoke vendor code.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/regs.h | 12 ++++++++++++
arch/x86/kvm/x86.c | 12 ------------
3 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cc96a9acaf88..3c3e160b0991 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2303,8 +2303,6 @@ int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
-void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
-void kvm_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
diff --git a/arch/x86/kvm/regs.h b/arch/x86/kvm/regs.h
index 30ef08d60a74..7a823422d78e 100644
--- a/arch/x86/kvm/regs.h
+++ b/arch/x86/kvm/regs.h
@@ -437,6 +437,18 @@ static inline unsigned long kvm_get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_call(get_segment_base)(vcpu, seg);
}
+static inline void kvm_set_segment(struct kvm_vcpu *vcpu,
+ struct kvm_segment *var, int seg)
+{
+ kvm_x86_call(set_segment)(vcpu, var, seg);
+}
+
+static inline void kvm_get_segment(struct kvm_vcpu *vcpu,
+ struct kvm_segment *var, int seg)
+{
+ kvm_x86_call(get_segment)(vcpu, var, seg);
+}
+
unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index caf01afe13f0..c6cc0961a4e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4720,18 +4720,6 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
return handled;
}
-void kvm_set_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
-{
- kvm_x86_call(set_segment)(vcpu, var, seg);
-}
-
-void kvm_get_segment(struct kvm_vcpu *vcpu,
- struct kvm_segment *var, int seg)
-{
- kvm_x86_call(get_segment)(vcpu, var, seg);
-}
-
gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception)
{
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 17/30] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (15 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 16/30] KVM: x86: Move kvm_{g,s}et_segment() to inline helpers in regs.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 18/30] KVM: x86: Move "struct kvm_x86_msr_filter" definition to msrs.c Sean Christopherson
` (12 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Relocate declarations of MSR helpers (and kvm_nr_uret_msrs) from x86's
kvm_host.h to msrs, to continue trimming down kvm_host.h.
Deliberately leave the funky read_msr() where it is, as it will hopefully
be removed entirely as part of a broader kernel-API cleanup.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 26 -------------------------
arch/x86/kvm/msrs.h | 34 ++++++++++++++++++++++++++++++---
2 files changed, 31 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3c3e160b0991..09e437c947a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2094,7 +2094,6 @@ struct kvm_arch_async_pf {
u64 error_code;
};
-extern u32 __read_mostly kvm_nr_uret_msrs;
extern bool __read_mostly allow_smaller_maxphyaddr;
extern bool __read_mostly enable_apicv;
extern bool __read_mostly enable_ipiv;
@@ -2278,18 +2277,6 @@ void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
-void kvm_enable_efer_bits(u64);
-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
-int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
-int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
-int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
int kvm_emulate_invd(struct kvm_vcpu *vcpu);
int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
@@ -2311,9 +2298,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
-int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
-
int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
@@ -2488,16 +2472,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
unsigned long ipi_bitmap_high, u32 min,
unsigned long icr, int op_64_bit);
-int kvm_add_user_return_msr(u32 msr);
-int kvm_find_user_return_msr(u32 msr);
-int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-u64 kvm_get_user_return_msr(unsigned int slot);
-
-static inline bool kvm_is_supported_user_return_msr(u32 msr)
-{
- return kvm_find_user_return_msr(msr) >= 0;
-}
-
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
diff --git a/arch/x86/kvm/msrs.h b/arch/x86/kvm/msrs.h
index 25b75fe39b80..b698983e37fb 100644
--- a/arch/x86/kvm/msrs.h
+++ b/arch/x86/kvm/msrs.h
@@ -11,6 +11,8 @@
extern bool report_ignored_msrs;
extern bool ignore_msrs;
+extern u32 __read_mostly kvm_nr_uret_msrs;
+
static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
if (report_ignored_msrs)
@@ -56,13 +58,39 @@ int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
int kvm_get_reg_list(struct kvm_vcpu *vcpu,
struct kvm_reg_list __user *user_list);
+void kvm_enable_efer_bits(u64);
+bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
+int kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int __kvm_emulate_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int __kvm_emulate_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data);
+int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data);
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
+fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
+
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
+
+int kvm_add_user_return_msr(u32 msr);
+int kvm_find_user_return_msr(u32 msr);
+int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
+u64 kvm_get_user_return_msr(unsigned int slot);
+
+static inline bool kvm_is_supported_user_return_msr(u32 msr)
+{
+ return kvm_find_user_return_msr(msr) >= 0;
+}
+
void kvm_user_return_msr_cpu_online(void);
void drop_user_return_notifiers(void);
void kvm_destroy_user_return_msrs(void);
-fastpath_t handle_fastpath_wrmsr(struct kvm_vcpu *vcpu);
-fastpath_t handle_fastpath_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
-
int kvm_emulator_get_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
u64 *pdata);
int kvm_emulator_set_msr_with_filter(struct kvm_vcpu *vcpu, u32 msr_index,
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 18/30] KVM: x86: Move "struct kvm_x86_msr_filter" definition to msrs.c
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (16 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 17/30] KVM: x86: Move MSR helper declarations from kvm_host.h => msrs.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 2:47 ` Huang, Kai
2026-06-13 0:03 ` [PATCH v4 19/30] KVM: x86/pmu: Move "struct kvm_x86_pmu_event_filter" definition to pmu.c Sean Christopherson
` (11 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the definition of "struct kvm_x86_msr_filter" and its associate,
"struct msr_bitmap_range", to msrs.c, as the details of the filters are
very much implementation details that can and should be buried in msrs.c.
While the _existence_ of filters is public knowledge, almost by definition,
the contents don't need to be exposed outside of the MSR code as the filter
data is provided by userspace, i.e. it pretty much has to be dynamically
allocated, and thus never should be fully embedded in a globally visible
structure.
Note, this creates a discrepancy with the PMU event filter structure; that
will be remedied shortly.
No functional change intended.
Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 15 ++-------------
arch/x86/kvm/msrs.c | 13 +++++++++++++
2 files changed, 15 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09e437c947a9..4ff6304c02d0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,8 @@ enum x86_intercept_stage;
struct kvm_kernel_irqfd;
struct kvm_kernel_irq_routing_entry;
+struct kvm_x86_msr_filter;
+
struct kvm_caps {
/* control of guest tsc rate supported? */
bool has_tsc_control;
@@ -1291,13 +1293,6 @@ struct kvm_hv {
};
#endif
-struct msr_bitmap_range {
- u32 flags;
- u32 nmsrs;
- u32 base;
- unsigned long *bitmap;
-};
-
#ifdef CONFIG_KVM_XEN
/* Xen emulation context */
struct kvm_xen {
@@ -1328,12 +1323,6 @@ enum kvm_suppress_eoi_broadcast_mode {
KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
};
-struct kvm_x86_msr_filter {
- u8 count;
- bool default_allow:1;
- struct msr_bitmap_range ranges[16];
-};
-
struct kvm_x86_pmu_event_filter {
__u32 action;
__u32 nevents;
diff --git a/arch/x86/kvm/msrs.c b/arch/x86/kvm/msrs.c
index 07f2a22d2607..c230b18d87e3 100644
--- a/arch/x86/kvm/msrs.c
+++ b/arch/x86/kvm/msrs.c
@@ -32,6 +32,19 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define MAX_IO_MSRS 256
+struct msr_bitmap_range {
+ u32 flags;
+ u32 nmsrs;
+ u32 base;
+ unsigned long *bitmap;
+};
+
+struct kvm_x86_msr_filter {
+ u8 count;
+ bool default_allow:1;
+ struct msr_bitmap_range ranges[16];
+};
+
/*
* Restoring the host value for MSRs that are only consumed when running in
* usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 18/30] KVM: x86: Move "struct kvm_x86_msr_filter" definition to msrs.c
2026-06-13 0:03 ` [PATCH v4 18/30] KVM: x86: Move "struct kvm_x86_msr_filter" definition to msrs.c Sean Christopherson
@ 2026-06-15 2:47 ` Huang, Kai
0 siblings, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 2:47 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Move the definition of "struct kvm_x86_msr_filter" and its associate,
> "struct msr_bitmap_range", to msrs.c, as the details of the filters are
> very much implementation details that can and should be buried in msrs.c.
> While the _existence_ of filters is public knowledge, almost by definition,
> the contents don't need to be exposed outside of the MSR code as the filter
> data is provided by userspace, i.e. it pretty much has to be dynamically
> allocated, and thus never should be fully embedded in a globally visible
> structure.
>
> Note, this creates a discrepancy with the PMU event filter structure; that
> will be remedied shortly.
>
> No functional change intended.
>
> Suggested-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 19/30] KVM: x86/pmu: Move "struct kvm_x86_pmu_event_filter" definition to pmu.c
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (17 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 18/30] KVM: x86: Move "struct kvm_x86_msr_filter" definition to msrs.c Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 2:48 ` Huang, Kai
2026-06-13 0:03 ` [PATCH v4 20/30] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h Sean Christopherson
` (10 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the definition of "struct kvm_x86_pmu_event_filter" to pmu.c, as the
as the details of the filters are very much implementation details that can
and should be buried in pmu.c. While the _existence_ of filters is public
knowledge, almost by definition, the contents don't need to be exposed
outside of the PMU code as the filter data is provided by userspace, i.e.
it pretty much has to be dynamically allocated, and thus never should be
fully embedded in a globally visible structure.
No functional change intended
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 13 +------------
arch/x86/kvm/pmu.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4ff6304c02d0..211958914cc3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -316,6 +316,7 @@ struct kvm_kernel_irqfd;
struct kvm_kernel_irq_routing_entry;
struct kvm_x86_msr_filter;
+struct kvm_x86_pmu_event_filter;
struct kvm_caps {
/* control of guest tsc rate supported? */
@@ -1323,18 +1324,6 @@ enum kvm_suppress_eoi_broadcast_mode {
KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
};
-struct kvm_x86_pmu_event_filter {
- __u32 action;
- __u32 nevents;
- __u32 fixed_counter_bitmap;
- __u32 flags;
- __u32 nr_includes;
- __u32 nr_excludes;
- __u64 *includes;
- __u64 *excludes;
- __u64 events[] __counted_by(nevents);
-};
-
enum kvm_apicv_inhibit {
/********************************************************************/
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index b92dd2e58335..62d0ed99ebe9 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -43,6 +43,18 @@ module_param(enable_pmu, bool, 0444);
bool __read_mostly enable_mediated_pmu;
EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_mediated_pmu);
+struct kvm_x86_pmu_event_filter {
+ __u32 action;
+ __u32 nevents;
+ __u32 fixed_counter_bitmap;
+ __u32 flags;
+ __u32 nr_includes;
+ __u32 nr_excludes;
+ __u64 *includes;
+ __u64 *excludes;
+ __u64 events[] __counted_by(nevents);
+};
+
struct kvm_pmu_emulated_event_selectors {
u64 INSTRUCTIONS_RETIRED;
u64 BRANCH_INSTRUCTIONS_RETIRED;
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 19/30] KVM: x86/pmu: Move "struct kvm_x86_pmu_event_filter" definition to pmu.c
2026-06-13 0:03 ` [PATCH v4 19/30] KVM: x86/pmu: Move "struct kvm_x86_pmu_event_filter" definition to pmu.c Sean Christopherson
@ 2026-06-15 2:48 ` Huang, Kai
0 siblings, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 2:48 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Move the definition of "struct kvm_x86_pmu_event_filter" to pmu.c, as the
> as the details of the filters are very much implementation details that can
> and should be buried in pmu.c. While the _existence_ of filters is public
> knowledge, almost by definition, the contents don't need to be exposed
> outside of the PMU code as the filter data is provided by userspace, i.e.
> it pretty much has to be dynamically allocated, and thus never should be
> fully embedded in a globally visible structure.
>
> No functional change intended
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 20/30] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (18 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 19/30] KVM: x86/pmu: Move "struct kvm_x86_pmu_event_filter" definition to pmu.c Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 21/30] KVM: x86: Move LLDT assembly wrappers into VMX Sean Christopherson
` (9 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move a pile of MMU helper declarations and macros into mmu.h, as they are
very much KVM x86 internal APIs and details, and not intended to be exposed
to arch-neutral KVM, and certainly not to the broader kernel.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 70 --------------------------------
arch/x86/kvm/mmu.h | 71 +++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+), 70 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 211958914cc3..1dd231ecf4b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -161,12 +161,6 @@
#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
-#define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
-#define KVM_MIN_ALLOC_MMU_PAGES 64UL
-#define KVM_MMU_HASH_SHIFT 12
-#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
-#define KVM_MIN_FREE_MMU_PAGES 5
-#define KVM_REFILL_PAGES 25
#define KVM_MAX_CPUID_ENTRIES 256
#define KVM_NR_VAR_MTRR 8
@@ -2131,38 +2125,6 @@ enum kvm_intr_type {
((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
(!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
-void __init kvm_mmu_x86_module_init(void);
-int kvm_mmu_vendor_module_init(void);
-void kvm_mmu_vendor_module_exit(void);
-
-void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
-int kvm_mmu_create(struct kvm_vcpu *vcpu);
-int kvm_mmu_init_vm(struct kvm *kvm);
-void kvm_mmu_uninit_vm(struct kvm *kvm);
-
-void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
- struct kvm_memory_slot *slot);
-
-void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
-void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- int start_level);
-void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- int target_level);
-void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot,
- u64 start, u64 end,
- int target_level);
-void kvm_mmu_recover_huge_pages(struct kvm *kvm,
- const struct kvm_memory_slot *memslot);
-void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
- const struct kvm_memory_slot *memslot);
-void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
@@ -2312,25 +2274,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- bool always_retry);
-
-static inline bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu,
- gpa_t cr2_or_gpa)
-{
- return __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, false);
-}
-
-void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
- ulong roots_to_free);
-void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
-gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
- struct x86_exception *exception);
-
bool kvm_apicv_activated(struct kvm *kvm);
bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
@@ -2363,19 +2306,6 @@ static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
kvm_inc_or_dec_irq_window_inhibit(kvm, false);
}
-int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
- void *insn, int insn_len);
-void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
-void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
-void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- u64 addr, unsigned long roots);
-void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
-void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
-
-void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
- int tdp_max_root_level, int tdp_huge_page_level);
-
-
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#endif
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d30676935fff..a6b871253bd7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -15,6 +15,13 @@ extern bool tdp_mmu_enabled;
extern bool __read_mostly enable_mmio_caching;
extern bool eager_page_split;
+#define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
+#define KVM_MIN_ALLOC_MMU_PAGES 64UL
+#define KVM_MMU_HASH_SHIFT 12
+#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
+#define KVM_MIN_FREE_MMU_PAGES 5
+#define KVM_REFILL_PAGES 25
+
#define PT_WRITABLE_SHIFT 1
#define PT_USER_SHIFT 2
@@ -96,6 +103,38 @@ static inline bool mmu_has_mbec(struct kvm_mmu *mmu)
u8 kvm_mmu_get_max_tdp_level(void);
+void __init kvm_mmu_x86_module_init(void);
+int kvm_mmu_vendor_module_init(void);
+void kvm_mmu_vendor_module_exit(void);
+
+void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
+int kvm_mmu_create(struct kvm_vcpu *vcpu);
+int kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
+
+void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
+ struct kvm_memory_slot *slot);
+
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
+void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
+void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ int start_level);
+void kvm_mmu_slot_try_split_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ int target_level);
+void kvm_mmu_try_split_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot,
+ u64 start, u64 end,
+ int target_level);
+void kvm_mmu_recover_huge_pages(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot);
+void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot);
+void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
+void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
@@ -107,6 +146,19 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4,
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
bool mbec, gpa_t new_eptp);
+
+int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
+ void *insn, int insn_len);
+void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ u64 addr, unsigned long roots);
+void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
+void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);
+
+void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
+ int tdp_max_root_level, int tdp_huge_page_level);
+
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
@@ -121,6 +173,25 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
int bytes);
+bool __kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ bool always_retry);
+
+static inline bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu,
+ gpa_t cr2_or_gpa)
+{
+ return __kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa, false);
+}
+
+void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
+ ulong roots_to_free);
+void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu);
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
+ struct x86_exception *exception);
+
static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
{
if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 21/30] KVM: x86: Move LLDT assembly wrappers into VMX
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (19 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 20/30] KVM: x86: Move MMU helper declarations from kvm_host.h => mmu.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 22/30] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c Sean Christopherson
` (8 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move kvm_{load,read}_ldt() into vmx.c, as vmx_{load,store}_ldt(), as they
are exclusively used by VMX to save/restore host state, and have no
business being globally visible.
Ideally, KVM-specific helpers wouldn't exist at all, as they are nothing
more than assembly wrappers for SLDT and LLDT, i.e. should be provided by
the kernel, not by KVM. Punt that cleanup to the future, as
arch/x86/include/asm/desc.h _does_ provide helpers, but load_ldt() is only
available for CONFIG_PARAVIRT_XXL=n builds, and both {load,store}_ldt()
unnecessarily constrain the operands to memory.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/vmx/vmx.c | 18 +++++++++++++++---
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1dd231ecf4b2..d1fdb2c3fdfa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2312,18 +2312,6 @@ static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
#define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
-static inline u16 kvm_read_ldt(void)
-{
- u16 ldt;
- asm("sldt %0" : "=g"(ldt));
- return ldt;
-}
-
-static inline void kvm_load_ldt(u16 sel)
-{
- asm("lldt %0" : : "rm"(sel));
-}
-
#ifdef CONFIG_X86_64
static inline unsigned long read_msr(unsigned long msr)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c548f22375ad..2c120ff47790 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1186,6 +1186,18 @@ static void vmx_remove_autostore_msr(struct vcpu_vmx *vmx, u32 msr)
vmx_remove_auto_msr(&vmx->msr_autostore, msr, VM_EXIT_MSR_STORE_COUNT);
}
+static u16 vmx_store_ldt(void)
+{
+ u16 ldt;
+ asm("sldt %0" : "=g"(ldt));
+ return ldt;
+}
+
+static void vmx_load_ldt(u16 sel)
+{
+ asm("lldt %0" : : "rm"(sel));
+}
+
#ifdef CONFIG_X86_32
/*
* On 32-bit kernels, VM exits still load the FS and GS bases from the
@@ -1203,7 +1215,7 @@ static unsigned long segment_base(u16 selector)
table = get_current_gdt_ro();
if ((selector & SEGMENT_TI_MASK) == SEGMENT_LDT) {
- u16 ldt_selector = kvm_read_ldt();
+ u16 ldt_selector = vmx_store_ldt();
if (!(ldt_selector & ~SEGMENT_RPL_MASK))
return 0;
@@ -1358,7 +1370,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
* Set host fs and gs selectors. Unfortunately, 22.2.3 does not
* allow segment selectors with cpl > 0 or ti == 1.
*/
- host_state->ldt_sel = kvm_read_ldt();
+ host_state->ldt_sel = vmx_store_ldt();
#ifdef CONFIG_X86_64
savesegment(ds, host_state->ds_sel);
@@ -1405,7 +1417,7 @@ static void vmx_prepare_switch_to_host(struct vcpu_vmx *vmx)
rdmsrq(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
#endif
if (host_state->ldt_sel || (host_state->gs_sel & 7)) {
- kvm_load_ldt(host_state->ldt_sel);
+ vmx_load_ldt(host_state->ldt_sel);
#ifdef CONFIG_X86_64
load_gs_index(host_state->gs_sel);
#else
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 22/30] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (20 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 21/30] KVM: x86: Move LLDT assembly wrappers into VMX Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 23/30] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h Sean Christopherson
` (7 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move a variety of "VALID MASK" defines, e.g. that capture which flags in
a given ioctl are supported by KVM, from kvm_host.h to x86.c. The set of
valid flags/bits is very much a KVM-internal detail, as the values from the
hardcoded #defines are often captured and massaged by KVM's setup code, i.e.
*directly* using the macros outside of KVM x86 would be actively dangerous.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 34 ---------------------------------
arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d1fdb2c3fdfa..72f8f8933ab7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -78,12 +78,6 @@
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
-#define KVM_BUS_LOCK_DETECTION_VALID_MODE (KVM_BUS_LOCK_DETECTION_OFF | \
- KVM_BUS_LOCK_DETECTION_EXIT)
-
-#define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \
- KVM_X86_NOTIFY_VMEXIT_USER)
-
/* x86-specific vcpu->requests bit members */
#define KVM_REQ_MIGRATE_TIMER KVM_ARCH_REQ(0)
#define KVM_REQ_REPORT_TPR_ACCESS KVM_ARCH_REQ(1)
@@ -2417,34 +2411,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
-#define KVM_CLOCK_VALID_FLAGS \
- (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
-
-#define KVM_X86_VALID_QUIRKS \
- (KVM_X86_QUIRK_LINT0_REENABLED | \
- KVM_X86_QUIRK_CD_NW_CLEARED | \
- KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
- KVM_X86_QUIRK_OUT_7E_INC_RIP | \
- KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
- KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
- KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
- KVM_X86_QUIRK_SLOT_ZAP_ALL | \
- KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
- KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
- KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM | \
- KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT)
-
-#define KVM_X86_CONDITIONAL_QUIRKS \
- (KVM_X86_QUIRK_CD_NW_CLEARED | \
- KVM_X86_QUIRK_IGNORE_GUEST_PAT)
-
-/*
- * KVM previously used a u32 field in kvm_run to indicate the hypercall was
- * initiated from long mode. KVM now sets bit 0 to indicate long mode, but the
- * remaining 31 lower bits must be 0 to preserve ABI.
- */
-#define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
-
static inline bool kvm_arch_has_irq_bypass(void)
{
return enable_device_posted_irqs;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c6cc0961a4e7..e0456133133b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -105,6 +105,12 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
#define emul_to_vcpu(ctxt) \
((struct kvm_vcpu *)(ctxt)->vcpu)
+/*
+ * KVM previously used a u32 field in kvm_run to indicate the hypercall was
+ * initiated from long mode. KVM now sets bit 0 to indicate long mode, but the
+ * remaining 31 lower bits must be 0 to preserve ABI.
+ */
+#define KVM_EXIT_HYPERCALL_MBZ GENMASK_ULL(31, 1)
#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -114,6 +120,33 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_host);
KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+#define KVM_CLOCK_VALID_FLAGS \
+ (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
+#define KVM_X86_VALID_QUIRKS \
+ (KVM_X86_QUIRK_LINT0_REENABLED | \
+ KVM_X86_QUIRK_CD_NW_CLEARED | \
+ KVM_X86_QUIRK_LAPIC_MMIO_HOLE | \
+ KVM_X86_QUIRK_OUT_7E_INC_RIP | \
+ KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT | \
+ KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \
+ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \
+ KVM_X86_QUIRK_SLOT_ZAP_ALL | \
+ KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \
+ KVM_X86_QUIRK_IGNORE_GUEST_PAT | \
+ KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM | \
+ KVM_X86_QUIRK_NESTED_SVM_SHARED_PAT)
+
+#define KVM_X86_CONDITIONAL_QUIRKS \
+ (KVM_X86_QUIRK_CD_NW_CLEARED | \
+ KVM_X86_QUIRK_IGNORE_GUEST_PAT)
+
+#define KVM_BUS_LOCK_DETECTION_VALID_MODE (KVM_BUS_LOCK_DETECTION_OFF | \
+ KVM_BUS_LOCK_DETECTION_EXIT)
+
+#define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \
+ KVM_X86_NOTIFY_VMEXIT_USER)
+
static void process_nmi(struct kvm_vcpu *vcpu);
static void store_regs(struct kvm_vcpu *vcpu);
static int sync_regs(struct kvm_vcpu *vcpu);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 23/30] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (21 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 22/30] KVM: x86: Move misc "VALID MASK" defines from kvm_host.h => x86.c Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 24/30] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
` (6 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Bury __kvm_irq_line_state() in CONFIG_KVM_IOAPIC=y code, as it's only used
by PIC and I/O APIC code.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 12 ------------
arch/x86/kvm/ioapic.h | 12 ++++++++++++
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 72f8f8933ab7..2b96ab0f46f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2253,18 +2253,6 @@ static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
-static inline int __kvm_irq_line_state(unsigned long *irq_state,
- int irq_source_id, int level)
-{
- /* Logical OR for level trig interrupt */
- if (level)
- __set_bit(irq_source_id, irq_state);
- else
- __clear_bit(irq_source_id, irq_state);
-
- return !!(*irq_state);
-}
-
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 3dadae093690..81b576513116 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -113,6 +113,18 @@ void kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
ulong *ioapic_handled_vectors);
+
+static inline int __kvm_irq_line_state(unsigned long *irq_state,
+ int irq_source_id, int level)
+{
+ /* Logical OR for level trig interrupt */
+ if (level)
+ __set_bit(irq_source_id, irq_state);
+ else
+ __clear_bit(irq_source_id, irq_state);
+
+ return !!(*irq_state);
+}
#endif /* CONFIG_KVM_IOAPIC */
static inline int ioapic_in_kernel(struct kvm *kvm)
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 24/30] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (22 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 23/30] KVM: x86: Move __kvm_irq_line_state() from kvm_host.h => ioapic.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 11:55 ` Huang, Kai
2026-06-13 0:03 ` [PATCH v4 25/30] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h Sean Christopherson
` (5 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the function declaration for APIs to get/query pending IRQs from
kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 5 -----
arch/x86/kvm/irq.h | 6 ++++++
arch/x86/kvm/svm/nested.c | 1 +
arch/x86/kvm/vmx/nested.c | 1 +
4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2b96ab0f46f9..811687b7a76a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2338,12 +2338,7 @@ enum {
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
#endif
-int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
-int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
-int kvm_cpu_has_extint(struct kvm_vcpu *v);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
-int kvm_cpu_get_extint(struct kvm_vcpu *v);
-int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 34f4a78a7a01..1a84ea31e7fd 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -112,6 +112,12 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
return mode != KVM_IRQCHIP_NONE;
}
+int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
+int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
+int kvm_cpu_get_extint(struct kvm_vcpu *v);
+int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
+
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1ab8b95975a4..ef10298cb320 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -23,6 +23,7 @@
#include "kvm_emulate.h"
#include "trace.h"
+#include "irq.h"
#include "mmu.h"
#include "x86.h"
#include "smm.h"
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b2c851cc7d5c..adbb8358ade7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -11,6 +11,7 @@
#include "x86.h"
#include "cpuid.h"
#include "hyperv.h"
+#include "irq.h"
#include "mmu.h"
#include "nested.h"
#include "pmu.h"
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 24/30] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h
2026-06-13 0:03 ` [PATCH v4 24/30] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
@ 2026-06-15 11:55 ` Huang, Kai
0 siblings, 0 replies; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 11:55 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Move the function declaration for APIs to get/query pending IRQs from
> kvm_host.h to irq.h, as the APIs are only used by KVM x86 code.
>
> No functional change intended.
>
> Reviewed-by: Yosry Ahmed <yosry@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 25/30] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (23 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 24/30] KVM: x86: Move IRQ-related helper declarations from kvm_host.h => irq.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending Sean Christopherson
` (4 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the declaration of kvm_pv_send_ipi() into lapic.h, as its
implementation is provided by lapic.c (sending PV IPIs relies on the
optimized APIC map provided by the in-kernel local APIC), and it's only
used by KVM x86 code.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 4 ----
arch/x86/kvm/lapic.h | 3 +++
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 811687b7a76a..0f6f71642e1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2341,10 +2341,6 @@ enum {
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
-int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
- unsigned long ipi_bitmap_high, u32 min,
- unsigned long icr, int op_64_bit);
-
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 32f09b25884a..58dbb94f980d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -131,6 +131,9 @@ static inline int kvm_irq_delivery_to_apic(struct kvm *kvm,
}
void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
+int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+ unsigned long ipi_bitmap_high, u32 min,
+ unsigned long icr, int op_64_bit);
int kvm_apic_set_base(struct kvm_vcpu *vcpu, u64 value, bool host_initiated);
int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (24 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 25/30] KVM: x86: Move kvm_pv_send_ipi() declaration from kvm_host.h => lapic.h Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 16:40 ` Yosry Ahmed
2026-06-13 0:03 ` [PATCH v4 27/30] KVM: x86: Rework kvm_arch_interrupt_allowed() into kvm_is_interrupt_allowed() Sean Christopherson
` (3 subsequent siblings)
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
When querying whether or not interrupts (IRQs) are allowed, check for a
pending nested run _after_ checking whether or not interrupts are blocked.
If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
be blocked while running L2, and interrupts will indeed be blocked once the
nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
being allowed.
For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
immediately request an IRQ window, instead of forcing an exit and _then_
requesting an IRQ window (because after the forced exit, KVM will see that
interrupts are blocked).
For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
affected in practice. kvm_vcpu_has_events() is unreachable when a nested
run is pending, as KVM clears nested_run_pending prior to calling
kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
always be runnable if a nested run is pending, and thus
kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
KVM shouldn't be faulting in memory with a pending nested VM-Enter.
As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
interrupts as being allowed could result in KVM prematurely exiting to
userspace to accept an ExtINT. But, KVM will still hold/block the ExtINT
and request its own IRQ window. I.e. the net effect is more or less the
same as the for-injection case.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 6 +++---
arch/x86/kvm/vmx/vmx.c | 5 ++++-
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 526e0fdcd16b..883d5f703391 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4055,12 +4055,12 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (vcpu->arch.nested_run_pending)
- return -EBUSY;
-
if (svm_interrupt_blocked(vcpu))
return 0;
+ if (vcpu->arch.nested_run_pending)
+ return -EBUSY;
+
/*
* An IRQ must not be injected into L2 if it's supposed to VM-Exit,
* e.g. if the IRQ arrived asynchronously after checking nested events.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2c120ff47790..9b277e36a521 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5254,6 +5254,9 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
{
+ if (vmx_interrupt_blocked(vcpu))
+ return 0;
+
if (vcpu->arch.nested_run_pending)
return -EBUSY;
@@ -5264,7 +5267,7 @@ int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
return -EBUSY;
- return !vmx_interrupt_blocked(vcpu);
+ return 1;
}
int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-13 0:03 ` [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending Sean Christopherson
@ 2026-06-15 16:40 ` Yosry Ahmed
2026-06-15 16:43 ` Yosry Ahmed
2026-06-15 17:26 ` Sean Christopherson
0 siblings, 2 replies; 64+ messages in thread
From: Yosry Ahmed @ 2026-06-15 16:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
>
> When querying whether or not interrupts (IRQs) are allowed, check for a
> pending nested run _after_ checking whether or not interrupts are blocked.
> If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> be blocked while running L2, and interrupts will indeed be blocked once the
> nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> being allowed.
>
> For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> immediately request an IRQ window, instead of forcing an exit and _then_
> requesting an IRQ window (because after the forced exit, KVM will see that
> interrupts are blocked).
>
> For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> run is pending, as KVM clears nested_run_pending prior to calling
> kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> always be runnable if a nested run is pending, and thus
> kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> KVM shouldn't be faulting in memory with a pending nested VM-Enter.
>
> As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> interrupts as being allowed could result in KVM prematurely exiting to
> userspace to accept an ExtINT.
"incorrectly treating interrupts as being allowed" is the status quo,
that this patch fixes, not sth this patch introduces -- right?
The changelog reads like for the non-injection case this change might
not be the right thing to do, but I don't think this is the case? I
assume returning false from
kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
L1's interrupts are blocked while L2 is running is the right thing to
do?
The code makes sense to me but I am trying to make sense of the changelog.
Aside from that, I have two comments about existing issues (Sashiko-style):
1. The return values of {vmx/svm}_interrupt_allowed() are annoying.
IIUC, 0 is not allowed, 1 is allowed, and -EBUSY is generally allowed
but not right now, request immediate exit and try again? That should
be documented somewhere (maybe I just missed it?).
2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
the same thing? So maybe move all that logic into
kvm_arch_interrupt_allowed(), rename it (because it's only used by
x86), and make interrupt_blocked() the actual per-vendor callback? I
assume this is the case because nested_run_pending used to be
per-vendor, but now we can clean this up. For TDX, I assume the
per-vendor hook can just be tdx_interrupt_allowed() reversed?
> But, KVM will still hold/block the ExtINT
> and request its own IRQ window. I.e. the net effect is more or less the
> same as the for-injection case.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/svm/svm.c | 6 +++---
> arch/x86/kvm/vmx/vmx.c | 5 ++++-
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 526e0fdcd16b..883d5f703391 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4055,12 +4055,12 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> - if (vcpu->arch.nested_run_pending)
> - return -EBUSY;
> -
> if (svm_interrupt_blocked(vcpu))
> return 0;
>
> + if (vcpu->arch.nested_run_pending)
> + return -EBUSY;
> +
> /*
> * An IRQ must not be injected into L2 if it's supposed to VM-Exit,
> * e.g. if the IRQ arrived asynchronously after checking nested events.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2c120ff47790..9b277e36a521 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5254,6 +5254,9 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
>
> int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> {
> + if (vmx_interrupt_blocked(vcpu))
> + return 0;
> +
> if (vcpu->arch.nested_run_pending)
> return -EBUSY;
>
> @@ -5264,7 +5267,7 @@ int vmx_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> if (for_injection && is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
> return -EBUSY;
>
> - return !vmx_interrupt_blocked(vcpu);
> + return 1;
> }
>
> int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> --
> 2.54.0.1136.gdb2ca164c4-goog
>
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-15 16:40 ` Yosry Ahmed
@ 2026-06-15 16:43 ` Yosry Ahmed
2026-06-15 17:03 ` Sean Christopherson
2026-06-15 17:26 ` Sean Christopherson
1 sibling, 1 reply; 64+ messages in thread
From: Yosry Ahmed @ 2026-06-15 16:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
On Mon, Jun 15, 2026 at 9:40 AM Yosry Ahmed <yosry@kernel.org> wrote:
>
> On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > When querying whether or not interrupts (IRQs) are allowed, check for a
> > pending nested run _after_ checking whether or not interrupts are blocked.
> > If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> > be blocked while running L2, and interrupts will indeed be blocked once the
> > nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> > being allowed.
> >
> > For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> > immediately request an IRQ window, instead of forcing an exit and _then_
> > requesting an IRQ window (because after the forced exit, KVM will see that
> > interrupts are blocked).
> >
> > For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> > affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> > run is pending, as KVM clears nested_run_pending prior to calling
> > kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> > and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> > always be runnable if a nested run is pending, and thus
> > kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> > as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> > support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> > KVM shouldn't be faulting in memory with a pending nested VM-Enter.
> >
> > As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> > interrupts as being allowed could result in KVM prematurely exiting to
> > userspace to accept an ExtINT.
>
> "incorrectly treating interrupts as being allowed" is the status quo,
> that this patch fixes, not sth this patch introduces -- right?
>
> The changelog reads like for the non-injection case this change might
> not be the right thing to do, but I don't think this is the case? I
> assume returning false from
> kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
> L1's interrupts are blocked while L2 is running is the right thing to
> do?
>
> The code makes sense to me but I am trying to make sense of the changelog.
>
> Aside from that, I have two comments about existing issues (Sashiko-style):
>
> 1. The return values of {vmx/svm}_interrupt_allowed() are annoying.
> IIUC, 0 is not allowed, 1 is allowed, and -EBUSY is generally allowed
> but not right now, request immediate exit and try again? That should
> be documented somewhere (maybe I just missed it?).
The next patch is adding the documentation, I spoke too soon :)
>
> 2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
> the same thing? So maybe move all that logic into
> kvm_arch_interrupt_allowed(), rename it (because it's only used by
> x86), and make interrupt_blocked() the actual per-vendor callback? I
> assume this is the case because nested_run_pending used to be
> per-vendor, but now we can clean this up. For TDX, I assume the
> per-vendor hook can just be tdx_interrupt_allowed() reversed?
It also does the renaming (nice!), so we're halfway there. I think for
this series, we should at least convert all direct calls to the vendor
.interrupt_allowed through the new kvm_is_interrupt_allowed() helper.
Unless you wanna go all the way and rework .interrupt_allowed to
.interrupt_blocked while at it as well.
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-15 16:43 ` Yosry Ahmed
@ 2026-06-15 17:03 ` Sean Christopherson
2026-06-15 19:37 ` Yosry Ahmed
0 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 17:03 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> On Mon, Jun 15, 2026 at 9:40 AM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > When querying whether or not interrupts (IRQs) are allowed, check for a
> > > pending nested run _after_ checking whether or not interrupts are blocked.
> > > If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> > > be blocked while running L2, and interrupts will indeed be blocked once the
> > > nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> > > being allowed.
> > >
> > > For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> > > immediately request an IRQ window, instead of forcing an exit and _then_
> > > requesting an IRQ window (because after the forced exit, KVM will see that
> > > interrupts are blocked).
> > >
> > > For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> > > affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> > > run is pending, as KVM clears nested_run_pending prior to calling
> > > kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> > > and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> > > always be runnable if a nested run is pending, and thus
> > > kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> > > as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> > > support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> > > KVM shouldn't be faulting in memory with a pending nested VM-Enter.
> > >
> > > As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> > > interrupts as being allowed could result in KVM prematurely exiting to
> > > userspace to accept an ExtINT.
> >
> > "incorrectly treating interrupts as being allowed" is the status quo,
> > that this patch fixes, not sth this patch introduces -- right?
> >
> > The changelog reads like for the non-injection case this change might
> > not be the right thing to do, but I don't think this is the case? I
> > assume returning false from
> > kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
> > L1's interrupts are blocked while L2 is running is the right thing to
> > do?
> >
> > The code makes sense to me but I am trying to make sense of the changelog.
> >
> > Aside from that, I have two comments about existing issues (Sashiko-style):
> >
> > 1. The return values of {vmx/svm}_interrupt_allowed() are annoying.
> > IIUC, 0 is not allowed, 1 is allowed, and -EBUSY is generally allowed
> > but not right now, request immediate exit and try again? That should
> > be documented somewhere (maybe I just missed it?).
>
> The next patch is adding the documentation, I spoke too soon :)
>
> >
> > 2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
> > the same thing? So maybe move all that logic into
> > kvm_arch_interrupt_allowed(), rename it (because it's only used by
> > x86), and make interrupt_blocked() the actual per-vendor callback?
Doesn't work, at least not with more changes/hooks, because nested_exit_on_intr()
is subtly vendor specifc. The concept is identical, but the implementation needs
to query vmc{b,s}12. :-/
I'm not opposed to figuring out a way to move the logic to common x86, because
I agree that's where it would ideally live, but I don't want to tack that in this
series.
> > I assume this is the case because nested_run_pending used to be per-vendor,
> > but now we can clean this up. For TDX, I assume the per-vendor hook can
> > just be tdx_interrupt_allowed() reversed?
>
> It also does the renaming (nice!), so we're halfway there. I think for
> this series, we should at least convert all direct calls to the vendor
> .interrupt_allowed through the new kvm_is_interrupt_allowed() helper.
Sadly not in this series, because handling the for_injection=true case requires
moving the nested_run_pending logic to common x86, and without that, we end up
with two very silly wrappers.
> Unless you wanna go all the way and rework .interrupt_allowed to
> .interrupt_blocked while at it as well.
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-15 17:03 ` Sean Christopherson
@ 2026-06-15 19:37 ` Yosry Ahmed
0 siblings, 0 replies; 64+ messages in thread
From: Yosry Ahmed @ 2026-06-15 19:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
> > > 2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
> > > the same thing? So maybe move all that logic into
> > > kvm_arch_interrupt_allowed(), rename it (because it's only used by
> > > x86), and make interrupt_blocked() the actual per-vendor callback?
>
> Doesn't work, at least not with more changes/hooks, because nested_exit_on_intr()
> is subtly vendor specifc. The concept is identical, but the implementation needs
> to query vmc{b,s}12. :-/
Duh, obvious in hindsight :)
>
> I'm not opposed to figuring out a way to move the logic to common x86, because
> I agree that's where it would ideally live, but I don't want to tack that in this
> series.
>
> > > I assume this is the case because nested_run_pending used to be per-vendor,
> > > but now we can clean this up. For TDX, I assume the per-vendor hook can
> > > just be tdx_interrupt_allowed() reversed?
> >
> > It also does the renaming (nice!), so we're halfway there. I think for
> > this series, we should at least convert all direct calls to the vendor
> > .interrupt_allowed through the new kvm_is_interrupt_allowed() helper.
>
> Sadly not in this series, because handling the for_injection=true case requires
> moving the nested_run_pending logic to common x86, and without that, we end up
> with two very silly wrappers.
Yes. I think probably the way to get there is to replace the
.interrupt_allowed hook with .interrupt_blocked and a nested hook to
replace nested_exit_on_intr(), but I am not sure if performance will
matter here if we do two calls instead of one. With that, we can end
up with most of the logic in kvm_is_interrupt_allowed().
But yeah, at this point this is definitely not something to do in this series.
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-15 16:40 ` Yosry Ahmed
2026-06-15 16:43 ` Yosry Ahmed
@ 2026-06-15 17:26 ` Sean Christopherson
2026-06-15 19:48 ` Yosry Ahmed
1 sibling, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 17:26 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > When querying whether or not interrupts (IRQs) are allowed, check for a
> > pending nested run _after_ checking whether or not interrupts are blocked.
> > If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> > be blocked while running L2, and interrupts will indeed be blocked once the
> > nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> > being allowed.
> >
> > For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> > immediately request an IRQ window, instead of forcing an exit and _then_
> > requesting an IRQ window (because after the forced exit, KVM will see that
> > interrupts are blocked).
> >
> > For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> > affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> > run is pending, as KVM clears nested_run_pending prior to calling
> > kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> > and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> > always be runnable if a nested run is pending, and thus
> > kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> > as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> > support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> > KVM shouldn't be faulting in memory with a pending nested VM-Enter.
> >
> > As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> > interrupts as being allowed could result in KVM prematurely exiting to
> > userspace to accept an ExtINT.
>
> "incorrectly treating interrupts as being allowed" is the status quo,
> that this patch fixes, not sth this patch introduces -- right?
Right.
> The changelog reads like for the non-injection case this change might
> not be the right thing to do, but I don't think this is the case? I
> assume returning false from
> kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
> L1's interrupts are blocked while L2 is running is the right thing to
> do?
Yes.
> The code makes sense to me but I am trying to make sense of the changelog.
What part (parts?) is confusing? Honest question. I'm trying to reword the
changelog to make it "better", but I'm failing miserable because I don't know
what's wrong :-)
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
2026-06-15 17:26 ` Sean Christopherson
@ 2026-06-15 19:48 ` Yosry Ahmed
0 siblings, 0 replies; 64+ messages in thread
From: Yosry Ahmed @ 2026-06-15 19:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Kai Huang
> > The code makes sense to me but I am trying to make sense of the changelog.
>
> What part (parts?) is confusing? Honest question. I'm trying to reword the
> changelog to make it "better", but I'm failing miserable because I don't know
> what's wrong :-)
1. For kvm_vcpu_has_events() being unaffected, the explanation in
paragraph #3 is focused on the code path from nested_vmx_run() ->
kvm_emulate_halt_noskip(). I don't immediately see how
kvm_arch_vcpu_runnable() is unaffected.
2. More importantly, paragraphs #3 and #4 read like this patch would
regress kvm_vcpu_ready_for_interrupt_injection() and
kvm_vcpu_has_events() if it affected them. Maybe clearly state that
this patch is the right thing to do for these 2 functions as well, but
they are more-or-less unaffected by the bug anyway? For
kvm_vcpu_ready_for_interrupt_injection(), maybe just make it more
clear in paragraph #4 that it currently incorrectly treats interrupts
as allowed in the problematic scenario, but it is not a problem
because ..., and it only results in a spurious exit to userspace (or
not even that?).
^ permalink raw reply [flat|nested] 64+ messages in thread
* [PATCH v4 27/30] KVM: x86: Rework kvm_arch_interrupt_allowed() into kvm_is_interrupt_allowed()
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (25 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 28/30] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault() Sean Christopherson
` (2 subsequent siblings)
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Rename kvm_arch_interrupt_allowed() to kvm_is_interrupt_allowed() and
change its return type to a boolean, as the purpose of the helper is purely
to check if interrupts are architecturally allowed. I.e. whether or not
interrupts are temporarily disallowed due a pending nested VM-Enter is
irrelevant (and callers are most definitely not supposed to care).
Opportunistically bury the helper in x86.c, as it hasn't been referenced by
arch-neutral code since commit a1b37100d9e2 ("KVM: Reduce runnability
interface with arch support code"), and has long since gained _very_
x86-specific semantics (see above).
Opportunistically add a comment to call out that treating -EBUSY as
"allowed" is intentional.
For all intents and purposes, no functional change intended (KVM treats
-EBUSY as "allowed" before and after).
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/x86.c | 23 +++++++++++++++--------
2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0f6f71642e1d..878944aaea85 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2338,7 +2338,6 @@ enum {
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
#endif
-int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
u64 kvm_scale_tsc(u64 tsc, u64 ratio);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0456133133b..ed85012d1c0d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2697,6 +2697,18 @@ static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
return 0;
}
+static bool kvm_is_interrupt_allowed(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Note, .interrupt_allowed() returns -EBUSY if interrupts are allowed
+ * based on CPU state, but can't be immediately delivered due to a
+ * pending nested VM-Enter. Treat that case as "allowed", because
+ * the goal here is just to check if interrupts are architecturally
+ * allowed, not to check if they can be injected.
+ */
+ return kvm_x86_call(interrupt_allowed)(vcpu, false);
+}
+
static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
{
/*
@@ -2722,7 +2734,7 @@ static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
* or KVM_SET_SREGS. For that to work, we must be at an
* instruction boundary and with no events half-injected.
*/
- return (kvm_arch_interrupt_allowed(vcpu) &&
+ return (kvm_is_interrupt_allowed(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu) &&
!kvm_event_needs_reinjection(vcpu) &&
!kvm_is_exception_pending(vcpu));
@@ -8472,7 +8484,7 @@ bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
return true;
- if (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu))
+ if (kvm_is_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu))
return true;
if (kvm_hv_has_stimer_pending(vcpu))
@@ -10308,11 +10320,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
-int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
-{
- return kvm_x86_call(interrupt_allowed)(vcpu, false);
-}
-
static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
{
BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
@@ -10448,7 +10455,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
* If interrupts are off we cannot even use an artificial
* halt state.
*/
- return kvm_arch_interrupt_allowed(vcpu);
+ return kvm_is_interrupt_allowed(vcpu);
}
bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 28/30] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault()
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (26 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 27/30] KVM: x86: Rework kvm_arch_interrupt_allowed() into kvm_is_interrupt_allowed() Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 29/30] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the implementation of kvm_arch_async_page_ready() "down" in mmu.c so
that it lives below kvm_tdp_page_fault(). This will allow moving
kvm_mmu_do_page_fault() into mmu.c without needing a forward declaration.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 62 +++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 313106e84a64..bc844ca9e01f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4571,37 +4571,6 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
kvm_vcpu_gfn_to_hva(vcpu, fault->gfn), &arch);
}
-void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
-{
- int r;
-
- if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
- return;
-
- if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
- work->wakeup_all)
- return;
-
- r = kvm_mmu_reload(vcpu);
- if (unlikely(r))
- return;
-
- if (!vcpu->arch.mmu->root_role.direct &&
- work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
- return;
-
- r = kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
- true, NULL, NULL);
-
- /*
- * Account fixed page faults, otherwise they'll never be counted, but
- * ignore stats for all other return times. Page-ready "faults" aren't
- * truly spurious and never trigger emulation
- */
- if (r == RET_PF_FIXED)
- vcpu->stat.pf_fixed++;
-}
-
static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault, int r)
{
@@ -5058,6 +5027,37 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
return min(range->size, end - range->gpa);
}
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
+{
+ int r;
+
+ if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
+ return;
+
+ if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
+ work->wakeup_all)
+ return;
+
+ r = kvm_mmu_reload(vcpu);
+ if (unlikely(r))
+ return;
+
+ if (!vcpu->arch.mmu->root_role.direct &&
+ work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
+ return;
+
+ r = kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
+ true, NULL, NULL);
+
+ /*
+ * Account fixed page faults, otherwise they'll never be counted, but
+ * ignore stats for all other return times. Page-ready "faults" aren't
+ * truly spurious and never trigger emulation
+ */
+ if (r == RET_PF_FIXED)
+ vcpu->stat.pf_fixed++;
+}
+
#ifdef CONFIG_KVM_GUEST_MEMFD
static void kvm_assert_gmem_invalidate_lock_held(struct kvm_memory_slot *slot)
{
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 29/30] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (27 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 28/30] KVM: x86/mmu: Move kvm_arch_async_page_ready() below kvm_tdp_page_fault() Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-13 0:03 ` [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
29 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move kvm_mmu_do_page_fault() into mmu.c, as there are no users outside of
mmu.c, and the function typically isn't inlined by the compiler anyways.
This will allow moving the EMULTYPE_xxx definitions into x86.h without
having to include x86.h in mmu_internal.h, i.e. will help preserve the
goal of making x86.h KVM x86's "top-level" include.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmu.c | 67 ++++++++++++++++++++++++++++++++-
arch/x86/kvm/mmu/mmu_internal.h | 66 --------------------------------
2 files changed, 66 insertions(+), 67 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bc844ca9e01f..e74ac24e4ee3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4927,7 +4927,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
}
#endif
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
#ifdef CONFIG_X86_64
if (tdp_mmu_enabled)
@@ -4937,6 +4937,71 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return direct_page_fault(vcpu, fault);
}
+static int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+ u64 err, bool prefetch, int *emulation_type,
+ u8 *level)
+{
+ struct kvm_page_fault fault = {
+ .addr = cr2_or_gpa,
+ .error_code = err,
+ .exec = err & PFERR_FETCH_MASK,
+ .write = err & PFERR_WRITE_MASK,
+ .present = err & PFERR_PRESENT_MASK,
+ .rsvd = err & PFERR_RSVD_MASK,
+ .user = err & PFERR_USER_MASK,
+ .prefetch = prefetch,
+ .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .nx_huge_page_workaround_enabled =
+ is_nx_huge_page_enabled(vcpu->kvm),
+
+ .max_level = KVM_MAX_HUGEPAGE_LEVEL,
+ .req_level = PG_LEVEL_4K,
+ .goal_level = PG_LEVEL_4K,
+ .is_private = err & PFERR_PRIVATE_ACCESS,
+
+ .pfn = KVM_PFN_ERR_FAULT,
+ };
+ int r;
+
+ if (vcpu->arch.mmu->root_role.direct) {
+ /*
+ * Things like memslots don't understand the concept of a shared
+ * bit. Strip it so that the GFN can be used like normal, and the
+ * fault.addr can be used when the shared bit is needed.
+ */
+ fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_direct_bits(vcpu->kvm);
+ fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
+ }
+
+ /*
+ * With retpoline being active an indirect call is rather expensive,
+ * so do a direct call in the most common case.
+ */
+ if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
+ r = kvm_tdp_page_fault(vcpu, &fault);
+ else
+ r = vcpu->arch.mmu->page_fault(vcpu, &fault);
+
+ /*
+ * Not sure what's happening, but punt to userspace and hope that
+ * they can fix it by changing memory to shared, or they can
+ * provide a better error.
+ */
+ if (r == RET_PF_EMULATE && fault.is_private) {
+ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+ kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
+ return -EFAULT;
+ }
+
+ if (fault.write_fault_to_shadow_pgtable && emulation_type)
+ *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+ if (level)
+ *level = fault.goal_level;
+
+ return r;
+}
+
+
static int kvm_tdp_page_prefault(struct kvm_vcpu *vcpu, gpa_t gpa,
u64 error_code, u8 *level)
{
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..c29002c60126 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -290,8 +290,6 @@ struct kvm_page_fault {
bool write_fault_to_shadow_pgtable;
};
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-
/*
* Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
* and of course kvm_mmu_do_page_fault().
@@ -337,70 +335,6 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
fault->is_private);
}
-static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u64 err, bool prefetch,
- int *emulation_type, u8 *level)
-{
- struct kvm_page_fault fault = {
- .addr = cr2_or_gpa,
- .error_code = err,
- .exec = err & PFERR_FETCH_MASK,
- .write = err & PFERR_WRITE_MASK,
- .present = err & PFERR_PRESENT_MASK,
- .rsvd = err & PFERR_RSVD_MASK,
- .user = err & PFERR_USER_MASK,
- .prefetch = prefetch,
- .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
- .nx_huge_page_workaround_enabled =
- is_nx_huge_page_enabled(vcpu->kvm),
-
- .max_level = KVM_MAX_HUGEPAGE_LEVEL,
- .req_level = PG_LEVEL_4K,
- .goal_level = PG_LEVEL_4K,
- .is_private = err & PFERR_PRIVATE_ACCESS,
-
- .pfn = KVM_PFN_ERR_FAULT,
- };
- int r;
-
- if (vcpu->arch.mmu->root_role.direct) {
- /*
- * Things like memslots don't understand the concept of a shared
- * bit. Strip it so that the GFN can be used like normal, and the
- * fault.addr can be used when the shared bit is needed.
- */
- fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_direct_bits(vcpu->kvm);
- fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
- }
-
- /*
- * With retpoline being active an indirect call is rather expensive,
- * so do a direct call in the most common case.
- */
- if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
- r = kvm_tdp_page_fault(vcpu, &fault);
- else
- r = vcpu->arch.mmu->page_fault(vcpu, &fault);
-
- /*
- * Not sure what's happening, but punt to userspace and hope that
- * they can fix it by changing memory to shared, or they can
- * provide a better error.
- */
- if (r == RET_PF_EMULATE && fault.is_private) {
- pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
- kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
- return -EFAULT;
- }
-
- if (fault.write_fault_to_shadow_pgtable && emulation_type)
- *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
- if (level)
- *level = fault.goal_level;
-
- return r;
-}
-
int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
const struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
2026-06-13 0:02 [PATCH v4 00/30] KVM: x86: x86.{c,h} spring cleaning Sean Christopherson
` (28 preceding siblings ...)
2026-06-13 0:03 ` [PATCH v4 29/30] KVM: x86/mmu: Move kvm_mmu_do_page_fault() from mmu_internal.h => mmu.c Sean Christopherson
@ 2026-06-13 0:03 ` Sean Christopherson
2026-06-15 13:01 ` Huang, Kai
29 siblings, 1 reply; 64+ messages in thread
From: Sean Christopherson @ 2026-06-13 0:03 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
Cc: kvm, linux-kernel, Yosry Ahmed, Kai Huang
Move the majority of remaining KVM-internal declarations and defines in
kvm_host.h to x86.h, so that kvm_host.h only holds structure and function
definitions that need to be visible to arch-neutral KVM.
Land the emulator interfaces in x86.h, even though kvm_emulate.h *seems*
like a good home, as the interfaces and defines being moved are provided by
x86.c. I.e. keep kvm_emulate.h as an interface to the emulator proper.
Note, any "misses" are likely unintentional.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 199 --------------------------------
arch/x86/kvm/ioapic.c | 1 +
arch/x86/kvm/x86.h | 197 +++++++++++++++++++++++++++++++
3 files changed, 198 insertions(+), 199 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 878944aaea85..d6e1138af9e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2074,9 +2074,6 @@ extern struct kvm_x86_ops kvm_x86_ops;
#define KVM_X86_OP_OPTIONAL_RET0 KVM_X86_OP
#include <asm/kvm-x86-ops.h>
-int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops);
-void kvm_x86_vendor_exit(void);
-
#define __KVM_HAVE_ARCH_VM_ALLOC
static inline struct kvm *kvm_arch_alloc_vm(void)
{
@@ -2119,175 +2116,6 @@ enum kvm_intr_type {
((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
(!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
-/*
- * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
- * userspace I/O) to indicate that the emulation context
- * should be reused as is, i.e. skip initialization of
- * emulation context, instruction fetch and decode.
- *
- * EMULTYPE_TRAP_UD - Set when emulating an intercepted #UD from hardware.
- * Indicates that only select instructions (tagged with
- * EmulateOnUD) should be emulated (to minimize the emulator
- * attack surface). See also EMULTYPE_TRAP_UD_FORCED.
- *
- * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
- * decode the instruction length. For use *only* by
- * kvm_x86_ops.skip_emulated_instruction() implementations if
- * EMULTYPE_COMPLETE_USER_EXIT is not set.
- *
- * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
- * retry native execution under certain conditions,
- * Can only be set in conjunction with EMULTYPE_PF.
- *
- * EMULTYPE_TRAP_UD_FORCED - Set when emulating an intercepted #UD that was
- * triggered by KVM's magic "force emulation" prefix,
- * which is opt in via module param (off by default).
- * Bypasses EmulateOnUD restriction despite emulating
- * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
- * Used to test the full emulator from userspace.
- *
- * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
- * backdoor emulation, which is opt in via module param.
- * VMware backdoor emulation handles select instructions
- * and reinjects the #GP for all other cases.
- *
- * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
- * the CR2/GPA value pass on the stack is valid.
- *
- * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
- * state and inject single-step #DBs after skipping
- * an instruction (after completing userspace I/O).
- *
- * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
- * is attempting to write a gfn that contains one or
- * more of the PTEs used to translate the write itself,
- * and the owning page table is being shadowed by KVM.
- * If emulation of the faulting instruction fails and
- * this flag is set, KVM will exit to userspace instead
- * of retrying emulation as KVM cannot make forward
- * progress.
- *
- * If emulation fails for a write to guest page tables,
- * KVM unprotects (zaps) the shadow page for the target
- * gfn and resumes the guest to retry the non-emulatable
- * instruction (on hardware). Unprotecting the gfn
- * doesn't allow forward progress for a self-changing
- * access because doing so also zaps the translation for
- * the gfn, i.e. retrying the instruction will hit a
- * !PRESENT fault, which results in a new shadow page
- * and sends KVM back to square one.
- *
- * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
- * an instruction if it could generate a given software
- * interrupt, which must be encoded via
- * EMULTYPE_SET_SOFT_INT_VECTOR().
- */
-#define EMULTYPE_NO_DECODE (1 << 0)
-#define EMULTYPE_TRAP_UD (1 << 1)
-#define EMULTYPE_SKIP (1 << 2)
-#define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
-#define EMULTYPE_TRAP_UD_FORCED (1 << 4)
-#define EMULTYPE_VMWARE_GP (1 << 5)
-#define EMULTYPE_PF (1 << 6)
-#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
-#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
-#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
-
-#define EMULTYPE_SET_SOFT_INT_VECTOR(v) ((u32)((v) & 0xff) << 16)
-#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
-
-static inline bool kvm_can_emulate_event_vectoring(int emul_type)
-{
- return !(emul_type & EMULTYPE_PF);
-}
-
-int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
-int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
- void *insn, int insn_len);
-void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
- u64 *data, u8 ndata);
-void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
-
-void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
-void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
-
-int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
-int kvm_emulate_invd(struct kvm_vcpu *vcpu);
-int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
-int kvm_handle_invalid_op(struct kvm_vcpu *vcpu);
-int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
-
-int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
-int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
-int kvm_emulate_halt(struct kvm_vcpu *vcpu);
-int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
-int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
-
-void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
-
-int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
- int reason, bool has_error_code, u32 error_code);
-
-int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
-int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
-
-int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
-
-void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
-void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
-void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
-void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
- bool has_error_code, u32 error_code);
-void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
- bool from_hardware);
-void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault,
- bool from_hardware);
-
-static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
- struct x86_exception *fault)
-{
- __kvm_inject_emulated_page_fault(vcpu, fault, false);
-}
-
-bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
-
-void kvm_inject_nmi(struct kvm_vcpu *vcpu);
-int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
-
-bool kvm_apicv_activated(struct kvm *kvm);
-bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
-void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
-void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason, bool set);
-void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason, bool set);
-
-static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason)
-{
- kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
-}
-
-static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
- enum kvm_apicv_inhibit reason)
-{
- kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
-}
-
-void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
-
-static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
-{
- kvm_inc_or_dec_irq_window_inhibit(kvm, true);
-}
-
-static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
-{
- kvm_inc_or_dec_irq_window_inhibit(kvm, false);
-}
-
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.has_private_mem)
#endif
@@ -2304,11 +2132,6 @@ static inline unsigned long read_msr(unsigned long msr)
}
#endif
-static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
-{
- kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
-}
-
#define TSS_IOPB_BASE_OFFSET 0x66
#define TSS_BASE_SIZE 0x68
#define TSS_IOPB_SIZE (65536 / 8)
@@ -2338,17 +2161,6 @@ enum {
# define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
#endif
-void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
-
-u64 kvm_scale_tsc(u64 tsc, u64 ratio);
-u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
-u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
-u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
-
-void kvm_make_scan_ioapic_request(struct kvm *kvm);
-void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
- unsigned long *vcpu_bitmap);
-
bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
@@ -2357,15 +2169,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
struct kvm_async_pf *work);
void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
-extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-
-int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
-int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
-
-void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
- u32 size);
-bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
-bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
@@ -2387,8 +2190,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
#endif
}
-int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
-
static inline bool kvm_arch_has_irq_bypass(void)
{
return enable_device_posted_irqs;
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index f3f4a483ca15..b090c8696116 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -33,6 +33,7 @@
#include "lapic.h"
#include "irq.h"
#include "trace.h"
+#include "x86.h"
static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
bool line_status);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index fd3d0a196526..3180046c713b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -14,6 +14,9 @@
#define KVM_MAX_MCE_BANKS 32
+int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops);
+void kvm_x86_vendor_exit(void);
+
void kvm_spurious_fault(void);
#define SIZE_OF_MEMSLOTS_HASHTABLE \
@@ -318,6 +321,8 @@ static __always_inline void kvm_request_l1tf_flush_l1d(void)
#endif
}
+void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
u64 get_kvmclock_ns(struct kvm *kvm);
@@ -326,6 +331,10 @@ bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
int kvm_guest_time_update(struct kvm_vcpu *v);
void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value);
+u64 kvm_scale_tsc(u64 tsc, u64 ratio);
+u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
+u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
+u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc);
void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset);
@@ -363,10 +372,196 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
void *insn, int insn_len);
int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
int emulation_type, void *insn, int insn_len);
+/*
+ * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
+ * userspace I/O) to indicate that the emulation context
+ * should be reused as is, i.e. skip initialization of
+ * emulation context, instruction fetch and decode.
+ *
+ * EMULTYPE_TRAP_UD - Set when emulating an intercepted #UD from hardware.
+ * Indicates that only select instructions (tagged with
+ * EmulateOnUD) should be emulated (to minimize the emulator
+ * attack surface). See also EMULTYPE_TRAP_UD_FORCED.
+ *
+ * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
+ * decode the instruction length. For use *only* by
+ * kvm_x86_ops.skip_emulated_instruction() implementations if
+ * EMULTYPE_COMPLETE_USER_EXIT is not set.
+ *
+ * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
+ * retry native execution under certain conditions,
+ * Can only be set in conjunction with EMULTYPE_PF.
+ *
+ * EMULTYPE_TRAP_UD_FORCED - Set when emulating an intercepted #UD that was
+ * triggered by KVM's magic "force emulation" prefix,
+ * which is opt in via module param (off by default).
+ * Bypasses EmulateOnUD restriction despite emulating
+ * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
+ * Used to test the full emulator from userspace.
+ *
+ * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
+ * backdoor emulation, which is opt in via module param.
+ * VMware backdoor emulation handles select instructions
+ * and reinjects the #GP for all other cases.
+ *
+ * EMULTYPE_PF - Set when an intercepted #PF triggers the emulation, in which case
+ * the CR2/GPA value pass on the stack is valid.
+ *
+ * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
+ * state and inject single-step #DBs after skipping
+ * an instruction (after completing userspace I/O).
+ *
+ * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
+ * is attempting to write a gfn that contains one or
+ * more of the PTEs used to translate the write itself,
+ * and the owning page table is being shadowed by KVM.
+ * If emulation of the faulting instruction fails and
+ * this flag is set, KVM will exit to userspace instead
+ * of retrying emulation as KVM cannot make forward
+ * progress.
+ *
+ * If emulation fails for a write to guest page tables,
+ * KVM unprotects (zaps) the shadow page for the target
+ * gfn and resumes the guest to retry the non-emulatable
+ * instruction (on hardware). Unprotecting the gfn
+ * doesn't allow forward progress for a self-changing
+ * access because doing so also zaps the translation for
+ * the gfn, i.e. retrying the instruction will hit a
+ * !PRESENT fault, which results in a new shadow page
+ * and sends KVM back to square one.
+ *
+ * EMULTYPE_SKIP_SOFT_INT - Set in combination with EMULTYPE_SKIP to only skip
+ * an instruction if it could generate a given software
+ * interrupt, which must be encoded via
+ * EMULTYPE_SET_SOFT_INT_VECTOR().
+ */
+#define EMULTYPE_NO_DECODE (1 << 0)
+#define EMULTYPE_TRAP_UD (1 << 1)
+#define EMULTYPE_SKIP (1 << 2)
+#define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
+#define EMULTYPE_TRAP_UD_FORCED (1 << 4)
+#define EMULTYPE_VMWARE_GP (1 << 5)
+#define EMULTYPE_PF (1 << 6)
+#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
+#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)
+#define EMULTYPE_SKIP_SOFT_INT (1 << 9)
+
+#define EMULTYPE_SET_SOFT_INT_VECTOR(v) ((u32)((v) & 0xff) << 16)
+#define EMULTYPE_GET_SOFT_INT_VECTOR(e) (((e) >> 16) & 0xff)
+
+static inline bool kvm_can_emulate_event_vectoring(int emul_type)
+{
+ return !(emul_type & EMULTYPE_PF);
+}
+
+int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
+int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
+ void *insn, int insn_len);
+void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
+ u64 *data, u8 ndata);
+void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
+
+void kvm_prepare_event_vectoring_exit(struct kvm_vcpu *vcpu, gpa_t gpa);
+void kvm_prepare_unexpected_reason_exit(struct kvm_vcpu *vcpu, u64 exit_reason);
fastpath_t handle_fastpath_hlt(struct kvm_vcpu *vcpu);
fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu);
+int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
+int kvm_emulate_invd(struct kvm_vcpu *vcpu);
+int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
+int kvm_handle_invalid_op(struct kvm_vcpu *vcpu);
+int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
+
+int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
+int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
+int kvm_emulate_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
+int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
+
+void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
+
+int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
+ int reason, bool has_error_code, u32 error_code);
+
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
+int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu);
+
+int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
+int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
+
+void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr);
+void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code);
+void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr, unsigned long payload);
+void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned int nr,
+ bool has_error_code, u32 error_code);
+void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault,
+ bool from_hardware);
+void __kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault,
+ bool from_hardware);
+
+static inline void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault)
+{
+ __kvm_inject_emulated_page_fault(vcpu, fault, false);
+}
+
+bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
+
+static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
+{
+ kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
+}
+
+void kvm_inject_nmi(struct kvm_vcpu *vcpu);
+int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
+
+void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
+ u32 size);
+int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
+
+bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
+
+bool kvm_apicv_activated(struct kvm *kvm);
+bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
+void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
+void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason, bool set);
+void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason, bool set);
+
+static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason)
+{
+ kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
+}
+
+static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
+ enum kvm_apicv_inhibit reason)
+{
+ kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
+}
+
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
+
+static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
+{
+ kvm_inc_or_dec_irq_window_inhibit(kvm, true);
+}
+
+static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
+{
+ kvm_inc_or_dec_irq_window_inhibit(kvm, false);
+}
+
+void kvm_make_scan_ioapic_request(struct kvm *kvm);
+void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
+ unsigned long *vcpu_bitmap);
+
void kvm_setup_xss_caps(void);
/*
@@ -507,6 +702,8 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
vcpu->arch.apf.gfns[i] = ~0;
}
+bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+
/*
* Trigger machine check on the host. We assume all the MSRs are already set up
* by the CPU and that we still run on the same CPU as the MCE occurred on.
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply related [flat|nested] 64+ messages in thread* Re: [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
2026-06-13 0:03 ` [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h Sean Christopherson
@ 2026-06-15 13:01 ` Huang, Kai
2026-06-15 14:23 ` Sean Christopherson
0 siblings, 1 reply; 64+ messages in thread
From: Huang, Kai @ 2026-06-15 13:01 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
yosry@kernel.org
On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> Move the majority of remaining KVM-internal declarations and defines in
> kvm_host.h to x86.h, so that kvm_host.h only holds structure and function
> definitions that need to be visible to arch-neutral KVM.
>
> Land the emulator interfaces in x86.h, even though kvm_emulate.h *seems*
> like a good home, as the interfaces and defines being moved are provided by
> x86.c. I.e. keep kvm_emulate.h as an interface to the emulator proper.
>
> Note, any "misses" are likely unintentional.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
[...]
> -
> #define TSS_IOPB_BASE_OFFSET 0x66
> #define TSS_BASE_SIZE 0x68
> #define TSS_IOPB_SIZE (65536 / 8)
Nit: Seems sashiko was right about TSS_IOPB_BASE and the related macros can be
moved out of asm/kvm_host.h (ditto for the enum of TASK_SWITCH_*)?
https://lore.kernel.org/kvm/20260530075928.888B61F00893@smtp.kernel.org/
But you said "majority" in the changelog, so we can always do more cleanup
later. :-)
[...]
> +void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> + enum kvm_apicv_inhibit reason, bool set);
> +void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> + enum kvm_apicv_inhibit reason, bool set);
> +
> +static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
> + enum kvm_apicv_inhibit reason)
> +{
> + kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
> +}
> +
> +static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
> + enum kvm_apicv_inhibit reason)
> +{
> + kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
> +}
> +
The definition of 'enum kvm_apicv_inhibit' remains in asm/kvm_host.h. Is it
because "trace.h" still uses APICV_INHIBIT_REASONS?
^ permalink raw reply [flat|nested] 64+ messages in thread* Re: [PATCH v4 30/30] KVM: x86: Move a pile of stuff from kvm_host.h => x86.h
2026-06-15 13:01 ` Huang, Kai
@ 2026-06-15 14:23 ` Sean Christopherson
0 siblings, 0 replies; 64+ messages in thread
From: Sean Christopherson @ 2026-06-15 14:23 UTC (permalink / raw)
To: Kai Huang
Cc: pbonzini@redhat.com, vkuznets@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, yosry@kernel.org
On Mon, Jun 15, 2026, Kai Huang wrote:
> On Fri, 2026-06-12 at 17:03 -0700, Sean Christopherson wrote:
> > Move the majority of remaining KVM-internal declarations and defines in
> > kvm_host.h to x86.h, so that kvm_host.h only holds structure and function
> > definitions that need to be visible to arch-neutral KVM.
> >
> > Land the emulator interfaces in x86.h, even though kvm_emulate.h *seems*
> > like a good home, as the interfaces and defines being moved are provided by
> > x86.c. I.e. keep kvm_emulate.h as an interface to the emulator proper.
> >
> > Note, any "misses" are likely unintentional.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> [...]
>
> > -
> > #define TSS_IOPB_BASE_OFFSET 0x66
> > #define TSS_BASE_SIZE 0x68
> > #define TSS_IOPB_SIZE (65536 / 8)
>
> Nit: Seems sashiko was right about TSS_IOPB_BASE and the related macros can be
> moved out of asm/kvm_host.h (ditto for the enum of TASK_SWITCH_*)?
Hmm, agreed. arch/x86/kvm/tss.h seems like the perfect fit. It would be easy
to slot in a patch to move these ahead of the bulk move to x86.h.
> https://lore.kernel.org/kvm/20260530075928.888B61F00893@smtp.kernel.org/
>
> But you said "majority" in the changelog, so we can always do more cleanup
> later. :-)
>
> [...]
>
> > +void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> > + enum kvm_apicv_inhibit reason, bool set);
> > +void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> > + enum kvm_apicv_inhibit reason, bool set);
> > +
> > +static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
> > + enum kvm_apicv_inhibit reason)
> > +{
> > + kvm_set_or_clear_apicv_inhibit(kvm, reason, true);
> > +}
> > +
> > +static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
> > + enum kvm_apicv_inhibit reason)
> > +{
> > + kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
> > +}
> > +
>
> The definition of 'enum kvm_apicv_inhibit' remains in asm/kvm_host.h. Is it
> because "trace.h" still uses APICV_INHIBIT_REASONS?
No, pretty sure I just overlooked the enum. I'll do another pass of kvm_host.h
to see what #defines and enums can be moved.
^ permalink raw reply [flat|nested] 64+ messages in thread