* [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-07-08 14:59 ` Vitaly Kuznetsov
2024-06-09 15:49 ` [PATCH 02/18] KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to guest Nicolas Saenz Julienne
` (18 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Prepare infrastructure to be able to return data through the XMM
registers when Hyper-V hypercalls are issued in fast mode. The XMM
registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
restored on successful hypercall completion.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
There was some discussion in the RFC about whether growing 'struct
kvm_hyperv_exit' is ABI breakage. IMO it isn't:
- There is padding in 'struct kvm_run' that ensures that a bigger
'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
- Adding a new field at the bottom of the 'hcall' field within the
'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
the offsets within that struct either.
- Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
its size isn't part of the uABI. It already grew when syndbg was
introduced.
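For illustration only, the first two points can be sanity-checked at
compile time against the patched uapi header (hypothetical snippet, not
part of the patch; the exit-reason union in 'struct kvm_run' is padded
to 256 bytes):

    #include <stddef.h>
    #include <linux/kvm.h>

    /* The grown struct must still fit in kvm_run's exit-union padding. */
    _Static_assert(sizeof(struct kvm_hyperv_exit) <= 256,
                   "kvm_hyperv_exit overflows kvm_run's exit union");

    /* The new field starts where the old 'hcall' ended, so no
     * pre-existing offset moves. */
    _Static_assert(offsetof(struct kvm_hyperv_exit, u.hcall.xmm) ==
                   offsetof(struct kvm_hyperv_exit, u.hcall.params) +
                   2 * sizeof(__u64),
                   "existing 'hcall' offsets changed");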
Documentation/virt/kvm/api.rst | 19 ++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 2 +-
arch/x86/kvm/hyperv.c | 56 +++++++++++++++++++++++++++++-
include/uapi/linux/kvm.h | 6 ++++
4 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a71d91978d9ef..17893b330b76f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TBD
+
+10. Hyper-V CPUIDs
+==================
+
+This section only applies to x86.
+
+New Hyper-V feature support is no longer being tracked through KVM
+capabilities. Userspace can check if a particular version of KVM supports a
+feature using KVM_GET_SUPPORTED_HV_CPUID. This section documents how Hyper-V
+CPUIDs map to KVM functionality.
+
+10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
+------------------------------------------
+
+:Location: CPUID.40000003H:EDX[bit 15]
+
+This CPUID indicates that KVM supports returning data to the guest in response
+to a hypercall using the XMM registers. It also extends ``struct
+kvm_hyperv_exit`` to allow passing the XMM data from userspace.
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3787d26810c1c..6a18c9f77d5fe 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -49,7 +49,7 @@
/* Support for physical CPU dynamic partitioning events is available*/
#define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE BIT(3)
/*
- * Support for passing hypercall input parameter block via XMM
+ * Support for passing hypercall input and output parameter block via XMM
* registers is available
*/
#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab7..42f44546fe79c 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
u16 rep_idx;
bool fast;
bool rep;
+ bool xmm_dirty;
sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
/*
@@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
return ret;
}
+static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
+{
+ int reg;
+
+ kvm_fpu_get();
+ for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
+ const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
+ _kvm_write_sse_reg(reg, &data);
+ }
+ kvm_fpu_put();
+}
+
+static bool kvm_hv_is_xmm_output_hcall(u16 code)
+{
+ return false;
+}
+
+static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return !hv_vcpu->enforce_cpuid ||
+ hv_vcpu->cpuid_cache.features_edx &
+ HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
+}
+
static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
{
- return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
+ bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
+ u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
+ u64 result = vcpu->run->hyperv.u.hcall.result;
+
+ if (hv_result_success(result) && fast &&
+ kvm_hv_is_xmm_output_hcall(code)) {
+ if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
+ }
+
+ return kvm_hv_hypercall_complete(vcpu, result);
}
static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
hc.rep = !!(hc.rep_cnt || hc.rep_idx);
+ hc.xmm_dirty = false;
trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
hc.rep_idx, hc.ingpa, hc.outgpa);
@@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
+ if (hv_result_success(ret) && hc.xmm_dirty) {
+ if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
+ }
+
hypercall_complete:
return kvm_hv_hypercall_complete(vcpu, ret);
@@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
vcpu->run->hyperv.u.hcall.input = hc.param;
vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+ if (hc.fast)
+ memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
return 0;
}
@@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
+ ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d03842abae578..fbdee8d754595 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -90,6 +90,11 @@ struct kvm_pit_config {
#define KVM_PIT_SPEAKER_DUMMY 1
+struct kvm_hyperv_xmm_reg {
+ __u64 low;
+ __u64 high;
+};
+
struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC 1
#define KVM_EXIT_HYPERV_HCALL 2
@@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
__u64 input;
__u64 result;
__u64 params[2];
+ struct kvm_hyperv_xmm_reg xmm[6];
} hcall;
struct {
__u32 msr;
--
2.40.1
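To illustrate the intended userspace flow, a minimal VMM-side sketch
(hypothetical code, not part of the series; 'emulate_hcall' is a
stand-in for the VMM's hypercall handler, and the usual <sys/ioctl.h>
and <linux/kvm.h> includes are assumed):

    /* 'vcpu_fd' is the vCPU fd, 'run' its mmap'ed kvm_run. */
    static void handle_hv_hcall_exit(int vcpu_fd, struct kvm_run *run)
    {
        struct kvm_hyperv_exit *hv = &run->hyperv;

        if (run->exit_reason != KVM_EXIT_HYPERV ||
            hv->type != KVM_EXIT_HYPERV_HCALL)
            return;

        /* Emulate the hypercall; on success of a fast call, the xmm[]
         * output filled here is loaded back into the guest's XMM0-XMM5
         * by kvm_hv_hypercall_complete_userspace(). */
        hv->u.hcall.result = emulate_hcall(hv->u.hcall.input,
                                           hv->u.hcall.params,
                                           hv->u.hcall.xmm);

        ioctl(vcpu_fd, KVM_RUN, 0);   /* resume the guest */
    }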
* Re: [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support
2024-06-09 15:49 ` [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support Nicolas Saenz Julienne
@ 2024-07-08 14:59 ` Vitaly Kuznetsov
2024-07-17 14:12 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Vitaly Kuznetsov @ 2024-07-08 14:59 UTC (permalink / raw)
To: Nicolas Saenz Julienne, linux-kernel, kvm
Cc: pbonzini, seanjc, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
> Prepare infrastructure to be able to return data through the XMM
> registers when Hyper-V hypercalls are issued in fast mode. The XMM
> registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> restored on successful hypercall completion.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
>
> ---
>
> There was some discussion in the RFC about whether growing 'struct
> kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> - There is padding in 'struct kvm_run' that ensures that a bigger
> 'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> - Adding a new field at the bottom of the 'hcall' field within the
> 'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
> the offsets within that struct either.
> - Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
> its size isn't part of the uABI. It already grew when syndbg was
> introduced.
Yes, but the SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't
see any immediate issues with the current approach, we may want to
introduce something like KVM_EXIT_HYPERV_HCALL_XMM: userspace must be
prepared to handle this new information anyway, and it is better to make
unprepared userspace fail with 'unknown exit' than to mishandle a
hypercall by ignoring the XMM portion of the data.
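i.e., with a dedicated exit type, an unprepared VMM fails loudly instead
of silently dropping the XMM data. A rough sketch
(KVM_EXIT_HYPERV_HCALL_XMM is the proposed, not-yet-existing constant,
and 'handle_hcall' a placeholder):

    switch (run->hyperv.type) {
    case KVM_EXIT_HYPERV_HCALL:
        handle_hcall(run);      /* pre-XMM path, unchanged */
        break;
    default:
        /* KVM_EXIT_HYPERV_HCALL_XMM lands here on an old VMM */
        fprintf(stderr, "unknown Hyper-V exit type %u\n",
                run->hyperv.type);
        abort();
    }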
>
> Documentation/virt/kvm/api.rst | 19 ++++++++++
> arch/x86/include/asm/hyperv-tlfs.h | 2 +-
> arch/x86/kvm/hyperv.c | 56 +++++++++++++++++++++++++++++-
> include/uapi/linux/kvm.h | 6 ++++
> 4 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a71d91978d9ef..17893b330b76f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> TBD
> +
> +10. Hyper-V CPUIDs
> +==================
> +
> +This section only applies to x86.
We can probably use
:Architectures: x86
which we already use.
> +
> +New Hyper-V feature support is no longer being tracked through KVM
> +capabilities. Userspace can check if a particular version of KVM supports a
> +feature using KVM_GET_SUPPORTED_HV_CPUID. This section documents how Hyper-V
> +CPUIDs map to KVM functionality.
> +
> +10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
> +------------------------------------------
> +
> +:Location: CPUID.40000003H:EDX[bit 15]
> +
> +This CPUID indicates that KVM supports returning data to the guest in response
> +to a hypercall using the XMM registers. It also extends ``struct
> +kvm_hyperv_exit`` to allow passing the XMM data from userspace.
It's always good to document things, thanks! I'm, however, wondering
what we should document as part of the KVM API. In the file, we already
have:
- "4.118 KVM_GET_SUPPORTED_HV_CPUID"
- "struct kvm_hyperv_exit" description in "5. The kvm_run structure"
The latter should definitely get extended to cover XMM, and I guess the
former can accommodate the 'no longer being tracked' comment. With that,
maybe there's no need for a new section?
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 3787d26810c1c..6a18c9f77d5fe 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -49,7 +49,7 @@
> /* Support for physical CPU dynamic partitioning events is available*/
> #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE BIT(3)
> /*
> - * Support for passing hypercall input parameter block via XMM
> + * Support for passing hypercall input and output parameter block via XMM
> * registers is available
> */
> #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
This change of the comment is weird (or I may have forgotten something
important), could you please elaborate? Currently, we have:
/*
* Support for passing hypercall input parameter block via XMM
* registers is available
*/
#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
...
/*
* Support for returning hypercall output block via XMM
* registers is available
*/
#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE BIT(15)
which seems to be correct. TLFS also defines
Bit 4: XmmRegistersForFastHypercallAvailable
in CPUID 0x40000009.EDX (Nested Hypervisor Feature Identification) which
probably covers both but we don't set this leaf in KVM currently ...
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 8a47f8541eab7..42f44546fe79c 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
> u16 rep_idx;
> bool fast;
> bool rep;
> + bool xmm_dirty;
> sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
>
> /*
> @@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
> return ret;
> }
>
> +static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
> +{
> + int reg;
> +
> + kvm_fpu_get();
> + for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
> + const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
> + _kvm_write_sse_reg(reg, &data);
> + }
> + kvm_fpu_put();
> +}
> +
> +static bool kvm_hv_is_xmm_output_hcall(u16 code)
> +{
> + return false;
> +}
> +
> +static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> + return !hv_vcpu->enforce_cpuid ||
> + hv_vcpu->cpuid_cache.features_edx &
> + HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> +}
> +
> static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
> {
> - return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
> + bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
> + u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
> + u64 result = vcpu->run->hyperv.u.hcall.result;
> +
> + if (hv_result_success(result) && fast &&
> + kvm_hv_is_xmm_output_hcall(code)) {
Assuming hypercalls with XMM output are always 'fast', should we include
the 'fast' check in kvm_hv_is_xmm_output_hcall()?
> + if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> + }
> +
> + kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
> + }
> +
> + return kvm_hv_hypercall_complete(vcpu, result);
> }
>
> static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> @@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
> hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
> hc.rep = !!(hc.rep_cnt || hc.rep_idx);
> + hc.xmm_dirty = false;
>
> trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
> hc.rep_idx, hc.ingpa, hc.outgpa);
> @@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> break;
> }
>
> + if (hv_result_success(ret) && hc.xmm_dirty) {
> + if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> + kvm_queue_exception(vcpu, UD_VECTOR);
> + return 1;
> + }
> +
> + kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
> + }
> +
> hypercall_complete:
> return kvm_hv_hypercall_complete(vcpu, ret);
>
> @@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> vcpu->run->hyperv.u.hcall.input = hc.param;
> vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> + if (hc.fast)
> + memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
> vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
> return 0;
> }
> @@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
>
> ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
> + ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d03842abae578..fbdee8d754595 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -90,6 +90,11 @@ struct kvm_pit_config {
>
> #define KVM_PIT_SPEAKER_DUMMY 1
>
> +struct kvm_hyperv_xmm_reg {
> + __u64 low;
> + __u64 high;
> +};
> +
> struct kvm_hyperv_exit {
> #define KVM_EXIT_HYPERV_SYNIC 1
> #define KVM_EXIT_HYPERV_HCALL 2
> @@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
> __u64 input;
> __u64 result;
> __u64 params[2];
> + struct kvm_hyperv_xmm_reg xmm[6];
In theory, we have HV_HYPERCALL_MAX_XMM_REGISTERS in TLFS (which you
already use in the code). While I'm not sure it makes sense to make KVM
ABI dependent on TLFS changes (probably not), we may want to leave a
short comment explaining where '6' comes from.
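(e.g. something as simple as the following would do:)

    struct kvm_hyperv_xmm_reg xmm[6]; /* HV_HYPERCALL_MAX_XMM_REGISTERS */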
> } hcall;
> struct {
> __u32 msr;
--
Vitaly
* Re: [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support
2024-07-08 14:59 ` Vitaly Kuznetsov
@ 2024-07-17 14:12 ` Nicolas Saenz Julienne
2024-07-29 13:53 ` Vitaly Kuznetsov
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-07-17 14:12 UTC (permalink / raw)
To: Vitaly Kuznetsov, linux-kernel, kvm
Cc: pbonzini, seanjc, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Hi Vitaly,
Thanks for having a look at this.
On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Prepare infrastructure to be able to return data through the XMM
> > registers when Hyper-V hypercalls are issued in fast mode. The XMM
> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> > restored on successful hypercall completion.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >
> > ---
> >
> > There was some discussion in the RFC about whether growing 'struct
> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> > - There is padding in 'struct kvm_run' that ensures that a bigger
> > 'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> > - Adding a new field at the bottom of the 'hcall' field within the
> > 'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
> > the offsets within that struct either.
> > - Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
> > its size isn't part of the uABI. It already grew when syndbg was
> > introduced.
>
> Yes, but the SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't
> see any immediate issues with the current approach, we may want to
> introduce something like KVM_EXIT_HYPERV_HCALL_XMM: userspace must be
> prepared to handle this new information anyway, and it is better to make
> unprepared userspace fail with 'unknown exit' than to mishandle a
> hypercall by ignoring the XMM portion of the data.
OK, I'll go that way. Just wanted to get a better understanding of why
you felt it was necessary.
> >
> > Documentation/virt/kvm/api.rst | 19 ++++++++++
> > arch/x86/include/asm/hyperv-tlfs.h | 2 +-
> > arch/x86/kvm/hyperv.c | 56 +++++++++++++++++++++++++++++-
> > include/uapi/linux/kvm.h | 6 ++++
> > 4 files changed, 81 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a71d91978d9ef..17893b330b76f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > TBD
> > +
> > +10. Hyper-V CPUIDs
> > +==================
> > +
> > +This section only applies to x86.
>
> We can probably use
>
> :Architectures: x86
>
> which we already use.
Noted.
> > +
> > +New Hyper-V feature support is no longer being tracked through KVM
> > +capabilities. Userspace can check if a particular version of KVM supports a
> > +feature using KVM_GET_SUPPORTED_HV_CPUID. This section documents how Hyper-V
> > +CPUIDs map to KVM functionality.
> > +
> > +10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
> > +------------------------------------------
> > +
> > +:Location: CPUID.40000003H:EDX[bit 15]
> > +
> > +This CPUID indicates that KVM supports returning data to the guest in response
> > +to a hypercall using the XMM registers. It also extends ``struct
> > +kvm_hyperv_exit`` to allow passing the XMM data from userspace.
>
> It's always good to document things, thanks! I'm, however, wondering
> what we should document as part of the KVM API. In the file, we already
> have:
> - "4.118 KVM_GET_SUPPORTED_HV_CPUID"
> - "struct kvm_hyperv_exit" description in "5. The kvm_run structure"
>
> The latter should definitely get extended to cover XMM, and I guess the
> former can accommodate the 'no longer being tracked' comment. With that,
> maybe there's no need for a new section?
I'll try to fit it that way.
> > diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> > index 3787d26810c1c..6a18c9f77d5fe 100644
> > --- a/arch/x86/include/asm/hyperv-tlfs.h
> > +++ b/arch/x86/include/asm/hyperv-tlfs.h
> > @@ -49,7 +49,7 @@
> > /* Support for physical CPU dynamic partitioning events is available*/
> > #define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE BIT(3)
> > /*
> > - * Support for passing hypercall input parameter block via XMM
> > + * Support for passing hypercall input and output parameter block via XMM
> > * registers is available
> > */
> > #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
>
> This change of the comment is weird (or I may have forgotten something
> important), could you please elaborate? Currently, we have:
>
> /*
> * Support for passing hypercall input parameter block via XMM
> * registers is available
> */
> #define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
> ...
> /*
> * Support for returning hypercall output block via XMM
> * registers is available
> */
> #define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE BIT(15)
>
> which seems to be correct. TLFS also defines
>
> Bit 4: XmmRegistersForFastHypercallAvailable
>
> in CPUID 0x40000009.EDX (Nested Hypervisor Feature Identification) which
> probably covers both but we don't set this leaf in KVM currently ...
You're right, this comment update no longer applies. It used to in an
older version of the patch, but slipped through the cracks as I rebased
it. Sorry.
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 8a47f8541eab7..42f44546fe79c 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
> > u16 rep_idx;
> > bool fast;
> > bool rep;
> > + bool xmm_dirty;
> > sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
> >
> > /*
> > @@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
> > return ret;
> > }
> >
> > +static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
> > +{
> > + int reg;
> > +
> > + kvm_fpu_get();
> > + for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
> > + const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
> > + _kvm_write_sse_reg(reg, &data);
> > + }
> > + kvm_fpu_put();
> > +}
> > +
> > +static bool kvm_hv_is_xmm_output_hcall(u16 code)
> > +{
> > + return false;
> > +}
> > +
> > +static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > +
> > + return !hv_vcpu->enforce_cpuid ||
> > + hv_vcpu->cpuid_cache.features_edx &
> > + HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> > +}
> > +
> > static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
> > {
> > - return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
> > + bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
> > + u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
> > + u64 result = vcpu->run->hyperv.u.hcall.result;
> > +
> > + if (hv_result_success(result) && fast &&
> > + kvm_hv_is_xmm_output_hcall(code)) {
>
> Assuming hypercalls with XMM output are always 'fast', should we include
> the 'fast' check in kvm_hv_is_xmm_output_hcall()?
Sounds good, yes.
> > + if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> > + kvm_queue_exception(vcpu, UD_VECTOR);
> > + return 1;
> > + }
> > +
> > + kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
> > + }
> > +
> > + return kvm_hv_hypercall_complete(vcpu, result);
> > }
> >
> > static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> > @@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> > hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
> > hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
> > hc.rep = !!(hc.rep_cnt || hc.rep_idx);
> > + hc.xmm_dirty = false;
> >
> > trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
> > hc.rep_idx, hc.ingpa, hc.outgpa);
> > @@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> > break;
> > }
> >
> > + if (hv_result_success(ret) && hc.xmm_dirty) {
> > + if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
> > + kvm_queue_exception(vcpu, UD_VECTOR);
> > + return 1;
> > + }
> > +
> > + kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
> > + }
> > +
> > hypercall_complete:
> > return kvm_hv_hypercall_complete(vcpu, ret);
> >
> > @@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> > vcpu->run->hyperv.u.hcall.input = hc.param;
> > vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> > vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> > + if (hc.fast)
> > + memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
> > vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
> > return 0;
> > }
> > @@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> > ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
> >
> > ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
> > + ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
> > ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
> > ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index d03842abae578..fbdee8d754595 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -90,6 +90,11 @@ struct kvm_pit_config {
> >
> > #define KVM_PIT_SPEAKER_DUMMY 1
> >
> > +struct kvm_hyperv_xmm_reg {
> > + __u64 low;
> > + __u64 high;
> > +};
> > +
> > struct kvm_hyperv_exit {
> > #define KVM_EXIT_HYPERV_SYNIC 1
> > #define KVM_EXIT_HYPERV_HCALL 2
> > @@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
> > __u64 input;
> > __u64 result;
> > __u64 params[2];
> > + struct kvm_hyperv_xmm_reg xmm[6];
>
> In theory, we have HV_HYPERCALL_MAX_XMM_REGISTERS in TLFS (which you
> already use in the code). While I'm not sure it makes sense to make KVM
> ABI dependent on TLFS changes (probably not), we may want to leave a
> short comment explaining where '6' comes from.
Will do.
Thanks,
Nicolas
* Re: [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support
2024-07-17 14:12 ` Nicolas Saenz Julienne
@ 2024-07-29 13:53 ` Vitaly Kuznetsov
2024-08-05 14:08 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Vitaly Kuznetsov @ 2024-07-29 13:53 UTC (permalink / raw)
To: Nicolas Saenz Julienne, linux-kernel, kvm
Cc: pbonzini, seanjc, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
> Hi Vitaly,
> Thanks for having a look at this.
>
> On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
>> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>>
>> > Prepare infrastructure to be able to return data through the XMM
>> > registers when Hyper-V hypercalls are issued in fast mode. The XMM
>> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
>> > restored on successful hypercall completion.
>> >
>> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
>> >
>> > ---
>> >
>> > There was some discussion in the RFC about whether growing 'struct
>> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
>> > - There is padding in 'struct kvm_run' that ensures that a bigger
>> > 'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
>> > - Adding a new field at the bottom of the 'hcall' field within the
>> > 'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
>> > the offsets within that struct either.
>> > - Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
>> > its size isn't part of the uABI. It already grew when syndbg was
>> > introduced.
>>
>> Yes, but the SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't
>> see any immediate issues with the current approach, we may want to
>> introduce something like KVM_EXIT_HYPERV_HCALL_XMM: userspace must be
>> prepared to handle this new information anyway, and it is better to make
>> unprepared userspace fail with 'unknown exit' than to mishandle a
>> hypercall by ignoring the XMM portion of the data.
>
> OK, I'll go that way. Just wanted to get a better understanding of why
> you felt it was necessary.
>
(sorry for the delayed reply, I was on vacation)
I don't think it's an absolute must, but it appears to be the cleaner
approach to me.
Imagine there's some userspace which handles KVM_EXIT_HYPERV_HCALL today
and we want to add XMM handling there. How would we know whether the
xmm portion of the data is actually filled by KVM or not? With your
patch, we can of course check for HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
in KVM_GET_SUPPORTED_HV_CPUID, but this is not really straightforward,
is it? Checking the size is not good either; e.g. think about downstream
versions of KVM which may or may not have certain backports. If we
(theoretically) make several additions to 'struct kvm_hyperv_exit', it
will quickly become a nightmare.
On the contrary, the KVM_EXIT_HYPERV_HCALL_XMM (or just
KVM_EXIT_HYPERV_HCALL2) approach looks cleaner: once userspace sees it,
it knows that the 'xmm' portion of the data can be relied upon.
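For instance (hypothetical, using the proposed constant;
'load_xmm_output' is a placeholder):

    if (run->hyperv.type == KVM_EXIT_HYPERV_HCALL_XMM)
        /* the 'xmm' portion is guaranteed to be filled by KVM */
        load_xmm_output(run->hyperv.u.hcall.xmm);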
--
Vitaly
* Re: [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support
2024-07-29 13:53 ` Vitaly Kuznetsov
@ 2024-08-05 14:08 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-08-05 14:08 UTC (permalink / raw)
To: Vitaly Kuznetsov, linux-kernel, kvm
Cc: pbonzini, seanjc, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
On Mon Jul 29, 2024 at 1:53 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Hi Vitaly,
> > Thanks for having a look at this.
> >
> > On Mon Jul 8, 2024 at 2:59 PM UTC, Vitaly Kuznetsov wrote:
> >> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
> >>
> >> > Prepare infrastructure to be able to return data through the XMM
> >> > registers when Hyper-V hypercalls are issued in fast mode. The XMM
> >> > registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
> >> > restored on successful hypercall completion.
> >> >
> >> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >> >
> >> > ---
> >> >
> >> > There was some discussion in the RFC about whether growing 'struct
> >> > kvm_hyperv_exit' is ABI breakage. IMO it isn't:
> >> > - There is padding in 'struct kvm_run' that ensures that a bigger
> >> > 'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
> >> > - Adding a new field at the bottom of the 'hcall' field within the
> >> > 'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
> >> > the offsets within that struct either.
> >> > - Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
> >> > its size isn't part of the uABI. It already grew when syndbg was
> >> > introduced.
> >>
> >> Yes, but the SYNDBG exit comes with KVM_EXIT_HYPERV_SYNDBG. While I don't
> >> see any immediate issues with the current approach, we may want to
> >> introduce something like KVM_EXIT_HYPERV_HCALL_XMM: userspace must be
> >> prepared to handle this new information anyway, and it is better to make
> >> unprepared userspace fail with 'unknown exit' than to mishandle a
> >> hypercall by ignoring the XMM portion of the data.
> >
> > OK, I'll go that way. Just wanted to get a better understanding of why
> > you felt it was necessary.
> >
>
> (sorry for delayed reply, I was on vacation)
>
> I don't think it's an absolute must but it appears as a cleaner approach
> to me.
>
> Imagine there's some userspace which handles KVM_EXIT_HYPERV_HCALL today
> and we want to add XMM handling there. How would we know whether the
> xmm portion of the data is actually filled by KVM or not? With your
> patch, we can of course check for HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
> in KVM_GET_SUPPORTED_HV_CPUID, but this is not really straightforward,
> is it? Checking the size is not good either; e.g. think about downstream
> versions of KVM which may or may not have certain backports. If we
> (theoretically) make several additions to 'struct kvm_hyperv_exit', it
> will quickly become a nightmare.
>
> On the contrary, the KVM_EXIT_HYPERV_HCALL_XMM (or just
> KVM_EXIT_HYPERV_HCALL2) approach looks cleaner: once userspace sees it,
> it knows that the 'xmm' portion of the data can be relied upon.
Makes sense, thanks for the explanation.
Nicolas
* [PATCH 02/18] KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to guest
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 03/18] hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations Nicolas Saenz Julienne
` (17 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Introduce a helper function to check whether the VSM CPUID bit is
exposed to the guest.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/hyperv.h | 10 ++++++++++
include/asm-generic/hyperv-tlfs.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 923e64903da9a..d007d2203e0e4 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -265,6 +265,12 @@ static inline void kvm_hv_nested_transtion_tlb_flush(struct kvm_vcpu *vcpu,
}
int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
+static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return hv_vcpu && (hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM);
+}
#else /* CONFIG_KVM_HYPERV */
static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock) {}
@@ -322,6 +328,10 @@ static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
return vcpu->vcpu_idx;
}
static inline void kvm_hv_nested_transtion_tlb_flush(struct kvm_vcpu *vcpu, bool tdp_enabled) {}
+static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
#endif /* CONFIG_KVM_HYPERV */
#endif /* __ARCH_X86_KVM_HYPERV_H__ */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 814207e7c37fc..ffac04bbd0c19 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -89,6 +89,7 @@
#define HV_ACCESS_STATS BIT(8)
#define HV_DEBUGGING BIT(11)
#define HV_CPU_MANAGEMENT BIT(12)
+#define HV_ACCESS_VSM BIT(16)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
#define HV_ISOLATION BIT(22)
--
2.40.1
* [PATCH 03/18] hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 02/18] KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to guest Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs Nicolas Saenz Julienne
` (16 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Both 'struct hv_send_ipi' and 'struct hv_send_ipi_ex' have a 'union
hv_input_vtl' field which has been ignored until now. Expose it, as
KVM will soon provide a way of dealing with VTL-aware IPIs. While doing
so, also fix up __send_ipi_mask_ex().
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/hyperv/hv_apic.c | 3 +--
include/asm-generic/hyperv-tlfs.h | 6 ++++--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 0569f579338b5..97907371d51ef 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -121,9 +121,8 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
if (unlikely(!ipi_arg))
goto ipi_mask_ex_done;
+ memset(ipi_arg, 0, sizeof(*ipi_arg));
ipi_arg->vector = vector;
- ipi_arg->reserved = 0;
- ipi_arg->vp_set.valid_bank_mask = 0;
/*
* Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index ffac04bbd0c19..28cde641b5474 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -425,14 +425,16 @@ struct hv_vpset {
/* HvCallSendSyntheticClusterIpi hypercall */
struct hv_send_ipi {
u32 vector;
- u32 reserved;
+ union hv_input_vtl in_vtl;
+ u8 reserved[3];
u64 cpu_mask;
} __packed;
/* HvCallSendSyntheticClusterIpiEx hypercall */
struct hv_send_ipi_ex {
u32 vector;
- u32 reserved;
+ union hv_input_vtl in_vtl;
+ u8 reserved[3];
struct hv_vpset vp_set;
} __packed;
--
2.40.1
* [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (2 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 03/18] hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-09-13 18:02 ` Sean Christopherson
2024-06-09 15:49 ` [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL Nicolas Saenz Julienne
` (15 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
HvCallSendSyntheticClusterIpi and HvCallSendSyntheticClusterIpiEx allow
sending VTL-aware IPIs. Honour the hcall by exiting to user-space upon
receiving a request with a valid VTL target. This behaviour is only
enabled if the VSM CPUID flag is exposed to the guest. It doesn't
introduce a behaviour change otherwise.
User-space is responsible for correctly processing the PV-IPI before
resuming execution.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/hyperv.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 42f44546fe79c..d00baf3ffb165 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2217,16 +2217,20 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
+ bool vsm_enabled = kvm_hv_cpuid_vsm_enabled(vcpu);
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
u64 *sparse_banks = hv_vcpu->sparse_banks;
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
+ union hv_input_vtl *in_vtl;
u64 valid_bank_mask;
+ int rsvd_shift;
u32 vector;
bool all_cpus;
if (hc->code == HVCALL_SEND_IPI) {
+ in_vtl = &send_ipi.in_vtl;
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
sizeof(send_ipi))))
@@ -2235,16 +2239,22 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
vector = send_ipi.vector;
} else {
/* 'reserved' part of hv_send_ipi should be 0 */
- if (unlikely(hc->ingpa >> 32 != 0))
+ rsvd_shift = vsm_enabled ? 40 : 32;
+ if (unlikely(hc->ingpa >> rsvd_shift != 0))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
sparse_banks[0] = hc->outgpa;
vector = (u32)hc->ingpa;
}
all_cpus = false;
valid_bank_mask = BIT_ULL(0);
+ if (in_vtl->use_target_vtl)
+ return -ENODEV;
+
trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
} else {
+ in_vtl = &send_ipi_ex.in_vtl;
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
sizeof(send_ipi_ex))))
@@ -2253,8 +2263,12 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
send_ipi_ex.vector = (u32)hc->ingpa;
send_ipi_ex.vp_set.format = hc->outgpa;
send_ipi_ex.vp_set.valid_bank_mask = sse128_lo(hc->xmm[0]);
+ in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
}
+ if (vsm_enabled && in_vtl->use_target_vtl)
+ return -ENODEV;
+
trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
send_ipi_ex.vp_set.format,
send_ipi_ex.vp_set.valid_bank_mask);
@@ -2682,6 +2696,9 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
ret = kvm_hv_send_ipi(vcpu, &hc);
+ /* VTL-enabled ipi, let user-space handle it */
+ if (ret == -ENODEV)
+ goto hypercall_userspace_exit;
break;
case HVCALL_POST_DEBUG_DATA:
case HVCALL_RETRIEVE_DEBUG_DATA:
--
2.40.1
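To make the fast-call encoding concrete, here is an illustrative
guest-side sketch that mirrors kvm_hv_send_ipi()'s decoding above
('vector', 'in_vtl' and 'cpu_mask' are the caller's values): for
HvCallSendSyntheticClusterIpi, the vector sits in bits 31:0 of the
first input register, 'union hv_input_vtl' in bits 39:32 (bits 63:40
reserved), and the CPU mask in the second register:

    u64 input1 = (u64)vector | ((u64)in_vtl.as_uint8 << 32);
    u64 input2 = cpu_mask;      /* becomes sparse_banks[0] above */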
* Re: [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
2024-06-09 15:49 ` [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs Nicolas Saenz Julienne
@ 2024-09-13 18:02 ` Sean Christopherson
2024-09-16 14:52 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 18:02 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> HvCallSendSyntheticClusterIpi and HvCallSendSyntheticClusterIpiEx allow
> sending VTL-aware IPIs. Honour the hcall by exiting to user-space upon
> receiving a request with a valid VTL target. This behaviour is only
> enabled if the VSM CPUID flag is exposed to the guest. It doesn't
> introduce a behaviour change otherwise.
>
> User-space is responsible for correctly processing the PV-IPI before
> resuming execution.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> ---
> arch/x86/kvm/hyperv.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 42f44546fe79c..d00baf3ffb165 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2217,16 +2217,20 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
>
> static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> {
> + bool vsm_enabled = kvm_hv_cpuid_vsm_enabled(vcpu);
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> u64 *sparse_banks = hv_vcpu->sparse_banks;
> struct kvm *kvm = vcpu->kvm;
> struct hv_send_ipi_ex send_ipi_ex;
> struct hv_send_ipi send_ipi;
> + union hv_input_vtl *in_vtl;
> u64 valid_bank_mask;
> + int rsvd_shift;
> u32 vector;
> bool all_cpus;
>
> if (hc->code == HVCALL_SEND_IPI) {
> + in_vtl = &send_ipi.in_vtl;
I don't see any value in having a local pointer to a union. Just use send_ipi.in_vtl.
> if (!hc->fast) {
> if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
> sizeof(send_ipi))))
> @@ -2235,16 +2239,22 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> vector = send_ipi.vector;
> } else {
> /* 'reserved' part of hv_send_ipi should be 0 */
> - if (unlikely(hc->ingpa >> 32 != 0))
> + rsvd_shift = vsm_enabled ? 40 : 32;
> + if (unlikely(hc->ingpa >> rsvd_shift != 0))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
The existing error handling doesn't make any sense to me. Why is this the _only_
path that enforces reserved bits?
Regarding the shift, I think it makes more sense to do:
/* Bits 63:40 are always reserved. */
if (unlikely(hc->ingpa >> 40 != 0))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
send_ipi.in_vtl.as_uint8 = (u8)(hc->ingpa >> 32);
if (unlikely(!vsm_enabled && send_ipi.in_vtl.as_uint8))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
so that it's more obvious exactly what is/isn't reserved when VSM isn't/is enabled.
> + in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
> sparse_banks[0] = hc->outgpa;
> vector = (u32)hc->ingpa;
> }
> all_cpus = false;
> valid_bank_mask = BIT_ULL(0);
>
> + if (in_vtl->use_target_vtl)
Due to the lack of error checking for the !hc->fast case, this will do the wrong
thing if vsm_enabled=false.
> + return -ENODEV;
> +
> trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
> } else {
> + in_vtl = &send_ipi_ex.in_vtl;
> if (!hc->fast) {
> if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
> sizeof(send_ipi_ex))))
> @@ -2253,8 +2263,12 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> send_ipi_ex.vector = (u32)hc->ingpa;
> send_ipi_ex.vp_set.format = hc->outgpa;
> send_ipi_ex.vp_set.valid_bank_mask = sse128_lo(hc->xmm[0]);
> + in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
> }
>
> + if (vsm_enabled && in_vtl->use_target_vtl)
> + return -ENODEV;
> +
> trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
> send_ipi_ex.vp_set.format,
> send_ipi_ex.vp_set.valid_bank_mask);
> @@ -2682,6 +2696,9 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> break;
> }
> ret = kvm_hv_send_ipi(vcpu, &hc);
> + /* VTL-enabled ipi, let user-space handle it */
> + if (ret == -ENODEV)
I generally don't love "magic" error codes, but I don't see an obvious better
solution either. The other weird thing is that "ret" is a u64, versus the more
common int or even long. I doubt it's problematic in practice, just a bit odd.
> + goto hypercall_userspace_exit;
> break;
> case HVCALL_POST_DEBUG_DATA:
> case HVCALL_RETRIEVE_DEBUG_DATA:
> --
> 2.40.1
>
* Re: [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
2024-09-13 18:02 ` Sean Christopherson
@ 2024-09-16 14:52 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-09-16 14:52 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Fri Sep 13, 2024 at 6:02 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > HvCallSendSyntheticClusterIpi and HvCallSendSyntheticClusterIpiEx allow
> > sending VTL-aware IPIs. Honour the hcall by exiting to user-space upon
> > receiving a request with a valid VTL target. This behaviour is only
> > enabled if the VSM CPUID flag is exposed to the guest. It doesn't
> > introduce a behaviour change otherwise.
> >
> > User-space is responsible for correctly processing the PV-IPI before
> > resuming execution.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> > ---
> > arch/x86/kvm/hyperv.c | 19 ++++++++++++++++++-
> > 1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 42f44546fe79c..d00baf3ffb165 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -2217,16 +2217,20 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
> >
> > static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> > {
> > + bool vsm_enabled = kvm_hv_cpuid_vsm_enabled(vcpu);
> > struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > u64 *sparse_banks = hv_vcpu->sparse_banks;
> > struct kvm *kvm = vcpu->kvm;
> > struct hv_send_ipi_ex send_ipi_ex;
> > struct hv_send_ipi send_ipi;
> > + union hv_input_vtl *in_vtl;
> > u64 valid_bank_mask;
> > + int rsvd_shift;
> > u32 vector;
> > bool all_cpus;
> >
> > if (hc->code == HVCALL_SEND_IPI) {
> > + in_vtl = &send_ipi.in_vtl;
>
> I don't see any value in having a local pointer to a union. Just use send_ipi.in_vtl.
OK, I'll simplify it.
> > if (!hc->fast) {
> > if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
> > sizeof(send_ipi))))
> > @@ -2235,16 +2239,22 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> > vector = send_ipi.vector;
> > } else {
> > /* 'reserved' part of hv_send_ipi should be 0 */
> > - if (unlikely(hc->ingpa >> 32 != 0))
> > + rsvd_shift = vsm_enabled ? 40 : 32;
> > + if (unlikely(hc->ingpa >> rsvd_shift != 0))
> > return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> The existing error handling doesn't make any sense to me. Why is this the _only_
> path that enforces reserved bits?
I don't know.
As far as I can tell, the hypercall was introduced in v5 of the TLFS and
already contained the VTL selection bits. Unfortunately the spec doesn't
explicitly state what to do when hv_input_vtl is received from a non-VSM
enabled guest, so I tried to keep the current behaviour for every case
(send_ipi/send_ipi_ex/fast/!fast).
> Regarding the shift, I think it makes more sense to do:
>
> /* Bits 63:40 are always reserved. */
> if (unlikely(hc->ingpa >> 40 != 0))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> send_ipi.in_vtl.as_uint8 = (u8)(hc->ingpa >> 32);
> if (unlikely(!vsm_enabled && send_ipi.in_vtl.as_uint8))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> so that it's more obvious exactly what is/isn't reserved when VSM isn't/is enabled.
OK, I agree it's nicer.
> > + in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
> > sparse_banks[0] = hc->outgpa;
> > vector = (u32)hc->ingpa;
> > }
> > all_cpus = false;
> > valid_bank_mask = BIT_ULL(0);
> >
> > + if (in_vtl->use_target_vtl)
>
> Due to the lack of error checking for the !hc->fast case, this will do the wrong
> thing if vsm_enabled=false.
Yes. I'll fix it.
Thanks,
Nicolas
* [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (3 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-09-13 19:01 ` Sean Christopherson
2024-06-09 15:49 ` [PATCH 06/18] KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall Nicolas Saenz Julienne
` (14 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Model inactive VTL vCPUs' behaviour with a new MP state.
Inactive VTLs are in an artificial halt state. They enter this state in
response to invoking HvCallVtlCall or HvCallVtlReturn.
User-space, which is VTL-aware, can process the hypercall and set the
vCPU to MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
block until a wakeup event is received. The rules for what constitutes
an event are analogous to halt's, except that inactive VTLs ignore
RFLAGS.IF.
When a wakeup event is registered, KVM will exit to user-space with a
KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
User-space is responsible for deciding whether the event has precedence
over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
before resuming execution on it.
Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
return immediately to user-space.
Note that by re-using the readily available halt infrastructure in
KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
virtualisation features like the VMX preemption timer or APICv before
blocking.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
I do recall Sean mentioning using MP states for this might have
unexpected side-effects. But it was in the context of introducing a
broader `HALTED_USERSPACE` style state. I believe that by narrowing down
the MP state's semantics to the specifics of inactive VTLs --
alternatively, we could change RFLAGS.IF in user-space before updating
the mp state -- we cement this as a VSM-only API as well as limit the
ambiguity on the guest/vCPU's state upon entering into this execution
mode.
Documentation/virt/kvm/api.rst | 19 +++++++++++++++++++
arch/x86/kvm/hyperv.h | 8 ++++++++
arch/x86/kvm/svm/svm.c | 7 ++++++-
arch/x86/kvm/vmx/vmx.c | 7 ++++++-
arch/x86/kvm/x86.c | 16 +++++++++++++++-
include/uapi/linux/kvm.h | 1 +
6 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 17893b330b76f..e664c54a13b04 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1517,6 +1517,8 @@ Possible values are:
[s390]
KVM_MP_STATE_SUSPENDED the vcpu is in a suspend state and is waiting
for a wakeup event [arm64]
+ KVM_MP_STATE_HV_INACTIVE_VTL the vcpu is an inactive VTL and is waiting for
+ a wakeup event [x86]
========================== ===============================================
On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
@@ -1559,6 +1561,23 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
whether the vcpu is runnable.
+For x86:
+^^^^^^^^
+
+KVM_MP_STATE_HV_INACTIVE_VTL is only available to a VM if Hyper-V's
+HV_ACCESS_VSM CPUID is exposed to the guest. This processor state models the
+behavior of an inactive VTL and should only be used for this purpose. A
+userspace process should only switch a vCPU into this MP state in response to a
+HvCallVtlCall or HvCallVtlReturn.
+
+If a vCPU is in KVM_MP_STATE_HV_INACTIVE_VTL, KVM will emulate the
+architectural execution of a HLT instruction with the caveat that RFLAGS.IF is
+ignored when deciding whether to wake up (TLFS 12.12.2.1). If a wakeup is
+recognized, KVM will exit to userspace with a KVM_SYSTEM_EVENT exit, where the
+event type is KVM_SYSTEM_EVENT_WAKEUP. Userspace has the responsibility to
+switch the vCPU back into KVM_MP_STATE_RUNNABLE state. Calling KVM_RUN on a
+KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will exit immediately.
+
4.39 KVM_SET_MP_STATE
---------------------
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index d007d2203e0e4..d42fe3f85b002 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -271,6 +271,10 @@ static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
return hv_vcpu && (hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM);
}
+static inline bool kvm_hv_vcpu_is_idle_vtl(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.mp_state == KVM_MP_STATE_HV_INACTIVE_VTL;
+}
#else /* CONFIG_KVM_HYPERV */
static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock) {}
@@ -332,6 +336,10 @@ static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
{
return false;
}
+static inline bool kvm_hv_vcpu_is_idle_vtl(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
#endif /* CONFIG_KVM_HYPERV */
#endif /* __ARCH_X86_KVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 296c524988f95..9671191fef4ea 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -49,6 +49,7 @@
#include "svm.h"
#include "svm_ops.h"
+#include "hyperv.h"
#include "kvm_onhyperv.h"
#include "svm_onhyperv.h"
@@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
if (!gif_set(svm))
return true;
+ /*
+ * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
+ * whether to block interrupts targeted at inactive VTLs.
+ */
if (is_guest_mode(vcpu)) {
/* As long as interrupts are being delivered... */
if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
@@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
if (nested_exit_on_intr(svm))
return false;
} else {
- if (!svm_get_if_flag(vcpu))
+ if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))
return true;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3c83c06f8265..ac0682fece604 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5057,7 +5057,12 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
return false;
- return !(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) ||
+ /*
+ * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
+ * whether to block interrupts targeted at inactive VTLs.
+ */
+ return (!(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) &&
+ !kvm_hv_vcpu_is_idle_vtl(vcpu)) ||
(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8c9e4281d978d..a6e2312ccb68f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -134,6 +134,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
+static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu);
static DEFINE_MUTEX(vendor_module_lock);
struct kvm_x86_ops kvm_x86_ops __read_mostly;
@@ -11176,7 +11177,8 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
kvm_lapic_switch_to_sw_timer(vcpu);
kvm_vcpu_srcu_read_unlock(vcpu);
- if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+ if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
+ kvm_hv_vcpu_is_idle_vtl(vcpu))
kvm_vcpu_halt(vcpu);
else
kvm_vcpu_block(vcpu);
@@ -11218,6 +11220,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
vcpu->arch.apf.halted = false;
break;
case KVM_MP_STATE_INIT_RECEIVED:
+ case KVM_MP_STATE_HV_INACTIVE_VTL:
break;
default:
WARN_ON_ONCE(1);
@@ -11264,6 +11267,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);
+ if (kvm_hv_vcpu_is_idle_vtl(vcpu) && kvm_vcpu_has_events(vcpu)) {
+ r = 0;
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_WAKEUP;
+ break;
+ }
+
if (dm_request_for_irq_injection(vcpu) &&
kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
r = 0;
@@ -11703,6 +11713,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
goto out;
break;
+ case KVM_MP_STATE_HV_INACTIVE_VTL:
+ if (is_guest_mode(vcpu) || !kvm_hv_cpuid_vsm_enabled(vcpu))
+ goto out;
+ break;
case KVM_MP_STATE_RUNNABLE:
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index fbdee8d754595..f4864e6907e0b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -564,6 +564,7 @@ struct kvm_vapic_addr {
#define KVM_MP_STATE_LOAD 8
#define KVM_MP_STATE_AP_RESET_HOLD 9
#define KVM_MP_STATE_SUSPENDED 10
+#define KVM_MP_STATE_HV_INACTIVE_VTL 11
struct kvm_mp_state {
__u32 mp_state;
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
2024-06-09 15:49 ` [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL Nicolas Saenz Julienne
@ 2024-09-13 19:01 ` Sean Christopherson
2024-09-16 15:33 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 19:01 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> Model inactive VTL vCPUs' behaviour with a new MP state.
>
> Inactive VTLs are in an artificial halt state. They enter this state
> in response to invoking HvCallVtlCall or HvCallVtlReturn. User-space,
> which is VTL aware, can process the hypercall and set the vCPU in
> MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
> block until a wakeup event is received. The rules of what constitutes
> an event are analogous to halt's, except that VTLs ignore RFLAGS.IF.
>
> When a wakeup event is registered, KVM will exit to user-space with a
> KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
> User-space is responsible for deciding whether the event has precedence
> over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
> before resuming execution on it.
>
> Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
> return immediately to user-space.
>
> Note that by re-using the readily available halt infrastructure in
> KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
> virtualisation features like the VMX preemption timer or APICv before
> blocking.
IIUC, this is a convoluted and roundabout way to let userspace check if a vCPU
has a wake event, correct? Even by the end of the series, KVM never sets
MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it as:
KVM_SET_MP_STATE => KVM_RUN => KVM_SET_MP_STATE => KVM_RUN
The upside to this approach is that it requires minimal uAPI and very few KVM
changes, but that's about it AFAICT. On the other hand, making this so painfully
specific feels like a missed opportunity, and unnecessarily bleeds VTL details
into KVM.
Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is also
rather bizarre since quite a bit of time has already elapsed since the vCPU first
did HvCallVtlCall/HvCallVtlReturn. But that doesn't really have anything to do
with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as easy to go to kvm_vcpu_block().
Why not add an ioctl() to very explicitly block until a wake event is ready?
Or probably better, a generic "wait" ioctl() that takes the wait type as an
argument.
Kinda like your idea of supporting .poll() on the vCPU FD[*], except it's very
specifically restricted to a single caller (takes vcpu->mutex). We could probably
actually implement it via .poll(), but I suspect that would be more confusing than
helpful.
E.g. extract the guts of vcpu_block() to a separate helper, and then wire that
up to an ioctl().
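A purely hypothetical sketch of the shape (the name and helpers are
invented, and kvm_vcpu_has_events() is x86-internal today):

    /*
     * Hypothetical: block until a wake event is pending, without
     * entering the guest; conceptually the guts of vcpu_block().
     */
    static int kvm_vcpu_ioctl_wait(struct kvm_vcpu *vcpu)
    {
            kvm_vcpu_block(vcpu);
            return kvm_vcpu_has_events(vcpu) ? 0 : -EINTR;
    }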
As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag? That way,
userspace doesn't need to do a round-trip just to set a single bit. E.g. I think
we should be able to squeeze it into "struct kvm_hyperv_exit".
Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply wire up
HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspace_io()
callback that blocks if some flag is set? That would make it _much_ cleaner to
scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to no new
uAPI.
> @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> if (!gif_set(svm))
> return true;
>
> + /*
> + * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
> + * whether to block interrupts targeted at inactive VTLs.
> + */
> if (is_guest_mode(vcpu)) {
> /* As long as interrupts are being delivered... */
> if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> if (nested_exit_on_intr(svm))
> return false;
> } else {
> - if (!svm_get_if_flag(vcpu))
> + if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))
Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper to handle
the RFLAGS.IF vs. idle VTL logic. Naming will be annoying, but that's about it.
E.g. kvm_is_irq_blocked_by_rflags_if() or so.
[*] https://lore.kernel.org/lkml/20231001111313.77586-1-nsaenz@amazon.com
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
2024-09-13 19:01 ` Sean Christopherson
@ 2024-09-16 15:33 ` Nicolas Saenz Julienne
2024-09-18 7:56 ` Sean Christopherson
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-09-16 15:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
On Fri Sep 13, 2024 at 7:01 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > Model inactive VTL vCPUs' behaviour with a new MP state.
> >
> > Inactive VTLs are in an artificial halt state. They enter this state
> > in response to invoking HvCallVtlCall or HvCallVtlReturn. User-space,
> > which is VTL aware, can process the hypercall and set the vCPU in
> > MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
> > block until a wakeup event is received. The rules of what constitutes
> > an event are analogous to halt's, except that VTLs ignore RFLAGS.IF.
> >
> > When a wakeup event is registered, KVM will exit to user-space with a
> > KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
> > User-space is responsible for deciding whether the event has precedence
> > over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
> > before resuming execution on it.
> >
> > Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
> > return immediately to user-space.
> >
> > Note that by re-using the readily available halt infrastructure in
> > KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
> > virtualisation features like the VMX preemption timer or APICv before
> > blocking.
>
> IIUC, this is a convoluted and roundabout way to let userspace check if a vCPU
> has a wake event, correct? Even by the end of the series, KVM never sets
> MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it as:
>
> KVM_SET_MP_STATE => KVM_RUN => KVM_SET_MP_STATE => KVM_RUN
Correct.
> The upside to this approach is that it requires minimal uAPI and very few KVM
> changes, but that's about it AFAICT. On the other hand, making this so painfully
> specific feels like a missed opportunity, and unnecessarily bleeds VTL details
> into KVM.
>
> Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is also
> rather bizarre since quite a bit of time has already elapsed since the vCPU first
> did HvCallVtlCall/HvCallVtlReturn. But that doesn't really have anything to do
> with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as easy to go to kvm_vcpu_block().
>
> Why not add an ioctl() to very explicitly block until a wake event is ready?
> Or probably better, a generic "wait" ioctl() that takes the wait type as an
> argument.
>
> Kinda like your idea of supporting .poll() on the vCPU FD[*], except it's very
> specifically restricted to a single caller (takes vcpu->mutex). We could probably
> actually implement it via .poll(), but I suspect that would be more confusing than
> helpful.
>
> E.g. extract the guts of vcpu_block() to a separate helper, and then wire that
> up to an ioctl().
>
> As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag? That way,
> userspace doesn't need to do a round-trip just to set a single bit. E.g. I think
> we should be able to squeeze it into "struct kvm_hyperv_exit".
It's things like the RFLAGS.IF exemption that deterred me from building a
generic interface. We might find out that the generic blocking logic
doesn't match the expected VTL semantics and be stuck with a uAPI that
isn't enough for VSM, nor useful for any other use-case. We can always
introduce 'flags' I guess.
Note that I'm just being cautious here, AFAICT the generic approach
works, and I'm fine with going the "wait" ioctl.
> Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply wire up
> HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspace_io()
> callback that blocks if some flag is set? That would make it _much_ cleaner to
> scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to no new
> uAPI.
So IIUC, the approach is to have complete_userspace_io() block after
re-entering HVCALL_VTL_RETURN. Then, have it exit back onto user-space
whenever an event is made available (maybe re-using
KVM_SYSTEM_EVENT_WAKEUP?). That would work, but will need something
extra to be compatible with migration/live-update.
> > @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> > if (!gif_set(svm))
> > return true;
> >
> > + /*
> > + * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
> > + * whether to block interrupts targeted at inactive VTLs.
> > + */
> > if (is_guest_mode(vcpu)) {
> > /* As long as interrupts are being delivered... */
> > if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> > @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> > if (nested_exit_on_intr(svm))
> > return false;
> > } else {
> > - if (!svm_get_if_flag(vcpu))
> > + if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))
>
> Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper to handle
> the RFLAGS.IF vs. idle VTL logic. Naming will be annoying, but that's about it.
>
> E.g. kvm_is_irq_blocked_by_rflags_if() or so.
Noted.
Thanks,
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
2024-09-16 15:33 ` Nicolas Saenz Julienne
@ 2024-09-18 7:56 ` Sean Christopherson
0 siblings, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2024-09-18 7:56 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
On Mon, Sep 16, 2024, Nicolas Saenz Julienne wrote:
> On Fri Sep 13, 2024 at 7:01 PM UTC, Sean Christopherson wrote:
> > On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > E.g. extract the guts of vcpu_block() to a separate helper, and then wire that
> > up to an ioctl().
> >
> > As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag? That way,
> > userspace doesn't need to do a round-trip just to set a single bit. E.g. I think
> > we should be able to squeeze it into "struct kvm_hyperv_exit".
>
> It's things like the RFLAGS.IF exemption that deterred me from building a
> generic interface. We might find out that the generic blocking logic
> doesn't match the expected VTL semantics and be stuck with a uAPI that
> isn't enough for VSM, nor useful for any other use-case.
That's only motivation for ensuring that we are as confident as we can reasonably
be that the uAPI we merge will work for VSM, e.g. by building out userspace and
proving that a generic ioctl() provides the necessary functionality. If there's
no other immediate use case, then there's no reason to merge a generic ioctl()
until VSM support is imminent. And if there is another use case, then the concern
that a generic ioctl() isn't useful obviously goes away.
> We can always introduce 'flags' I guess.
>
> Note that I'm just being cautious here, AFAICT the generic approach
> works, and I'm fine with going the "wait" ioctl.
>
> > Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply wire up
> > HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspace_io()
> > callback that blocks if some flag is set? That would make it _much_ cleaner to
> > scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to no new
> > uAPI.
>
> So IIUC, the approach is to have complete_userspace_io() block after
> re-entering HVCALL_VTL_RETURN. Then, have it exit back onto user-space
> whenever an event is made available (maybe re-using KVM_SYSTEM_EVENT_WAKEUP?).
Mostly out of curiosity, why does control need to return to userspace?
> That would work, but will need something extra to be compatible with
> migration/live-update.
Gah, right, because KVM's generic ABI is that userspace must complete I/O exits
before saving/restoring state. Yeah, having KVM automatically enter a blocking
state is probably a bad idea.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 06/18] KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (4 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 07/18] KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall Nicolas Saenz Julienne
` (13 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Let user-space handle HvGetVpRegisters and HvSetVpRegisters, as they
are VTL-aware hypercalls used solely in the context of VSM.
Additionally, expose the CPUID bit.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
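As a point of reference, the user-space side of the round trip might
look roughly like this (a sketch, not the actual VMM code; `run` is the
mmap'ed 'struct kvm_run', and the hcall layout follows
KVM_EXIT_HYPERV_HCALL):

    if (run->exit_reason == KVM_EXIT_HYPERV &&
        run->hyperv.type == KVM_EXIT_HYPERV_HCALL) {
            /* Per the TLFS, the hypercall code is bits 15:0 of the input. */
            __u16 code = run->hyperv.u.hcall.input & 0xffff;

            switch (code) {
            case HVCALL_GET_VP_REGISTERS:
            case HVCALL_SET_VP_REGISTERS:
                    /*
                     * Emulate the VTL-aware register access, then report
                     * the status back to KVM before resuming the vCPU.
                     */
                    run->hyperv.u.hcall.result = HV_STATUS_SUCCESS;
                    break;
            }
    }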
Documentation/virt/kvm/api.rst | 10 ++++++++++
arch/x86/kvm/hyperv.c | 15 +++++++++++++++
include/asm-generic/hyperv-tlfs.h | 1 +
3 files changed, 26 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e664c54a13b04..05b01b00a395c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8931,3 +8931,13 @@ CPUIDs map to KVM functionality.
This CPUID indicates that KVM supports returning data to the guest in response
to a hypercall using the XMM registers. It also extends ``struct
kvm_hyperv_exit`` to allow passing the XMM data from userspace.
+
+10.2 HV_ACCESS_VP_REGISTERS
+---------------------------
+
+:Location: CPUID.40000003H:EBX[bit 17]
+
+This CPUID indicates that KVM supports HvGetVpRegisters and HvSetVpRegisters.
+Currently, it is only used in conjunction with HV_ACCESS_VSM, and immediately
+exits to userspace with KVM_EXIT_HYPERV_HCALL as the reason. Userspace is
+expected to complete the hypercall before resuming execution.
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d00baf3ffb165..d0edc2bec5a4f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2425,6 +2425,11 @@ static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
static bool kvm_hv_is_xmm_output_hcall(u16 code)
{
+ switch (code) {
+ case HVCALL_GET_VP_REGISTERS:
+ return true;
+ }
+
return false;
}
@@ -2505,6 +2510,8 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
case HVCALL_SEND_IPI_EX:
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
return true;
}
@@ -2543,6 +2550,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
*/
return !kvm_hv_is_syndbg_enabled(hv_vcpu->vcpu) ||
hv_vcpu->cpuid_cache.features_ebx & HV_DEBUGGING;
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
+ return hv_vcpu->cpuid_cache.features_ebx &
+ HV_ACCESS_VP_REGISTERS;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
if (!(hv_vcpu->cpuid_cache.enlightenments_eax &
@@ -2727,6 +2738,9 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
goto hypercall_userspace_exit;
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
+ goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
break;
@@ -2898,6 +2912,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_POST_MESSAGES;
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
+ ent->ebx |= HV_ACCESS_VP_REGISTERS;
ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 28cde641b5474..9e909f0834598 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -90,6 +90,7 @@
#define HV_DEBUGGING BIT(11)
#define HV_CPU_MANAGEMENT BIT(12)
#define HV_ACCESS_VSM BIT(16)
+#define HV_ACCESS_VP_REGISTERS BIT(17)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
#define HV_ISOLATION BIT(22)
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 07/18] KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (5 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 06/18] KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 08/18] KVM: x86: hyper-v: Exit on StartVirtualProcessor and GetVpIndexFromApicId hcalls Nicolas Saenz Julienne
` (12 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Handle HvTranslateVirtualAddress in user-space. The hypercall is
VTL-aware and only used in the context of VSM. Additionally, the TLFS
doesn't introduce an ad-hoc CPUID bit for it, so the hypercall
availability is tracked as part of the HV_ACCESS_VSM CPUID. This will be
documented with the main VSM commit.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/hyperv.c | 3 +++
include/asm-generic/hyperv-tlfs.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d0edc2bec5a4f..cbe2aca52514b 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2427,6 +2427,7 @@ static bool kvm_hv_is_xmm_output_hcall(u16 code)
{
switch (code) {
case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}
@@ -2512,6 +2513,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_SEND_IPI_EX:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}
@@ -2740,6 +2742,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
goto hypercall_userspace_exit;
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 9e909f0834598..57c791c555861 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -159,6 +159,7 @@ union hv_reference_tsc_msr {
#define HVCALL_CREATE_VP 0x004e
#define HVCALL_GET_VP_REGISTERS 0x0050
#define HVCALL_SET_VP_REGISTERS 0x0051
+#define HVCALL_TRANSLATE_VIRTUAL_ADDRESS 0x0052
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HVCALL_POST_DEBUG_DATA 0x0069
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 08/18] KVM: x86: hyper-v: Exit on StartVirtualProcessor and GetVpIndexFromApicId hcalls
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (6 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 07/18] KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 09/18] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace Nicolas Saenz Julienne
` (11 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Both HvCallStartVirtualProcessor and HvCallGetVpIndexFromApicId are
used as part of the Hyper-V VSM CPU bootstrap process and require VTL
awareness; as such, handle these hypercalls in user-space. Also, expose
the ad-hoc CPUID bit.
Note that these hypercalls aren't necessary on Hyper-V guests that don't
enable VSM.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
Documentation/virt/kvm/api.rst | 11 +++++++++++
arch/x86/kvm/hyperv.c | 7 +++++++
include/asm-generic/hyperv-tlfs.h | 1 +
3 files changed, 19 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 05b01b00a395c..161a772c23c6a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8941,3 +8941,14 @@ This CPUID indicates that KVM supports HvGetVpRegisters and HvSetVpRegisters.
Currently, it is only used in conjunction with HV_ACCESS_VSM, and immediately
exits to userspace with KVM_EXIT_HYPERV_HCALL as the reason. Userspace is
expected to complete the hypercall before resuming execution.
+
+10.3 HV_START_VIRTUAL_PROCESSOR
+-------------------------------
+
+:Location: CPUID.40000003H:EBX[bit 21]
+
+This CPUID indicates that KVM supports HvCallStartVirtualProcessor and
+HvCallGetVpIndexFromApicId. Currently, it is only used in conjunction with
+HV_ACCESS_VSM, and immediately exits to userspace with KVM_EXIT_HYPERV_HCALL as
+the reason. Userspace is expected to complete the hypercall before resuming
+execution.
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index cbe2aca52514b..dd64f41dc835d 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2556,6 +2556,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
case HVCALL_SET_VP_REGISTERS:
return hv_vcpu->cpuid_cache.features_ebx &
HV_ACCESS_VP_REGISTERS;
+ case HVCALL_START_VP:
+ case HVCALL_GET_VP_ID_FROM_APIC_ID:
+ return hv_vcpu->cpuid_cache.features_ebx &
+ HV_START_VIRTUAL_PROCESSOR;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
if (!(hv_vcpu->cpuid_cache.enlightenments_eax &
@@ -2743,6 +2747,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
+ case HVCALL_START_VP:
+ case HVCALL_GET_VP_ID_FROM_APIC_ID:
goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
@@ -2916,6 +2922,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
ent->ebx |= HV_ACCESS_VP_REGISTERS;
+ ent->ebx |= HV_START_VIRTUAL_PROCESSOR;
ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 57c791c555861..e24b88ec4ec00 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -92,6 +92,7 @@
#define HV_ACCESS_VSM BIT(16)
#define HV_ACCESS_VP_REGISTERS BIT(17)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
+#define HV_START_VIRTUAL_PROCESSOR BIT(21)
#define HV_ISOLATION BIT(22)
/*
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 09/18] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (7 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 08/18] KVM: x86: hyper-v: Exit on StartVirtualProcessor and GetVpIndexFromApicId hcalls Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 10/18] KVM: x86: Keep track of instruction length during faults Nicolas Saenz Julienne
` (10 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
From: Anish Moorthy <amoorthy@google.com>
kvm_prepare_memory_fault_exit() already takes parameters describing the
RWX-ness of the relevant access but doesn't actually do anything with
them. Define and use the flags necessary to pass this information on to
userspace.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Anish Moorthy <amoorthy@google.com>
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
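A sketch of how user-space might consume the new flags (the handler
names are placeholders; `run` is the mmap'ed 'struct kvm_run'):

    if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
            __u64 flags = run->memory_fault.flags;

            /* Exactly one of READ/WRITE/EXEC is set per fault. */
            if (flags & KVM_MEMORY_EXIT_FLAG_WRITE)
                    handle_write_fault(run->memory_fault.gpa);
            else if (flags & KVM_MEMORY_EXIT_FLAG_EXEC)
                    handle_exec_fault(run->memory_fault.gpa);
            else if (flags & KVM_MEMORY_EXIT_FLAG_READ)
                    handle_read_fault(run->memory_fault.gpa);
    }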
Documentation/virt/kvm/api.rst | 5 +++++
include/linux/kvm_host.h | 9 ++++++++-
include/uapi/linux/kvm.h | 3 +++
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 161a772c23c6a..761b99987cf1a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7014,6 +7014,9 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
/* KVM_EXIT_MEMORY_FAULT */
struct {
+ #define KVM_MEMORY_EXIT_FLAG_READ (1ULL << 0)
+ #define KVM_MEMORY_EXIT_FLAG_WRITE (1ULL << 1)
+ #define KVM_MEMORY_EXIT_FLAG_EXEC (1ULL << 2)
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
__u64 flags;
__u64 gpa;
@@ -7025,6 +7028,8 @@ could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
describes properties of the faulting access that are likely pertinent:
+ - KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
+ fault occurred on a read/write/exec access respectively.
- KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
on a private memory access. When clear, indicates the fault occurred on a
shared access.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 692c01e41a18e..59f687985ba24 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2397,8 +2397,15 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
vcpu->run->memory_fault.gpa = gpa;
vcpu->run->memory_fault.size = size;
- /* RWX flags are not (yet) defined or communicated to userspace. */
vcpu->run->memory_fault.flags = 0;
+
+ if (is_write)
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_WRITE;
+ else if (is_exec)
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_EXEC;
+ else
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_READ;
+
if (is_private)
vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f4864e6907e0b..d6d8b17bfa9a7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -434,6 +434,9 @@ struct kvm_run {
} notify;
/* KVM_EXIT_MEMORY_FAULT */
struct {
+#define KVM_MEMORY_EXIT_FLAG_READ (1ULL << 0)
+#define KVM_MEMORY_EXIT_FLAG_WRITE (1ULL << 1)
+#define KVM_MEMORY_EXIT_FLAG_EXEC (1ULL << 2)
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
__u64 flags;
__u64 gpa;
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 10/18] KVM: x86: Keep track of instruction length during faults
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (8 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 09/18] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-09-13 19:10 ` Sean Christopherson
2024-06-09 15:49 ` [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits Nicolas Saenz Julienne
` (9 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Both VMX and SVM provide the length of the instruction
being run at the time of the page fault. Save it within 'struct
kvm_page_fault', as it'll become useful in the future.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 11 ++++++++---
arch/x86/kvm/mmu/mmu_internal.h | 5 ++++-
arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++--
3 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d74bdef68c1d..39b113afefdfc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4271,7 +4271,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
return;
- kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code, true, NULL);
+ kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
+ true, NULL, 0);
}
static inline u8 kvm_max_level_for_order(int order)
@@ -5887,7 +5888,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (r == RET_PF_INVALID) {
r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
- &emulation_type);
+ &emulation_type, insn_len);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
}
@@ -5924,8 +5925,12 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
emulate:
+ /*
+ * x86_emulate_instruction() expects insn to contain data if
+ * insn_len > 0.
+ */
return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
- insn_len);
+ insn ? insn_len : 0);
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ce2fcd19ba6be..a0cde1a0e39b0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -192,6 +192,7 @@ struct kvm_page_fault {
const gpa_t addr;
const u64 error_code;
const bool prefetch;
+ const u8 insn_len;
/* Derived from error_code. */
const bool exec;
@@ -288,11 +289,13 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
}
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u64 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch,
+ int *emulation_type, u8 insn_len)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
.error_code = err,
+ .insn_len = insn_len,
.exec = err & PFERR_FETCH_MASK,
.write = err & PFERR_WRITE_MASK,
.present = err & PFERR_PRESENT_MASK,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ac0682fece604..9ba38e0b0c7a8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5807,11 +5807,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa)))
return kvm_emulate_instruction(vcpu, 0);
- return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+ return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL,
+ vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
}
static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
+ u8 insn_len = 0;
gpa_t gpa;
if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0))
@@ -5828,7 +5830,17 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
return kvm_skip_emulated_instruction(vcpu);
}
- return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
+ /*
+ * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
+ * undefined behavior: Intel's SDM doesn't mandate the VMCS field be
+ * set when EPT misconfig occurs. In practice, real hardware updates
+ * VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors
+ * (namely Hyper-V) don't set it due to it being undefined behavior.
+ */
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+ insn_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+
+ return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, insn_len);
}
static int handle_nmi_window(struct kvm_vcpu *vcpu)
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 10/18] KVM: x86: Keep track of instruction length during faults
2024-06-09 15:49 ` [PATCH 10/18] KVM: x86: Keep track of instruction length during faults Nicolas Saenz Julienne
@ 2024-09-13 19:10 ` Sean Christopherson
0 siblings, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 19:10 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> Both VMX and SVM provide the length of the instruction
> being run at the time of the page fault. Save it within 'struct
> kvm_page_fault', as it'll become useful in the future.
Nit, please wrap closer to 75 characters.
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 11 ++++++++---
> arch/x86/kvm/mmu/mmu_internal.h | 5 ++++-
> arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++--
> 3 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8d74bdef68c1d..39b113afefdfc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4271,7 +4271,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
> work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
> return;
>
> - kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code, true, NULL);
> + kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
> + true, NULL, 0);
Hrm, I just proposed adding another (out) parameter to kvm_mmu_do_page_fault()
in the TDX series[*], I wonder if we're reaching the point where it makes sense
to have kvm_mmu_do_page_fault() take a struct too.
[*] https://lore.kernel.org/all/ZuR09EqzU1WbQYGd@google.com
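E.g., something along these lines (purely illustrative, names invented):

    /*
     * Hypothetical: bundle kvm_mmu_do_page_fault()'s growing parameter
     * list into a single struct.
     */
    struct kvm_mmu_page_fault_args {
            gpa_t cr2_or_gpa;
            u64 error_code;
            bool prefetch;
            int *emulation_type;    /* out */
            u8 insn_len;
    };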
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ac0682fece604..9ba38e0b0c7a8 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5807,11 +5807,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa)))
> return kvm_emulate_instruction(vcpu, 0);
>
> - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL,
> + vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
It might be worth adding a cached EXREG for instruction length, e.g.
VCPU_EXREG_EXIT_INFO_3 + vmx_get_insn_len(), similar to how for vmx_get_exit_qual()
and vmx_get_intr_info() pair up with VCPU_EXREG_EXIT_INFO_{1,2}.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (9 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 10/18] KVM: x86: Keep track of instruction length during faults Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-09-13 19:11 ` Sean Christopherson
2024-06-09 15:49 ` [PATCH 12/18] KVM: x86/mmu: Introduce infrastructure to handle non-executable mappings Nicolas Saenz Julienne
` (8 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
In order to simplify Hyper-V VSM secure memory intercept generation in
user-space (it avoids the need to implement an x86 instruction decoder
and do the actual decoding), pass the length of the instruction being
run at the time of the guest exit as part of the memory fault exit
information.
The presence of this additional information is indicated by a new
capability, KVM_CAP_FAULT_EXIT_INSN_LEN.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
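A sketch of the intended usage (build_secure_intercept() is a
placeholder; the capability check can also be done on /dev/kvm):

    /* Probe once at setup; insn_len is 0 when the HW doesn't expose it. */
    bool have_insn_len = ioctl(vm_fd, KVM_CHECK_EXTENSION,
                               KVM_CAP_FAULT_EXIT_INSN_LEN) > 0;

    /* Later, on a vCPU exit: */
    if (run->exit_reason == KVM_EXIT_MEMORY_FAULT && have_insn_len)
            build_secure_intercept(run->memory_fault.gpa,
                                   run->memory_fault.insn_len);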
Documentation/virt/kvm/api.rst | 6 +++++-
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/x86.c | 1 +
include/linux/kvm_host.h | 3 ++-
include/uapi/linux/kvm.h | 2 ++
5 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 761b99987cf1a..18ddea9c4c58a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7021,11 +7021,15 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
__u64 flags;
__u64 gpa;
__u64 size;
+ __u8 insn_len;
} memory_fault;
KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
-guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
+guest physical address range [gpa, gpa + size) of the fault. The
+'insn_len' field describes the size (in bytes) of the instruction
+that caused the fault. It is only available if the underlying HW exposes that
+information on guest exit, otherwise it's set to 0. The 'flags' field
describes properties of the faulting access that are likely pertinent:
- KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a0cde1a0e39b0..4f5c4c8af9941 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -285,7 +285,7 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
{
kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
PAGE_SIZE, fault->write, fault->exec,
- fault->is_private);
+ fault->is_private, fault->insn_len);
}
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6e2312ccb68f..d2b8b74cb48bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4704,6 +4704,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_MEMORY_FAULT_INFO:
+ case KVM_CAP_FAULT_EXIT_INSN_LEN:
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 59f687985ba24..4fa16c4772269 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2391,11 +2391,12 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
gpa_t gpa, gpa_t size,
bool is_write, bool is_exec,
- bool is_private)
+ bool is_private, u8 insn_len)
{
vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
vcpu->run->memory_fault.gpa = gpa;
vcpu->run->memory_fault.size = size;
+ vcpu->run->memory_fault.insn_len = insn_len;
vcpu->run->memory_fault.flags = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d6d8b17bfa9a7..516d39910f9ab 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -441,6 +441,7 @@ struct kvm_run {
__u64 flags;
__u64 gpa;
__u64 size;
+ __u8 insn_len;
} memory_fault;
/* Fix the size of the union. */
char padding[256];
@@ -927,6 +928,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_FAULT_EXIT_INSN_LEN 236
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits
2024-06-09 15:49 ` [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits Nicolas Saenz Julienne
@ 2024-09-13 19:11 ` Sean Christopherson
2024-09-16 15:53 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 19:11 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> In order to simplify Hyper-V VSM secure memory intercept generation in
> user-space (it avoids the need to implement an x86 instruction decoder
> and do the actual decoding), pass the length of the instruction being
> run at the time of the guest exit as part of the memory fault exit
> information.
Why does userspace need the instruction length, but not the associated code stream?
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits
2024-09-13 19:11 ` Sean Christopherson
@ 2024-09-16 15:53 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-09-16 15:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
On Fri Sep 13, 2024 at 7:11 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > In order to simplify Hyper-V VSM secure memory intercept generation in
> > user-space (it avoids the need to implement an x86 instruction decoder
> > and do the actual decoding), pass the length of the instruction being
> > run at the time of the guest exit as part of the memory fault exit
> > information.
>
> Why does userspace need the instruction length, but not the associated code stream?
Since the fault already provides the GPA, it's trivial to read the code
stream from the VMM. Then again, now that I've dug deeper into the RWX
memory attributes' edge cases, this doesn't always work. For example,
when getting a fault during a page walk (the CPU being unable to access
the page that contains the next GPTE because a memattr marks it
non-readable), the fault exit GPA will not point to the code stream.
I will rework/rethink this once I have the complete memattrs story.
Thanks,
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 12/18] KVM: x86/mmu: Introduce infrastructure to handle non-executable mappings
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (10 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes Nicolas Saenz Julienne
` (7 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
The upcoming access-restriction KVM memory attributes open the door to
installing non-executable mappings. Introduce a new field in struct
kvm_page_fault, map_executable, to control whether the gfn range should
be mapped as executable, and make sure it's taken into account when
generating new SPTEs.
No functional change intended.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 6 +++++-
arch/x86/kvm/mmu/mmu_internal.h | 2 ++
arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++++--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 39b113afefdfc..b0c210b96419f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3197,6 +3197,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_shadow_walk_iterator it;
+ unsigned int access = ACC_ALL;
struct kvm_mmu_page *sp;
int ret;
gfn_t base_gfn = fault->gfn;
@@ -3229,7 +3230,10 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
if (WARN_ON_ONCE(it.level != fault->goal_level))
return -EFAULT;
- ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
+ if (!fault->map_executable)
+ access &= ~ACC_EXEC_MASK;
+
+ ret = mmu_set_spte(vcpu, fault->slot, it.sptep, access,
base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4f5c4c8af9941..af0c3a154ed89 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -241,6 +241,7 @@ struct kvm_page_fault {
kvm_pfn_t pfn;
hva_t hva;
bool map_writable;
+ bool map_executable;
/*
* Indicates the guest is trying to write a gfn that contains one or
@@ -313,6 +314,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.pfn = KVM_PFN_ERR_FAULT,
.hva = KVM_HVA_ERR_BAD,
+ .map_executable = true,
};
int r;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 36539c1b36cd6..344781981999a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1018,6 +1018,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
struct tdp_iter *iter)
{
struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+ unsigned int access = ACC_ALL;
u64 new_spte;
int ret = RET_PF_FIXED;
bool wrprot = false;
@@ -1025,10 +1026,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
if (WARN_ON_ONCE(sp->role.level != fault->goal_level))
return RET_PF_RETRY;
+ if (!fault->map_executable)
+ access &= ~ACC_EXEC_MASK;
+
if (unlikely(!fault->slot))
- new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+ new_spte = make_mmio_spte(vcpu, iter->gfn, access);
else
- wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+ wrprot = make_spte(vcpu, sp, fault->slot, access, iter->gfn,
fault->pfn, iter->old_spte, fault->prefetch, true,
fault->map_writable, &new_spte);
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (11 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 12/18] KVM: x86/mmu: Introduce infrastructure to handle non-executable mappings Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-09-13 19:13 ` Sean Christopherson
2024-06-09 15:49 ` [PATCH 14/18] KVM: x86/mmu: Init memslot if memory attributes available Nicolas Saenz Julienne
` (6 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
In preparation for introducing RWX memory attributes, make sure
user-space is attempting to install a memory attribute with
KVM_MEMORY_ATTRIBUTE_PRIVATE before throwing a warning on systems with
no private memory support.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 8 ++++++--
virt/kvm/kvm_main.c | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b0c210b96419f..d56c04fbdc66b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7359,6 +7359,9 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range)
{
+ unsigned long attrs = range->arg.attributes;
+ bool priv_attr = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
/*
* Zap SPTEs even if the slot can't be mapped PRIVATE. KVM x86 only
* supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM
@@ -7370,7 +7373,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
* Zapping SPTEs in this case ensures KVM will reassess whether or not
* a hugepage can be used for affected ranges.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(priv_attr && !kvm_arch_has_private_mem(kvm)))
return false;
return kvm_unmap_gfn_range(kvm, range);
@@ -7415,6 +7418,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range)
{
unsigned long attrs = range->arg.attributes;
+ bool priv_attr = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
struct kvm_memory_slot *slot = range->slot;
int level;
@@ -7427,7 +7431,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
* a range that has PRIVATE GFNs, and conversely converting a range to
* SHARED may now allow hugepages.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(priv_attr && !kvm_arch_has_private_mem(kvm)))
return false;
/*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14841acb8b959..63c4b6739edee 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,6 +2506,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
struct kvm_mmu_notifier_range pre_set_range = {
.start = start,
.end = end,
+ .arg.attributes = attributes,
.handler = kvm_pre_set_memory_attributes,
.on_lock = kvm_mmu_invalidate_begin,
.flush_on_ret = true,
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes
2024-06-09 15:49 ` [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes Nicolas Saenz Julienne
@ 2024-09-13 19:13 ` Sean Christopherson
0 siblings, 0 replies; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 19:13 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> In preparation to introducing RWX memory attributes, make sure
> user-space is attempting to install a memory attribute with
> KVM_MEMORY_ATTRIBUTE_PRIVATE before throwing a warning on systems with
> no private memory support.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 8 ++++++--
> virt/kvm/kvm_main.c | 1 +
> 2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index b0c210b96419f..d56c04fbdc66b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7359,6 +7359,9 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> struct kvm_gfn_range *range)
> {
> + unsigned long attrs = range->arg.attributes;
> + bool priv_attr = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
It's probably worth making this check generic straightaway, e.g. build and then
check the set of allowed attributes, similar to how check_memory_region_flags()
builds and checks the set of allowed flags.
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 14/18] KVM: x86/mmu: Init memslot if memory attributes available
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (12 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 15/18] KVM: Introduce RWX memory attributes Nicolas Saenz Julienne
` (5 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Systems that lack private memory support are about to start using memory
attributes. So check whether the memory attributes xarray is empty in
order to decide whether it's necessary to initialize the hugepage
information when installing a new memslot.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d56c04fbdc66b..91edd873dcdbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7487,7 +7487,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
{
int level;
- if (!kvm_arch_has_private_mem(kvm))
+ if (!kvm_memory_attributes_in_use(kvm))
return;
for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4fa16c4772269..9250bf1c4db15 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2424,12 +2424,21 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
+static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
+{
+ return !xa_empty(&kvm->mem_attr_array);
+}
+
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
+static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
+{
+ return false;
+}
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return false;
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 15/18] KVM: Introduce RWX memory attributes
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (13 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 14/18] KVM: x86/mmu: Init memslot if memory attributes available Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory Nicolas Saenz Julienne
` (4 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Declare memory attributes to map memory regions as non-readable,
non-writable, and/or non-executable.
The attributes are negated for the following reasons:
- Setting a 0 memory attribute (attr->attributes == 0) shouldn't
introduce any access restrictions. For example, when moving from
private to shared mappings in the context of confidential computing.
- In practice, with negated attributes, a non-private RWX memory
attribute is analogous to a delete operation. It's a nice outcome, as
it forces remapping the region with hugepages, doing the right thing
for use-cases that have short-lived access-restricted regions like
Hyper-V's VSM.
- A non-negated version of the flags has no way of expressing a
non-access mapping (NR/NW/NX) without having to introduce an extra
flag (since 0 isn't available). A rough usage sketch follows below.
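As a rough user-space illustration (a sketch only; vm_fd, gpa and size
are assumed to exist, and both address and size must be page-aligned):

  struct kvm_memory_attributes attr = {
          .address    = gpa,
          .size       = size,
          /* Read-only: strip write and exec access, keep read. */
          .attributes = KVM_MEMORY_ATTRIBUTE_NW | KVM_MEMORY_ATTRIBUTE_NX,
  };

  if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr))
          err(1, "KVM_SET_MEMORY_ATTRIBUTES");

  /* Lifting all restrictions is simply a 0 attribute, no extra flag. */
  attr.attributes = 0;
  if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr))
          err(1, "KVM_SET_MEMORY_ATTRIBUTES");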
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
Documentation/virt/kvm/api.rst | 14 +++++++++++---
include/linux/kvm_host.h | 22 +++++++++++++++++++++-
include/uapi/linux/kvm.h | 3 +++
virt/kvm/kvm_main.c | 32 +++++++++++++++++++++++++++++---
4 files changed, 64 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 18ddea9c4c58a..6d3bc5092ea63 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6313,15 +6313,23 @@ of guest physical memory.
__u64 flags;
};
+ #define KVM_MEMORY_ATTRIBUTE_NR (1ULL << 0)
+ #define KVM_MEMORY_ATTRIBUTE_NW (1ULL << 1)
+ #define KVM_MEMORY_ATTRIBUTE_NX (1ULL << 2)
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
The address and size must be page aligned. The supported attributes can be
retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If
executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes
supported by that VM. If executed at system scope, KVM_CAP_MEMORY_ATTRIBUTES
-returns all attributes supported by KVM. The only attribute defined at this
-time is KVM_MEMORY_ATTRIBUTE_PRIVATE, which marks the associated gfn as being
-guest private memory.
+returns all attributes supported by KVM. The attributes defined at this
+time are:
+
+ - KVM_MEMORY_ATTRIBUTE_NR/NW/NX - Respectively mark the memory region as
+ non-readable, non-writable and/or non-executable. Note that write-only,
+ exec-only and write-exec mappings are not supported.
+ - KVM_MEMORY_ATTRIBUTE_PRIVATE - Marks the associated gfn as being guest
+ private memory.
Note, there is no "get" API. Userspace is responsible for explicitly tracking
the state of a gfn/page as needed.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9250bf1c4db15..85378345e8e77 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2411,6 +2411,21 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
+static inline bool kvm_mem_attributes_may_read(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NR);
+}
+
+static inline bool kvm_mem_attributes_may_write(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NW);
+}
+
+static inline bool kvm_mem_attributes_may_exec(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NX);
+}
+
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
{
@@ -2423,7 +2438,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
-
+bool kvm_mem_attributes_valid(struct kvm *kvm, unsigned long attrs);
static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
{
return !xa_empty(&kvm->mem_attr_array);
@@ -2435,6 +2450,11 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
+static inline bool kvm_mem_attributes_valid(struct kvm *kvm,
+ unsigned long attrs)
+{
+ return false;
+}
static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
{
return false;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 516d39910f9ab..26d4477dae8c6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1550,6 +1550,9 @@ struct kvm_memory_attributes {
__u64 flags;
};
+#define KVM_MEMORY_ATTRIBUTE_NR (1ULL << 0)
+#define KVM_MEMORY_ATTRIBUTE_NW (1ULL << 1)
+#define KVM_MEMORY_ATTRIBUTE_NX (1ULL << 2)
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 63c4b6739edee..bd27fc01e9715 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2430,10 +2430,14 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
static u64 kvm_supported_mem_attributes(struct kvm *kvm)
{
+ u64 supported_attrs = KVM_MEMORY_ATTRIBUTE_NR |
+ KVM_MEMORY_ATTRIBUTE_NW |
+ KVM_MEMORY_ATTRIBUTE_NX;
+
if (!kvm || kvm_arch_has_private_mem(kvm))
- return KVM_MEMORY_ATTRIBUTE_PRIVATE;
+ supported_attrs |= KVM_MEMORY_ATTRIBUTE_PRIVATE;
- return 0;
+ return supported_attrs;
}
static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
@@ -2557,6 +2561,28 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
return r;
}
+
+bool kvm_mem_attributes_valid(struct kvm *kvm, unsigned long attrs)
+{
+ bool may_read = kvm_mem_attributes_may_read(attrs);
+ bool may_write = kvm_mem_attributes_may_write(attrs);
+ bool may_exec = kvm_mem_attributes_may_exec(attrs);
+ bool priv = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+ if (attrs & ~kvm_supported_mem_attributes(kvm))
+ return false;
+
+ /* Private memory and access permissions are incompatible */
+ if (priv && (!may_read || !may_write || !may_exec))
+ return false;
+
+ /* Write and exec mappings require read access */
+ if ((may_write || may_exec) && !may_read)
+ return false;
+
+ return true;
+}
+
static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
struct kvm_memory_attributes *attrs)
{
@@ -2565,7 +2591,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
/* flags is currently not used. */
if (attrs->flags)
return -EINVAL;
- if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
+ if (!kvm_mem_attributes_valid(kvm, attrs->attributes))
return -EINVAL;
if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
return -EINVAL;
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (14 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 15/18] KVM: Introduce RWX memory attributes Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-08-22 15:21 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 17/18] KVM: Introduce traces to track memory attributes modification Nicolas Saenz Julienne
` (3 subsequent siblings)
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Take access-restriction memory attributes into account when faulting
guest memory. Prohibited memory accesses will cause a user-space fault
exit.
Additionally, bypass a warning in the !tdp case. Access restrictions in
guest page tables might not necessarily match the host PTEs when memory
attributes are in use.
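Roughly, on the user-space side (a sketch only;
vsm_handle_protection_fault() is a hypothetical VMM helper, the rest
follows the memory_fault exit uAPI, where KVM_RUN returns -EFAULT with
the exit struct filled in):

  if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno == EFAULT &&
      run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
          __u64 gpa  = run->memory_fault.gpa;
          __u64 size = run->memory_fault.size;

          /*
           * E.g. let the VSM logic decide whether to lift the
           * restriction or inject a secure intercept into VTL1.
           */
          vsm_handle_protection_fault(gpa, size);
  }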
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
arch/x86/kvm/mmu/mmu.c | 64 ++++++++++++++++++++++++++++------
arch/x86/kvm/mmu/mmutrace.h | 29 +++++++++++++++
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
include/linux/kvm_host.h | 4 +++
4 files changed, 87 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91edd873dcdbc..dfe50c9c31f7b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -754,7 +754,8 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
return sp->role.access;
}
-static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+static void kvm_mmu_page_set_translation(struct kvm *kvm,
+ struct kvm_mmu_page *sp, int index,
gfn_t gfn, unsigned int access)
{
if (sp_has_gptes(sp)) {
@@ -762,10 +763,17 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
return;
}
- WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
- "access mismatch under %s page %llx (expected %u, got %u)\n",
- sp->role.passthrough ? "passthrough" : "direct",
- sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+ /*
+ * Userspace might have introduced memory attributes for this gfn,
+ * breaking the assumption that the spte's access restrictions match
+ * the guest's. Userspace is also responsible for taking care of
+ * faults caused by these 'artificial' access restrictions.
+ */
+ WARN_ONCE(access != kvm_mmu_page_get_access(sp, index) &&
+ !kvm_get_memory_attributes(kvm, gfn),
+ "access mismatch under %s page %llx (expected %u, got %u)\n",
+ sp->role.passthrough ? "passthrough" : "direct", sp->gfn,
+ kvm_mmu_page_get_access(sp, index), access);
WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
"gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
@@ -773,12 +781,12 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
}
-static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
- unsigned int access)
+static void kvm_mmu_page_set_access(struct kvm *kvm, struct kvm_mmu_page *sp,
+ int index, unsigned int access)
{
gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);
- kvm_mmu_page_set_translation(sp, index, gfn, access);
+ kvm_mmu_page_set_translation(kvm, sp, index, gfn, access);
}
/*
@@ -1607,7 +1615,7 @@ static void __rmap_add(struct kvm *kvm,
int rmap_count;
sp = sptep_to_sp(spte);
- kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
+ kvm_mmu_page_set_translation(kvm, sp, spte_index(spte), gfn, access);
kvm_update_page_stats(kvm, sp->role.level, 1);
rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
@@ -2928,7 +2936,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
rmap_add(vcpu, slot, sptep, gfn, pte_access);
} else {
/* Already rmapped but the pte_access bits may have changed. */
- kvm_mmu_page_set_access(sp, spte_index(sptep), pte_access);
+ kvm_mmu_page_set_access(vcpu->kvm, sp, spte_index(sptep),
+ pte_access);
}
return ret;
@@ -4320,6 +4329,38 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE;
}
+static int kvm_mem_attributes_faultin_access_prots(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
+{
+ bool may_read, may_write, may_exec;
+ unsigned long attrs;
+
+ attrs = kvm_get_memory_attributes(vcpu->kvm, fault->gfn);
+ if (!attrs)
+ return RET_PF_CONTINUE;
+
+ if (!kvm_mem_attributes_valid(vcpu->kvm, attrs)) {
+ kvm_err("Invalid mem attributes 0x%lx found for address 0x%016llx\n",
+ attrs, fault->addr);
+ return -EFAULT;
+ }
+
+ trace_kvm_mem_attributes_faultin_access_prots(vcpu, fault, attrs);
+
+ may_read = kvm_mem_attributes_may_read(attrs);
+ may_write = kvm_mem_attributes_may_write(attrs);
+ may_exec = kvm_mem_attributes_may_exec(attrs);
+
+ if ((fault->user && !may_read) || (fault->write && !may_write) ||
+ (fault->exec && !may_exec))
+ return -EFAULT;
+
+ fault->map_writable = may_write;
+ fault->map_executable = may_exec;
+
+ return RET_PF_CONTINUE;
+}
+
static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool async;
@@ -4375,7 +4416,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* Now that we have a snapshot of mmu_invalidate_seq we can check for a
* private vs. shared mismatch.
*/
- if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+ if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn) ||
+ kvm_mem_attributes_faultin_access_prots(vcpu, fault)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 195d98bc8de85..ddbdd7396e9fa 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -440,6 +440,35 @@ TRACE_EVENT(
__entry->gfn, __entry->spte, __entry->level, __entry->errno)
);
+TRACE_EVENT(kvm_mem_attributes_faultin_access_prots,
+ TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ u64 mem_attrs),
+ TP_ARGS(vcpu, fault, mem_attrs),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(unsigned long, guest_rip)
+ __field(u64, fault_address)
+ __field(bool, write)
+ __field(bool, exec)
+ __field(u64, mem_attrs)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu->vcpu_id;
+ __entry->guest_rip = kvm_rip_read(vcpu);
+ __entry->fault_address = fault->addr;
+ __entry->write = fault->write;
+ __entry->exec = fault->exec;
+ __entry->mem_attrs = mem_attrs;
+ ),
+
+ TP_printk("vcpu %d rip 0x%lx gfn 0x%016llx access %s mem_attrs 0x%llx",
+ __entry->vcpu_id, __entry->guest_rip, __entry->fault_address,
+ __entry->exec ? "X" : (__entry->write ? "W" : "R"),
+ __entry->mem_attrs)
+);
+
#endif /* _TRACE_KVMMMU_H */
#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d3dbcf382ed2d..166f5f0e885e0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -954,7 +954,7 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
return 0;
/* Update the shadowed access bits in case they changed. */
- kvm_mmu_page_set_access(sp, i, pte_access);
+ kvm_mmu_page_set_access(vcpu->kvm, sp, i, pte_access);
sptep = &sp->spt[i];
spte = *sptep;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 85378345e8e77..9c26161d13dea 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2463,6 +2463,10 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return false;
}
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+ return 0;
+}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
#ifdef CONFIG_KVM_PRIVATE_MEM
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory
2024-06-09 15:49 ` [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory Nicolas Saenz Julienne
@ 2024-08-22 15:21 ` Nicolas Saenz Julienne
2024-08-22 16:58 ` Sean Christopherson
0 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-08-22 15:21 UTC (permalink / raw)
To: Nicolas Saenz Julienne, linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> Take access-restriction memory attributes into account when faulting
> guest memory. Prohibited memory accesses will cause a user-space fault
> exit.
>
> Additionally, bypass a warning in the !tdp case. Access restrictions in
> guest page tables might not necessarily match the host PTEs when memory
> attributes are in use.
>
> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
I now realize that only taking into account memory attributes during
faults isn't good enough for VSM. We should check the attributes anytime
KVM takes GPAs as input for any action initiated by the guest. If the
memory attributes are incompatible with such an action, it should be
stopped. Failure to do so opens side channels that unprivileged VTLs can
abuse to infer information about privileged VTLs. Some examples I came up
with:
- Guest page walks: VTL0 could install malicious directory entries that
point to GPAs only visible to VTL1. KVM will happily continue the
walk. Among other things, this could be used to infer VTL1's GVA->GPA
mappings.
- PV interfaces like the Hyper-V TSC page or VP assist page could be
used to modify portions of VTL1 memory.
- Hyper-V hypercalls that take GPAs as input/output can be abused in a
myriad of ways, including ones that exit into user-space.
We would be protected against all these if we implemented the memory
access restrictions through the memory slots API. As is, it has the
drawback of having to quiesce the whole VM for any non-trivial slot
modification (i.e. VSM's memory protections). But if we found a way to
speed up the slot updates we could rely on that, and avoid having to
teach kvm_read/write_guest() and friends to deal with memattrs. Note
that we would still need to use memory attributes to request that faults
on those select GPAs exit to user-space. Any opinions or
suggestions?
Note that, for now, I'll stick with the memory attributes approach to
see what the full solution looks like.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory
2024-08-22 15:21 ` Nicolas Saenz Julienne
@ 2024-08-22 16:58 ` Sean Christopherson
2024-09-13 18:26 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2024-08-22 16:58 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Thu, Aug 22, 2024, Nicolas Saenz Julienne wrote:
> On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> > Take access-restriction memory attributes into account when faulting
> > guest memory. Prohibited memory accesses will cause a user-space fault
> > exit.
> >
> > Additionally, bypass a warning in the !tdp case. Access restrictions in
> > guest page tables might not necessarily match the host PTEs when memory
> > attributes are in use.
> >
> > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
>
> I now realize that only taking into account memory attributes during
> faults isn't good enough for VSM. We should check the attributes anytime
> KVM takes GPAs as input for any action initiated by the guest. If the
> memory attributes are incompatible with such an action, it should be
> stopped. Failure to do so opens side channels that unprivileged VTLs can
> abuse to infer information about privileged VTLs. Some examples I came up
> with:
> - Guest page walks: VTL0 could install malicious directory entries that
> point to GPAs only visible to VTL1. KVM will happily continue the
> walk. Among other things, this could be used to infer VTL1's GVA->GPA
> mappings.
> - PV interfaces like the Hyper-V TSC page or VP assist page could be
> used to modify portions of VTL1 memory.
> - Hyper-V hypercalls that take GPAs as input/output can be abused in a
> myriad of ways, including ones that exit into user-space.
>
> We would be protected against all these if we implemented the memory
> access restrictions through the memory slots API. As is, it has the
> drawback of having to quiesce the whole VM for any non-trivial slot
> modification (i.e. VSM's memory protections). But if we found a way to
> speed up the slot updates we could rely on that, and avoid having to
> teach kvm_read/write_guest() and friends to deal with memattrs. Note
> that we would still need to use memory attributes to request that faults
> on those select GPAs exit to user-space. Any opinions or
> suggestions?
>
> Note that, for now, I'll stick with the memory attributes approach to
> see what the full solution looks like.
FWIW, I suspect we'll be better off honoring memory attributes. It's not just
the KVM side that has issues with memslot updates; my understanding is userspace
has also built up "slow" code with respect to memslot updates, in part because
it's such a slow path in KVM.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory
2024-08-22 16:58 ` Sean Christopherson
@ 2024-09-13 18:26 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-09-13 18:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Thu Aug 22, 2024 at 4:58 PM UTC, Sean Christopherson wrote:
> On Thu, Aug 22, 2024, Nicolas Saenz Julienne wrote:
> > On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> > > Take access-restriction memory attributes into account when faulting
> > > guest memory. Prohibited memory accesses will cause a user-space fault
> > > exit.
> > >
> > > Additionally, bypass a warning in the !tdp case. Access restrictions in
> > > guest page tables might not necessarily match the host PTEs when memory
> > > attributes are in use.
> > >
> > > Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
> >
> > I now realize that only taking into account memory attributes during
> > faults isn't good enough for VSM. We should check the attributes anytime
> > KVM takes GPAs as input for any action initiated by the guest. If the
> > memory attributes are incompatible with such an action, it should be
> > stopped. Failure to do so opens side channels that unprivileged VTLs can
> > abuse to infer information about privileged VTLs. Some examples I came up
> > with:
> > - Guest page walks: VTL0 could install malicious directory entries that
> > point to GPAs only visible to VTL1. KVM will happily continue the
> > walk. Among other things, this could be used to infer VTL1's GVA->GPA
> > mappings.
> > - PV interfaces like the Hyper-V TSC page or VP assist page could be
> > used to modify portions of VTL1 memory.
> > - Hyper-V hypercalls that take GPAs as input/output can be abused in a
> > myriad of ways, including ones that exit into user-space.
> >
> > We would be protected against all these if we implemented the memory
> > access restrictions through the memory slots API. As is, it has the
> > drawback of having to quiesce the whole VM for any non-trivial slot
> > modification (i.e. VSM's memory protections). But if we found a way to
> > speed up the slot updates we could rely on that, and avoid having to
> > teach kvm_read/write_guest() and friends to deal with memattrs. Note
> > that we would still need to use memory attributes to request that faults
> > on those select GPAs exit to user-space. Any opinions or
> > suggestions?
> >
> > Note that, for now, I'll stick with the memory attributes approach to
> > see what the full solution looks like.
>
> FWIW, I suspect we'll be better off honoring memory attributes. It's not just
> the KVM side that has issues with memslot updates; my understanding is userspace
> has also built up "slow" code with respect to memslot updates, in part because
> it's such a slow path in KVM.
Sean, since I see you're looking at the series, I don't think it's worth
spending too much time on the memory attributes patches. Since
figuring out the sidechannels mentioned above, I found even more
shortcomings in this implementation. I'm reworking the whole thing in a
separate series [1], taking into account sidechannels, MMIO, non-TDP
MMUs, etc., and introducing selftests and an in-depth design document.
[1] https://github.com/vianpl/linux branch 'vsm/memory-protections' (wip)
Thanks,
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 17/18] KVM: Introduce traces to track memory attributes modification.
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (15 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-06-09 15:49 ` [PATCH 18/18] KVM: x86: hyper-v: Handle VSM hcalls in user-space Nicolas Saenz Julienne
` (2 subsequent siblings)
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Introduce traces that track memory attribute modifications.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
include/trace/events/kvm.h | 20 ++++++++++++++++++++
virt/kvm/kvm_main.c | 2 ++
2 files changed, 22 insertions(+)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 74e40d5d4af42..aa6caeb16f12a 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -489,6 +489,26 @@ TRACE_EVENT(kvm_test_age_hva,
TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
);
+TRACE_EVENT(kvm_vm_set_mem_attributes,
+ TP_PROTO(u64 start, u64 cnt, u64 attributes),
+ TP_ARGS(start, cnt, attributes),
+
+ TP_STRUCT__entry(
+ __field( u64, start )
+ __field( u64, cnt )
+ __field( u64, attributes )
+ ),
+
+ TP_fast_assign(
+ __entry->start = start;
+ __entry->cnt = cnt;
+ __entry->attributes = attributes;
+ ),
+
+ TP_printk("gfn 0x%llx, cnt 0x%llx, attributes 0x%llx",
+ __entry->start, __entry->cnt, __entry->attributes)
+);
+
#endif /* _TRACE_KVM_MAIN_H */
/* This part must be outside protection */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd27fc01e9715..1c493ece3deb1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2556,6 +2556,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
kvm_handle_gfn_range(kvm, &post_set_range);
+ trace_kvm_vm_set_mem_attributes(start, end - start, attributes);
+
out_unlock:
mutex_unlock(&kvm->slots_lock);
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH 18/18] KVM: x86: hyper-v: Handle VSM hcalls in user-space
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (16 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 17/18] KVM: Introduce traces to track memory attributes modification Nicolas Saenz Julienne
@ 2024-06-09 15:49 ` Nicolas Saenz Julienne
2024-07-03 9:55 ` [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
2024-09-13 19:19 ` Sean Christopherson
19 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-06-09 15:49 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: pbonzini, seanjc, vkuznets, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, paul, nsaenz, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Let user-space handle all hypercalls that fall under the AccessVsm
partition privilege flag. That is:
- HvCallModifyVtlProtectionMask
- HvCallEnablePartitionVtl
- HvCallEnableVpVtl
- HvCallVtlCall
- HvCallVtlReturn
All these are VTL-aware and as such need to be handled in user-space; a
rough dispatch sketch follows below.
Additionally, select KVM_GENERIC_MEMORY_ATTRIBUTES when
CONFIG_KVM_HYPERV is enabled, as it's necessary to implement VTL memory
protections.
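For illustration, a rough sketch of the user-space dispatch (a sketch
only; the vsm_vtl_call()/vsm_vtl_return() helpers are hypothetical VMM
code, the exit layout follows the KVM_EXIT_HYPERV_HCALL uAPI):

  /* In the VMM's KVM_RUN exit handler. */
  case KVM_EXIT_HYPERV:
          if (run->hyperv.type == KVM_EXIT_HYPERV_HCALL) {
                  /* The hypercall code lives in bits 15:0 of the input. */
                  __u16 code = run->hyperv.u.hcall.input & 0xffff;

                  switch (code) {
                  case HVCALL_VTL_CALL:
                          vsm_vtl_call(vcpu);       /* hypothetical */
                          break;
                  case HVCALL_VTL_RETURN:
                          vsm_vtl_return(vcpu);     /* hypothetical */
                          break;
                  default:
                          run->hyperv.u.hcall.result =
                                  HV_STATUS_INVALID_HYPERCALL_CODE;
                          break;
                  }
          }
          break;

Note that, per this patch, VTL call and return don't set a hypercall
result, so the helpers only need to perform the VTL switch before
resuming the vCPU.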
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
---
Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/hyperv.c | 29 +++++++++++++++++++++++++----
include/asm-generic/hyperv-tlfs.h | 6 +++++-
4 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6d3bc5092ea63..77af2ccf49a30 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8969,3 +8969,26 @@ HvCallGetVpIndexFromApicId. Currently, it is only used in conjunction with
HV_ACCESS_VSM, and immediately exits to userspace with KVM_EXIT_HYPERV_HCALL as
the reason. Userspace is expected to complete the hypercall before resuming
execution.
+
+10.4 HV_ACCESS_VSM
+------------------
+
+:Location: CPUID.40000003H:EBX[bit 16]
+
+This CPUID indicates that KVM supports HvCallModifyVtlProtectionMask,
+HvCallEnablePartitionVtl, HvCallEnableVpVtl, HvCallVtlCall, and
+HvCallVtlReturn. Additionally, as a prerequisite to being able to implement
+Hyper-V VSM, it also identifies the availability of HvTranslateVirtualAddress,
+as well as the VTL-aware aspects of HvCallSendSyntheticClusterIpi and
+HvCallSendSyntheticClusterIpiEx.
+
+All these hypercalls immediately exit with KVM_EXIT_HYPERV_HCALL as the reason.
+Userspace is expected to complete the hypercall before resuming execution.
+Note that both IPI hypercalls will only exit to userspace if the request is
+VTL-aware, which will only happen if HV_ACCESS_VSM is exposed to the guest.
+
+Access restriction memory attributes (4.141) are available to simplify
+HvCallModifyVtlProtectionMask's implementation.
+
+Finally, this CPUID also indicates that KVM_MP_STATE_HV_INACTIVE_VTL is
+available.
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fec95a7702703..8d851fe3b8c25 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -157,6 +157,7 @@ config KVM_SMM
config KVM_HYPERV
bool "Support for Microsoft Hyper-V emulation"
depends on KVM
+ select KVM_GENERIC_MEMORY_ATTRIBUTES
default y
help
Provides KVM support for emulating Microsoft Hyper-V. This allows KVM
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dd64f41dc835d..1158c59a92790 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2388,7 +2388,12 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
}
}
-static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
+static inline bool kvm_hv_is_vtl_call_return(u16 code)
+{
+ return code == HVCALL_VTL_CALL || code == HVCALL_VTL_RETURN;
+}
+
+static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u16 code, u64 result)
{
u32 tlb_lock_count = 0;
int ret;
@@ -2400,9 +2405,12 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
result = HV_STATUS_INVALID_HYPERCALL_INPUT;
trace_kvm_hv_hypercall_done(result);
- kvm_hv_hypercall_set_result(vcpu, result);
++vcpu->stat.hypercalls;
+ /* VTL call and return don't set a hcall result */
+ if (!kvm_hv_is_vtl_call_return(code))
+ kvm_hv_hypercall_set_result(vcpu, result);
+
ret = kvm_skip_emulated_instruction(vcpu);
if (tlb_lock_count)
@@ -2459,7 +2467,7 @@ static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
}
- return kvm_hv_hypercall_complete(vcpu, result);
+ return kvm_hv_hypercall_complete(vcpu, code, result);
}
static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -2513,6 +2521,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_SEND_IPI_EX:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}
@@ -2552,6 +2561,12 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
*/
return !kvm_hv_is_syndbg_enabled(hv_vcpu->vcpu) ||
hv_vcpu->cpuid_cache.features_ebx & HV_DEBUGGING;
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
+ case HVCALL_ENABLE_PARTITION_VTL:
+ case HVCALL_ENABLE_VP_VTL:
+ case HVCALL_VTL_CALL:
+ case HVCALL_VTL_RETURN:
+ return hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM;
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
return hv_vcpu->cpuid_cache.features_ebx &
@@ -2744,6 +2759,11 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
goto hypercall_userspace_exit;
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
+ case HVCALL_ENABLE_PARTITION_VTL:
+ case HVCALL_ENABLE_VP_VTL:
+ case HVCALL_VTL_CALL:
+ case HVCALL_VTL_RETURN:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
@@ -2765,7 +2785,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
}
hypercall_complete:
- return kvm_hv_hypercall_complete(vcpu, ret);
+ return kvm_hv_hypercall_complete(vcpu, hc.code, ret);
hypercall_userspace_exit:
vcpu->run->exit_reason = KVM_EXIT_HYPERV;
@@ -2921,6 +2941,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_POST_MESSAGES;
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
+ ent->ebx |= HV_ACCESS_VSM;
ent->ebx |= HV_ACCESS_VP_REGISTERS;
ent->ebx |= HV_START_VIRTUAL_PROCESSOR;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e24b88ec4ec00..6b12e5818292c 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -149,9 +149,13 @@ union hv_reference_tsc_msr {
/* Declare the various hypercall operations. */
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
-#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_MODIFY_VTL_PROTECTION_MASK 0x000c
+#define HVCALL_ENABLE_PARTITION_VTL 0x000d
+#define HVCALL_ENABLE_VP_VTL 0x000f
+#define HVCALL_VTL_CALL 0x0011
+#define HVCALL_VTL_RETURN 0x0012
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (17 preceding siblings ...)
2024-06-09 15:49 ` [PATCH 18/18] KVM: x86: hyper-v: Handle VSM hcalls in user-space Nicolas Saenz Julienne
@ 2024-07-03 9:55 ` Nicolas Saenz Julienne
2024-07-03 12:48 ` Vitaly Kuznetsov
2024-09-13 19:19 ` Sean Christopherson
19 siblings, 1 reply; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-07-03 9:55 UTC (permalink / raw)
To: seanjc
Cc: pbonzini, seanjc, linux-kernel, kvm, vkuznets, linux-doc,
linux-hyperv, linux-arch, nsaenz, linux-trace-kernel, graf, dwmw2,
pdurrant, mlevitsk, jgowans, corbet, decui, tglx, mingo, bp,
dave.hansen, x86, amoorthy
Hi Sean,
On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> This series introduces core KVM functionality necessary to emulate Hyper-V's
> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
Just wanted to make sure the series is in your radar.
Thanks,
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation
2024-07-03 9:55 ` [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
@ 2024-07-03 12:48 ` Vitaly Kuznetsov
2024-07-03 13:18 ` Nicolas Saenz Julienne
0 siblings, 1 reply; 40+ messages in thread
From: Vitaly Kuznetsov @ 2024-07-03 12:48 UTC (permalink / raw)
To: Nicolas Saenz Julienne, seanjc
Cc: pbonzini, seanjc, linux-kernel, kvm, linux-doc, linux-hyperv,
linux-arch, nsaenz, linux-trace-kernel, graf, dwmw2, pdurrant,
mlevitsk, jgowans, corbet, decui, tglx, mingo, bp, dave.hansen,
x86, amoorthy
Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
> Hi Sean,
>
> On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
>> This series introduces core KVM functionality necessary to emulate Hyper-V's
>> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
>
> Just wanted to make sure the series is in your radar.
>
Not Sean here but I was planning to take a look at least at Hyper-V
parts of it next week.
--
Vitaly
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation
2024-07-03 12:48 ` Vitaly Kuznetsov
@ 2024-07-03 13:18 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-07-03 13:18 UTC (permalink / raw)
To: Vitaly Kuznetsov, seanjc
Cc: pbonzini, linux-kernel, kvm, linux-doc, linux-hyperv, linux-arch,
linux-trace-kernel, graf, dwmw2, pdurrant, mlevitsk, jgowans,
corbet, decui, tglx, mingo, bp, dave.hansen, x86, amoorthy
Hi Vitaly,
On Wed Jul 3, 2024 at 12:48 PM UTC, Vitaly Kuznetsov wrote:
> Nicolas Saenz Julienne <nsaenz@amazon.com> writes:
>
> > Hi Sean,
> >
> > On Sun Jun 9, 2024 at 3:49 PM UTC, Nicolas Saenz Julienne wrote:
> >> This series introduces core KVM functionality necessary to emulate Hyper-V's
> >> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
> >
> > Just wanted to make sure the series is in your radar.
> >
>
> Not Sean here but I was planning to take a look at least at Hyper-V
> parts of it next week.
Thanks for the update.
Nicolas
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation
2024-06-09 15:49 [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
` (18 preceding siblings ...)
2024-07-03 9:55 ` [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation Nicolas Saenz Julienne
@ 2024-09-13 19:19 ` Sean Christopherson
2024-09-16 16:32 ` Nicolas Saenz Julienne
19 siblings, 1 reply; 40+ messages in thread
From: Sean Christopherson @ 2024-09-13 19:19 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> This series introduces core KVM functionality necessary to emulate Hyper-V's
> Virtual Secure Mode in a Virtual Machine Monitor (VMM).
...
> As discussed at LPC2023 and in our previous RFC [2], we decided to model each
> VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
> introduced in this series, we have been able to implement VTL memory
> protections in a non-intrusive way, using generic KVM APIs. Additionally, each
> CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
> VTL's state. VTL awareness is fully removed from KVM, and the responsibility
> for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
> userspace.
>
> Series overview:
> - 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
> expected to be handled in userspace. Additionally a new VTL-specific MP
> state is introduced.
> - 9-10: Pass the instruction length as part of the userspace fault exit data
> in order to simplify VSM's secure intercept generation.
> - 11-17: Introduce RWX memory attributes as well as extend userspace faults.
> - 18: Introduces the main VSM CPUID bit which gates all VTL configuration and
> runtime hypercalls.
Aside from the RWX attributes, which to no one's surprise will need a lot of work
to get them performant and functional, are there any "big" TODO items that you see
in KVM?
If this series is more or less code complete, IMO modeling VTLs as distinct VM
structures is a clear win. Except for the "idle VTL" stuff, which I think we can
simplify, this series is quite boring, and I mean that in the best possible way :-)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation
2024-09-13 19:19 ` Sean Christopherson
@ 2024-09-16 16:32 ` Nicolas Saenz Julienne
0 siblings, 0 replies; 40+ messages in thread
From: Nicolas Saenz Julienne @ 2024-09-16 16:32 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, kvm, pbonzini, vkuznets, linux-doc, linux-hyperv,
linux-arch, linux-trace-kernel, graf, dwmw2, paul, mlevitsk,
jgowans, corbet, decui, tglx, mingo, bp, dave.hansen, x86,
amoorthy
On Fri Sep 13, 2024 at 7:19 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > This series introduces core KVM functionality necessary to emulate Hyper-V's
> > Virtual Secure Mode in a Virtual Machine Monitor (VMM).
>
> ...
>
> > As discussed at LPC2023 and in our previous RFC [2], we decided to model each
> > VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
> > introduced in this series, we have been able to implement VTL memory
> > protections in a non-intrusive way, using generic KVM APIs. Additionally, each
> > CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
> > VTL's state. VTL awareness is fully removed from KVM, and the responsibility
> > for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
> > userspace.
> >
> > Series overview:
> > - 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
> > expected to be handled in userspace. Additionally a new VTL-specific MP
> > state is introduced.
> > - 9-10: Pass the instruction length as part of the userspace fault exit data
> > in order to simplify VSM's secure intercept generation.
> > - 11-17: Introduce RWX memory attributes as well as extend userspace faults.
> > - 18: Introduces the main VSM CPUID bit which gates all VTL configuration and
> > runtime hypercalls.
>
> Aside from the RWX attributes, which to no one's surprise will need a lot of work
> to get them performant and functional, are there any "big" TODO items that you see
> in KVM?
Aside from VTLs and VTL switching, there is a bunch of KVM features we
still need in order to be fully compliant with the VSM spec:
- KVM_TRANSLATE2, which Nikolas Wipper posted a week ago [1].
Technically we can do this in user-space, but it's way simpler to
re-use KVM's page-walker.
- Hyper-V's TlbFlushInhibit, which allows VTL1 to block VTL0 vCPUs from
issuing TLB flushes, holding them until uninhibited. Note this only
applies to para-virtualized TLB flushes:
HvFlushVirtualAddress{Space,SpaceEx,List,ListEx}, so it's 100% Hyper-V
specific.
- CPU register pinning/intercepting; we plan on reusing what HEKI
proposed some time ago, and exposing it through an IOCTL using ONE_REG
to represent registers.
- MBEC-aware memory attributes; we don't plan on enabling support for
these with the first RWX memattrs submission. We'll do it as a
follow-up, especially as not every Windows VBS feature requires it
(Credential Guard doesn't need it, HVCI does).
> If this series is more or less code complete, IMO modeling VTLs as distinct VM
> structures is a clear win.
I agree.
> Except for the "idle VTL" stuff, which I think we can simplify, this
> series is quite boring, and I mean that in the best possible way :-)
:)
Thanks,
Nicolas
[1] https://lore.kernel.org/kvm/20240910152207.38974-1-nikwip@amazon.de
^ permalink raw reply [flat|nested] 40+ messages in thread