From: Paolo Bonzini <pbonzini@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] KVM: x86: Reinstate userspace hypercall support
Date: Mon, 30 Nov 2020 19:36:44 +0100 [thread overview]
Message-ID: <dc2d4330-e45d-641f-226b-005a477efd22@redhat.com> (raw)
In-Reply-To: <1bde4a992be29581e559f7a57818e206e11f84f5.camel@infradead.org>
On 28/11/20 15:20, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> For supporting Xen guests we really want to be able to use vmcall/vmmcall
> for hypercalls as Xen itself does. Reinstate the KVM_EXIT_HYPERCALL
> support that Anthony ripped out in 2007.
>
> Yes, we *could* make it work with KVM_EXIT_IO if we really had to, but
> that makes it guest-visible and makes it distinctly non-trivial to do
> live migration from Xen because we'd have to update the hypercall page(s)
> (which are at unknown locations) as well as dealing with any guest RIP
> which happens to be *in* a hypercall page at the time.
>
> We also actively want to *prevent* the KVM hypercalls from suddenly
> becoming available to guests which think they are Xen guests, which
> this achieves too.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> Should this test work OK on AMD? I see a separate test which is
> explicitly testing VMCALL on AMD, which makes me suspect it's expected
> to work as well as VMMCALL?
Yes, it should (via the #UD intercept instead of the VMMCALL exit).
> Do we have the test infrastructure for running 32-bit guests easily?
Nope, unfortunately not, and I'm not going to ask you to port the
selftests infrastructure to 32-bit x86 (though it should not be too hard).
Paolo
> Documentation/virt/kvm/api.rst | 23 +++--
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 46 +++++++++
> include/uapi/linux/kvm.h | 1 +
> tools/include/uapi/linux/kvm.h | 57 ++++++++++-
> tools/testing/selftests/kvm/.gitignore | 1 +
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/x86_64/user_vmcall_test.c | 94 +++++++++++++++++++
> 8 files changed, 214 insertions(+), 10 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/x86_64/user_vmcall_test.c
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 70254eaa5229..fa9160920a08 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4990,13 +4990,13 @@ to the byte array.
>
> .. note::
>
> - For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
> - KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
> - operations are complete (and guest state is consistent) only after userspace
> - has re-entered the kernel with KVM_RUN. The kernel side will first finish
> - incomplete operations and then check for pending signals. Userspace
> - can re-enter the guest with an unmasked signal pending to complete
> - pending operations.
> + For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_EPR,
> + KVM_EXIT_HYPERCALL, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the
> + corresponding operations are complete (and guest state is consistent) only
> + after userspace has re-entered the kernel with KVM_RUN. The kernel side
> + will first finish incomplete operations and then check for pending signals.
> + Userspace can re-enter the guest with an unmasked signal pending to
> + complete pending operations.
>
> ::
>
> @@ -5009,8 +5009,13 @@ to the byte array.
> __u32 pad;
> } hypercall;
>
> -Unused. This was once used for 'hypercall to userspace'. To implement
> -such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
> +This occurs when KVM_CAP_X86_USER_SPACE_HYPERCALL is enabled and the vcpu has
> +executed a VMCALL(Intel) or VMMCALL(AMD) instruction. The arguments are taken
> +from the vcpu registers in accordance with the Xen hypercall ABI.
> +
> +Except for compatibility with existing hypervisors such as Xen, userspace
> +handling of hypercalls is discouraged. To implement such functionality,
> +use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
>
> .. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f002cdb13a0b..a6f72adc48cd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -984,6 +984,7 @@ struct kvm_arch {
>
> bool guest_can_read_msr_platform_info;
> bool exception_payload_enabled;
> + bool user_space_hypercall;
>
> /* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
> u32 user_space_msr_mask;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a3fdc16cfd6f..e8c5c079a85d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3755,6 +3755,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_LAST_CPU:
> case KVM_CAP_X86_USER_SPACE_MSR:
> + case KVM_CAP_X86_USER_SPACE_HYPERCALL:
> case KVM_CAP_X86_MSR_FILTER:
> case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
> r = 1;
> @@ -5275,6 +5276,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> kvm->arch.user_space_msr_mask = cap->args[0];
> r = 0;
> break;
> + case KVM_CAP_X86_USER_SPACE_HYPERCALL:
> + kvm->arch.user_space_hypercall = cap->args[0];
> + r = 0;
> + break;
> default:
> r = -EINVAL;
> break;
> @@ -8063,11 +8068,52 @@ static void kvm_sched_yield(struct kvm *kvm, unsigned long dest_id)
> kvm_vcpu_yield_to(target);
> }
>
> +static int complete_userspace_hypercall(struct kvm_vcpu *vcpu)
> +{
> + kvm_rax_write(vcpu, vcpu->run->hypercall.ret);
> +
> + return kvm_skip_emulated_instruction(vcpu);
> +}
> +
> +int kvm_userspace_hypercall(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_run *run = vcpu->run;
> +
> + if (is_long_mode(vcpu)) {
> + run->hypercall.longmode = 1;
> + run->hypercall.nr = kvm_rax_read(vcpu);
> + run->hypercall.args[0] = kvm_rdi_read(vcpu);
> + run->hypercall.args[1] = kvm_rsi_read(vcpu);
> + run->hypercall.args[2] = kvm_rdx_read(vcpu);
> + run->hypercall.args[3] = kvm_r10_read(vcpu);
> + run->hypercall.args[4] = kvm_r8_read(vcpu);
> + run->hypercall.args[5] = kvm_r9_read(vcpu);
> + run->hypercall.ret = -KVM_ENOSYS;
> + } else {
> + run->hypercall.longmode = 0;
> + run->hypercall.nr = (u32)kvm_rbx_read(vcpu);
> + run->hypercall.args[0] = (u32)kvm_rbx_read(vcpu);
> + run->hypercall.args[1] = (u32)kvm_rcx_read(vcpu);
> + run->hypercall.args[2] = (u32)kvm_rdx_read(vcpu);
> + run->hypercall.args[3] = (u32)kvm_rsi_read(vcpu);
> + run->hypercall.args[4] = (u32)kvm_rdi_read(vcpu);
> + run->hypercall.args[5] = (u32)kvm_rbp_read(vcpu);
> + run->hypercall.ret = (u32)-KVM_ENOSYS;
> + }
> + run->exit_reason = KVM_EXIT_HYPERCALL;
> + vcpu->arch.complete_userspace_io = complete_userspace_hypercall;
> +
> + return 0;
> +}
> +
> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> {
> unsigned long nr, a0, a1, a2, a3, ret;
> int op_64_bit;
>
> + if (vcpu->kvm->arch.user_space_hypercall)
> + return kvm_userspace_hypercall(vcpu);
> +
> if (kvm_hv_hypercall_enabled(vcpu->kvm))
> return kvm_hv_hypercall(vcpu);
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 886802b8ffba..e01b679e0132 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1056,6 +1056,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
> #define KVM_CAP_SYS_HYPERV_CPUID 191
> #define KVM_CAP_DIRTY_LOG_RING 192
> +#define KVM_CAP_X86_USER_SPACE_HYPERCALL 193
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index ca41220b40b8..e01b679e0132 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -250,6 +250,7 @@ struct kvm_hyperv_exit {
> #define KVM_EXIT_ARM_NISV 28
> #define KVM_EXIT_X86_RDMSR 29
> #define KVM_EXIT_X86_WRMSR 30
> +#define KVM_EXIT_DIRTY_RING_FULL 31
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -1053,6 +1054,9 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_X86_USER_SPACE_MSR 188
> #define KVM_CAP_X86_MSR_FILTER 189
> #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
> +#define KVM_CAP_SYS_HYPERV_CPUID 191
> +#define KVM_CAP_DIRTY_LOG_RING 192
> +#define KVM_CAP_X86_USER_SPACE_HYPERCALL 193
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1511,7 +1515,7 @@ struct kvm_enc_region {
> /* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT_2 */
> #define KVM_CLEAR_DIRTY_LOG _IOWR(KVMIO, 0xc0, struct kvm_clear_dirty_log)
>
> -/* Available with KVM_CAP_HYPERV_CPUID */
> +/* Available with KVM_CAP_HYPERV_CPUID (vcpu) / KVM_CAP_SYS_HYPERV_CPUID (system) */
> #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
>
> /* Available with KVM_CAP_ARM_SVE */
> @@ -1557,6 +1561,9 @@ struct kvm_pv_cmd {
> /* Available with KVM_CAP_X86_MSR_FILTER */
> #define KVM_X86_SET_MSR_FILTER _IOW(KVMIO, 0xc6, struct kvm_msr_filter)
>
> +/* Available with KVM_CAP_DIRTY_LOG_RING */
> +#define KVM_RESET_DIRTY_RINGS _IO(KVMIO, 0xc7)
> +
> /* Secure Encrypted Virtualization command */
> enum sev_cmd_id {
> /* Guest initialization commands */
> @@ -1710,4 +1717,52 @@ struct kvm_hyperv_eventfd {
> #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
> #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
>
> +/*
> + * Arch needs to define the macro after implementing the dirty ring
> + * feature. KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the
> + * starting page offset of the dirty ring structures.
> + */
> +#ifndef KVM_DIRTY_LOG_PAGE_OFFSET
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 0
> +#endif
> +
> +/*
> + * KVM dirty GFN flags, defined as:
> + *
> + * |---------------+---------------+--------------|
> + * | bit 1 (reset) | bit 0 (dirty) | Status |
> + * |---------------+---------------+--------------|
> + * | 0 | 0 | Invalid GFN |
> + * | 0 | 1 | Dirty GFN |
> + * | 1 | X | GFN to reset |
> + * |---------------+---------------+--------------|
> + *
> + * Lifecycle of a dirty GFN goes like:
> + *
> + * dirtied harvested reset
> + * 00 -----------> 01 -------------> 1X -------+
> + * ^ |
> + * | |
> + * +------------------------------------------+
> + *
> + * The userspace program is only responsible for the 01->1X state
> + * conversion after harvesting an entry. Also, it must not skip any
> + * dirty bits, so that dirty bits are always harvested in sequence.
> + */
> +#define KVM_DIRTY_GFN_F_DIRTY BIT(0)
> +#define KVM_DIRTY_GFN_F_RESET BIT(1)
> +#define KVM_DIRTY_GFN_F_MASK 0x3
> +
> +/*
> + * KVM dirty rings should be mapped at KVM_DIRTY_LOG_PAGE_OFFSET of
> + * per-vcpu mmaped regions as an array of struct kvm_dirty_gfn. The
> + * size of the gfn buffer is decided by the first argument when
> + * enabling KVM_CAP_DIRTY_LOG_RING.
> + */
> +struct kvm_dirty_gfn {
> + __u32 flags;
> + __u32 slot;
> + __u64 offset;
> +};
> +
> #endif /* __LINUX_KVM_H */
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 5468db7dd674..3c715f6491c1 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -18,6 +18,7 @@
> /x86_64/sync_regs_test
> /x86_64/tsc_msrs_test
> /x86_64/user_msr_test
> +/x86_64/user_vmcall_test
> /x86_64/vmx_apic_access_test
> /x86_64/vmx_close_while_nested_test
> /x86_64/vmx_dirty_log_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 4febf4d5ead9..c7468eb1dcf0 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -59,6 +59,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
> TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
> TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
> TEST_GEN_PROGS_x86_64 += x86_64/user_msr_test
> +TEST_GEN_PROGS_x86_64 += x86_64/user_vmcall_test
> TEST_GEN_PROGS_x86_64 += demand_paging_test
> TEST_GEN_PROGS_x86_64 += dirty_log_test
> TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
> diff --git a/tools/testing/selftests/kvm/x86_64/user_vmcall_test.c b/tools/testing/selftests/kvm/x86_64/user_vmcall_test.c
> new file mode 100644
> index 000000000000..e6286e5d5294
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/user_vmcall_test.c
> @@ -0,0 +1,94 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * user_vmcall_test
> + *
> + * Copyright © 2020 Amazon.com, Inc. or its affiliates.
> + *
> + * Userspace hypercall testing
> + */
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +#define VCPU_ID 5
> +
> +static struct kvm_vm *vm;
> +
> +#define ARGVALUE(x) (0xdeadbeef5a5a0000UL + x)
> +#define RETVALUE 0xcafef00dfbfbffffUL
> +
> +static void guest_code(void)
> +{
> + unsigned long rax = ARGVALUE(1);
> + unsigned long rdi = ARGVALUE(2);
> + unsigned long rsi = ARGVALUE(3);
> + unsigned long rdx = ARGVALUE(4);
> + register unsigned long r10 __asm__("r10") = ARGVALUE(5);
> + register unsigned long r8 __asm__("r8") = ARGVALUE(6);
> + register unsigned long r9 __asm__("r9") = ARGVALUE(7);
> + __asm__ __volatile__("vmcall" :
> + "=a"(rax) :
> + "a"(rax), "D"(rdi), "S"(rsi), "d"(rdx),
> + "r"(r10), "r"(r8), "r"(r9));
> + GUEST_ASSERT(rax == RETVALUE);
> + GUEST_DONE();
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + if (!kvm_check_cap(KVM_CAP_X86_USER_SPACE_HYPERCALL)) {
> + print_skip("KVM_CAP_X86_USER_SPACE_HYPERCALL not available");
> + exit(KSFT_SKIP);
> + }
> +
> + struct kvm_enable_cap cap = {
> + .cap = KVM_CAP_X86_USER_SPACE_HYPERCALL,
> + .flags = 0,
> + .args[0] = 1,
> + };
> + vm = vm_create_default(VCPU_ID, 0, (void *) guest_code);
> + vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
> +
> + vm_enable_cap(vm, &cap);
> +
> + for (;;) {
> + volatile struct kvm_run *run = vcpu_state(vm, VCPU_ID);
> + struct ucall uc;
> +
> + vcpu_run(vm, VCPU_ID);
> +
> + if (run->exit_reason == KVM_EXIT_HYPERCALL) {
> + ASSERT_EQ(run->hypercall.longmode, 1);
> + ASSERT_EQ(run->hypercall.nr, ARGVALUE(1));
> + ASSERT_EQ(run->hypercall.args[0], ARGVALUE(2));
> + ASSERT_EQ(run->hypercall.args[1], ARGVALUE(3));
> + ASSERT_EQ(run->hypercall.args[2], ARGVALUE(4));
> + ASSERT_EQ(run->hypercall.args[3], ARGVALUE(5));
> + ASSERT_EQ(run->hypercall.args[4], ARGVALUE(6));
> + ASSERT_EQ(run->hypercall.args[5], ARGVALUE(7));
> + run->hypercall.ret = RETVALUE;
> + continue;
> + }
> +
> + TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> + "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
> + run->exit_reason,
> + exit_reason_str(run->exit_reason));
> +
> + switch (get_ucall(vm, VCPU_ID, &uc)) {
> + case UCALL_ABORT:
> + TEST_FAIL("%s", (const char *)uc.args[0]);
> + /* NOT REACHED */
> + case UCALL_SYNC:
> + break;
> + case UCALL_DONE:
> + goto done;
> + default:
> + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
> + }
> + }
> +done:
> + kvm_vm_free(vm);
> + return 0;
> +}
>
next prev parent reply other threads:[~2020-11-30 18:38 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-28 14:20 [PATCH] KVM: x86: Reinstate userspace hypercall support David Woodhouse
2020-11-28 17:50 ` David Woodhouse
2020-11-28 23:16 ` Joao Martins
2020-11-29 2:05 ` David Woodhouse
2020-11-30 18:36 ` Paolo Bonzini [this message]
2020-11-30 18:59 ` David Woodhouse
2020-11-30 20:46 ` Sean Christopherson
2020-11-30 22:42 ` David Woodhouse
2020-12-02 9:20 ` Paolo Bonzini
2020-12-02 14:23 ` Joao Martins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc2d4330-e45d-641f-226b-005a477efd22@redhat.com \
--to=pbonzini@redhat.com \
--cc=dwmw2@infradead.org \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox