From: Marc Zyngier <maz@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Joel Fernandes <joelaf@google.com>,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Suleiman Souhlal <suleiman@google.com>,
Will Deacon <will@kernel.org>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
Date: Mon, 12 Jul 2021 17:24:42 +0100 [thread overview]
Message-ID: <874kcz33g5.wl-maz@kernel.org> (raw)
In-Reply-To: <20210709043713.887098-5-senozhatsky@chromium.org>
On Fri, 09 Jul 2021 05:37:13 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
>
> Add PV-vcpu-state support bits to the host. Host uses the guest
> PV-state per-CPU pointers to update the VCPU state each time
> it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
> that guest scheduler can become aware of the fact that not
> all VCPUs are always available. Currently guest scheduler
> on amr64 always assumes that all CPUs are available because
arm64
> vcpu_is_preempted() is not implemented on arm64.
>
> - schbench -t 3 -m 3 -p 4096
>
> Latency percentiles (usec)
> BASE
> ================================================
> 50.0th: 1 (3556427 samples)
> 75.0th: 13 (879210 samples)
> 90.0th: 15 (893311 samples)
> 95.0th: 18 (159594 samples)
> *99.0th: 118 (224187 samples)
> 99.5th: 691 (28555 samples)
> 99.9th: 7384 (23076 samples)
> min=1, max=104218
> avg worker transfer: 25192.00 ops/sec 98.41MB/s
>
> PATCHED
> ================================================
> 50.0th: 1 (3507010 samples)
> 75.0th: 13 (1635775 samples)
> 90.0th: 16 (901271 samples)
> 95.0th: 24 (281051 samples)
> *99.0th: 114 (255581 samples)
> 99.5th: 382 (33051 samples)
> 99.9th: 6392 (26592 samples)
> min=1, max=83877
> avg worker transfer: 28613.39 ops/sec 111.77MB/s
>
> - perf bench sched all
>
> ops/sec
> BASE PATCHED
> ================================================
> 33452 36485
> 33541 39405
> 33365 36858
> 33455 38047
> 33449 37866
> 33616 34922
> 33479 34388
> 33594 37203
> 33458 35363
> 33704 35180
>
> Student's T-test
>
> N Min Max Median Avg Stddev
> base 10 33365 33704 33479 33511.3 100.92467
> patched 10 34388 39405 36858 36571.7 1607.454
> Difference at 95.0% confidence
> 3060.4 +/- 1070.09
> 9.13244% +/- 3.19321%
> (Student's t, pooled s = 1138.88)
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> arch/arm64/include/asm/kvm_host.h | 23 +++++++++++
> arch/arm64/kvm/Makefile | 3 +-
> arch/arm64/kvm/arm.c | 3 ++
> arch/arm64/kvm/hypercalls.c | 11 ++++++
> arch/arm64/kvm/pv-vcpu-state.c | 64 +++++++++++++++++++++++++++++++
> 5 files changed, 103 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kvm/pv-vcpu-state.c
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 41911585ae0c..e782f4d0c916 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -381,6 +381,12 @@ struct kvm_vcpu_arch {
> u64 last_steal;
> gpa_t base;
> } steal;
> +
> + /* PV state of the VCPU */
> + struct {
> + gpa_t base;
> + struct gfn_to_hva_cache ghc;
Please stay consistent with what we use of steal time accounting
(i.e. don't use gfn_to_hva_cache, but directly use a gpa with
kvm_put_guest()). If you can demonstrate that there is an unacceptable
overhead in doing so, then both steal time and preemption will be
updated at the same time.
> + } vcpu_state;
> };
>
> /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> @@ -695,6 +701,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
> return (vcpu_arch->steal.base != GPA_INVALID);
> }
>
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
> +
> +static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + vcpu_arch->vcpu_state.base = GPA_INVALID;
> + memset(&vcpu_arch->vcpu_state.ghc, 0, sizeof(struct gfn_to_hva_cache));
Does it really need to be inlined?
> +}
> +
> +static inline bool
> +kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + return (vcpu_arch->vcpu_state.base != GPA_INVALID);
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
> +
> void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>
> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 989bb5dad2c8..2a3ee82c6d90 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
>
> kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
> $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> - arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> + arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
> + pvtime.o pv-vcpu-state.o \
> inject_fault.o va_layout.o handle_exit.o \
> guest.o debug.o reset.o sys_regs.o \
> vgic-sys-reg-v3.o fpsimd.o pmu.o \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..43e995c9fddb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> kvm_arm_reset_debug_ptr(vcpu);
>
> kvm_arm_pvtime_vcpu_init(&vcpu->arch);
> + kvm_arm_vcpu_state_init(&vcpu->arch);
>
> vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>
> @@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (vcpu_has_ptrauth(vcpu))
> vcpu_ptrauth_disable(vcpu);
> kvm_arch_vcpu_load_debug_state_flags(vcpu);
> + kvm_update_vcpu_preempted(vcpu, false);
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + kvm_update_vcpu_preempted(vcpu, true);
This doesn't look right. With this, you are now telling the guest that
a vcpu that is blocked on WFI is preempted. This really isn't the
case, as it has voluntarily entered a low-power mode while waiting for
an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
be running either.
> kvm_arch_vcpu_put_debug_state_flags(vcpu);
> kvm_arch_vcpu_put_fp(vcpu);
> if (has_vhe())
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..95bcf86e0b6f 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -110,6 +110,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> val[0] = SMCCC_RET_SUCCESS;
> break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES:
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> }
> break;
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> @@ -139,6 +142,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_TRNG_RND32:
> case ARM_SMCCC_TRNG_RND64:
> return kvm_trng_call(vcpu);
> + case ARM_SMCCC_HV_PV_VCPU_STATE_INIT:
> + if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE:
> + if (kvm_release_vcpu_state(vcpu) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> default:
> return kvm_psci_call(vcpu);
> }
> diff --git a/arch/arm64/kvm/pv-vcpu-state.c b/arch/arm64/kvm/pv-vcpu-state.c
> new file mode 100644
> index 000000000000..8496bb2a5966
> --- /dev/null
> +++ b/arch/arm64/kvm/pv-vcpu-state.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_mmu.h>
> +#include <asm/paravirt.h>
> +
> +#include <kvm/arm_psci.h>
Why this include?
> +
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + int ret;
> + u64 idx;
> +
> + if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
> + &vcpu->arch.vcpu_state.ghc,
> + addr,
> + sizeof(struct vcpu_state));
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> + if (!ret)
> + vcpu->arch.vcpu_state.base = addr;
> + return ret;
> +}
> +
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
> +{
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + kvm_arm_vcpu_state_init(&vcpu->arch);
> + return 0;
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + u64 idx;
> +
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return;
> +
> + /*
> + * This function is called from atomic context, so we need to
> + * disable page faults. kvm_write_guest_cached() will call
> + * might_fault().
> + */
> + pagefault_disable();
> + /*
> + * Need to take the SRCU lock because kvm_write_guest_offset_cached()
> + * calls kvm_memslots();
> + */
> + idx = srcu_read_lock(&kvm->srcu);
> + kvm_write_guest_cached(kvm, &vcpu->arch.vcpu_state.ghc,
> + &preempted, sizeof(bool));
> + srcu_read_unlock(&kvm->srcu, idx);
> + pagefault_enable();
> +}
Before rushing into an implementation, this really could do with some
specification:
- where is the memory allocated?
- what is the lifetime of the memory?
- what is its expected memory type?
- what is the coherency model (what if the guest runs with caching
disabled, for example)?
- is the functionality preserved across a VM reset?
- what is the strategy w.r.t live migration?
- can userspace detect that the feature is implemented?
- can userspace prevent the feature from being used?
- where is the documentation?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Will Deacon <will@kernel.org>,
Suleiman Souhlal <suleiman@google.com>,
Joel Fernandes <joelaf@google.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org
Subject: Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
Date: Mon, 12 Jul 2021 17:24:42 +0100 [thread overview]
Message-ID: <874kcz33g5.wl-maz@kernel.org> (raw)
In-Reply-To: <20210709043713.887098-5-senozhatsky@chromium.org>
On Fri, 09 Jul 2021 05:37:13 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
>
> Add PV-vcpu-state support bits to the host. Host uses the guest
> PV-state per-CPU pointers to update the VCPU state each time
> it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
> that guest scheduler can become aware of the fact that not
> all VCPUs are always available. Currently guest scheduler
> on amr64 always assumes that all CPUs are available because
arm64
> vcpu_is_preempted() is not implemented on arm64.
>
> - schbench -t 3 -m 3 -p 4096
>
> Latency percentiles (usec)
> BASE
> ================================================
> 50.0th: 1 (3556427 samples)
> 75.0th: 13 (879210 samples)
> 90.0th: 15 (893311 samples)
> 95.0th: 18 (159594 samples)
> *99.0th: 118 (224187 samples)
> 99.5th: 691 (28555 samples)
> 99.9th: 7384 (23076 samples)
> min=1, max=104218
> avg worker transfer: 25192.00 ops/sec 98.41MB/s
>
> PATCHED
> ================================================
> 50.0th: 1 (3507010 samples)
> 75.0th: 13 (1635775 samples)
> 90.0th: 16 (901271 samples)
> 95.0th: 24 (281051 samples)
> *99.0th: 114 (255581 samples)
> 99.5th: 382 (33051 samples)
> 99.9th: 6392 (26592 samples)
> min=1, max=83877
> avg worker transfer: 28613.39 ops/sec 111.77MB/s
>
> - perf bench sched all
>
> ops/sec
> BASE PATCHED
> ================================================
> 33452 36485
> 33541 39405
> 33365 36858
> 33455 38047
> 33449 37866
> 33616 34922
> 33479 34388
> 33594 37203
> 33458 35363
> 33704 35180
>
> Student's T-test
>
> N Min Max Median Avg Stddev
> base 10 33365 33704 33479 33511.3 100.92467
> patched 10 34388 39405 36858 36571.7 1607.454
> Difference at 95.0% confidence
> 3060.4 +/- 1070.09
> 9.13244% +/- 3.19321%
> (Student's t, pooled s = 1138.88)
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> arch/arm64/include/asm/kvm_host.h | 23 +++++++++++
> arch/arm64/kvm/Makefile | 3 +-
> arch/arm64/kvm/arm.c | 3 ++
> arch/arm64/kvm/hypercalls.c | 11 ++++++
> arch/arm64/kvm/pv-vcpu-state.c | 64 +++++++++++++++++++++++++++++++
> 5 files changed, 103 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kvm/pv-vcpu-state.c
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 41911585ae0c..e782f4d0c916 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -381,6 +381,12 @@ struct kvm_vcpu_arch {
> u64 last_steal;
> gpa_t base;
> } steal;
> +
> + /* PV state of the VCPU */
> + struct {
> + gpa_t base;
> + struct gfn_to_hva_cache ghc;
Please stay consistent with what we use of steal time accounting
(i.e. don't use gfn_to_hva_cache, but directly use a gpa with
kvm_put_guest()). If you can demonstrate that there is an unacceptable
overhead in doing so, then both steal time and preemption will be
updated at the same time.
> + } vcpu_state;
> };
>
> /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> @@ -695,6 +701,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
> return (vcpu_arch->steal.base != GPA_INVALID);
> }
>
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
> +
> +static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + vcpu_arch->vcpu_state.base = GPA_INVALID;
> + memset(&vcpu_arch->vcpu_state.ghc, 0, sizeof(struct gfn_to_hva_cache));
Does it really need to be inlined?
> +}
> +
> +static inline bool
> +kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + return (vcpu_arch->vcpu_state.base != GPA_INVALID);
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
> +
> void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>
> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 989bb5dad2c8..2a3ee82c6d90 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
>
> kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
> $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> - arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> + arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
> + pvtime.o pv-vcpu-state.o \
> inject_fault.o va_layout.o handle_exit.o \
> guest.o debug.o reset.o sys_regs.o \
> vgic-sys-reg-v3.o fpsimd.o pmu.o \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..43e995c9fddb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> kvm_arm_reset_debug_ptr(vcpu);
>
> kvm_arm_pvtime_vcpu_init(&vcpu->arch);
> + kvm_arm_vcpu_state_init(&vcpu->arch);
>
> vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>
> @@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (vcpu_has_ptrauth(vcpu))
> vcpu_ptrauth_disable(vcpu);
> kvm_arch_vcpu_load_debug_state_flags(vcpu);
> + kvm_update_vcpu_preempted(vcpu, false);
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + kvm_update_vcpu_preempted(vcpu, true);
This doesn't look right. With this, you are now telling the guest that
a vcpu that is blocked on WFI is preempted. This really isn't the
case, as it has voluntarily entered a low-power mode while waiting for
an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
be running either.
> kvm_arch_vcpu_put_debug_state_flags(vcpu);
> kvm_arch_vcpu_put_fp(vcpu);
> if (has_vhe())
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..95bcf86e0b6f 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -110,6 +110,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> val[0] = SMCCC_RET_SUCCESS;
> break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES:
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> }
> break;
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> @@ -139,6 +142,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_TRNG_RND32:
> case ARM_SMCCC_TRNG_RND64:
> return kvm_trng_call(vcpu);
> + case ARM_SMCCC_HV_PV_VCPU_STATE_INIT:
> + if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE:
> + if (kvm_release_vcpu_state(vcpu) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> default:
> return kvm_psci_call(vcpu);
> }
> diff --git a/arch/arm64/kvm/pv-vcpu-state.c b/arch/arm64/kvm/pv-vcpu-state.c
> new file mode 100644
> index 000000000000..8496bb2a5966
> --- /dev/null
> +++ b/arch/arm64/kvm/pv-vcpu-state.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_mmu.h>
> +#include <asm/paravirt.h>
> +
> +#include <kvm/arm_psci.h>
Why this include?
> +
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + int ret;
> + u64 idx;
> +
> + if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
> + &vcpu->arch.vcpu_state.ghc,
> + addr,
> + sizeof(struct vcpu_state));
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> + if (!ret)
> + vcpu->arch.vcpu_state.base = addr;
> + return ret;
> +}
> +
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
> +{
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + kvm_arm_vcpu_state_init(&vcpu->arch);
> + return 0;
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + u64 idx;
> +
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return;
> +
> + /*
> + * This function is called from atomic context, so we need to
> + * disable page faults. kvm_write_guest_cached() will call
> + * might_fault().
> + */
> + pagefault_disable();
> + /*
> + * Need to take the SRCU lock because kvm_write_guest_offset_cached()
> + * calls kvm_memslots();
> + */
> + idx = srcu_read_lock(&kvm->srcu);
> + kvm_write_guest_cached(kvm, &vcpu->arch.vcpu_state.ghc,
> + &preempted, sizeof(bool));
> + srcu_read_unlock(&kvm->srcu, idx);
> + pagefault_enable();
> +}
Before rushing into an implementation, this really could do with some
specification:
- where is the memory allocated?
- what is the lifetime of the memory?
- what is its expected memory type?
- what is the coherency model (what if the guest runs with caching
disabled, for example)?
- is the functionality preserved across a VM reset?
- what is the strategy w.r.t live migration?
- can userspace detect that the feature is implemented?
- can userspace prevent the feature from being used?
- where is the documentation?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Will Deacon <will@kernel.org>,
Suleiman Souhlal <suleiman@google.com>,
Joel Fernandes <joelaf@google.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org
Subject: Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
Date: Mon, 12 Jul 2021 17:24:42 +0100 [thread overview]
Message-ID: <874kcz33g5.wl-maz@kernel.org> (raw)
In-Reply-To: <20210709043713.887098-5-senozhatsky@chromium.org>
On Fri, 09 Jul 2021 05:37:13 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
>
> Add PV-vcpu-state support bits to the host. Host uses the guest
> PV-state per-CPU pointers to update the VCPU state each time
> it kvm_arch_vcpu_load() or kvm_arch_vcpu_put() the VCPU, so
> that guest scheduler can become aware of the fact that not
> all VCPUs are always available. Currently guest scheduler
> on amr64 always assumes that all CPUs are available because
arm64
> vcpu_is_preempted() is not implemented on arm64.
>
> - schbench -t 3 -m 3 -p 4096
>
> Latency percentiles (usec)
> BASE
> ================================================
> 50.0th: 1 (3556427 samples)
> 75.0th: 13 (879210 samples)
> 90.0th: 15 (893311 samples)
> 95.0th: 18 (159594 samples)
> *99.0th: 118 (224187 samples)
> 99.5th: 691 (28555 samples)
> 99.9th: 7384 (23076 samples)
> min=1, max=104218
> avg worker transfer: 25192.00 ops/sec 98.41MB/s
>
> PATCHED
> ================================================
> 50.0th: 1 (3507010 samples)
> 75.0th: 13 (1635775 samples)
> 90.0th: 16 (901271 samples)
> 95.0th: 24 (281051 samples)
> *99.0th: 114 (255581 samples)
> 99.5th: 382 (33051 samples)
> 99.9th: 6392 (26592 samples)
> min=1, max=83877
> avg worker transfer: 28613.39 ops/sec 111.77MB/s
>
> - perf bench sched all
>
> ops/sec
> BASE PATCHED
> ================================================
> 33452 36485
> 33541 39405
> 33365 36858
> 33455 38047
> 33449 37866
> 33616 34922
> 33479 34388
> 33594 37203
> 33458 35363
> 33704 35180
>
> Student's T-test
>
> N Min Max Median Avg Stddev
> base 10 33365 33704 33479 33511.3 100.92467
> patched 10 34388 39405 36858 36571.7 1607.454
> Difference at 95.0% confidence
> 3060.4 +/- 1070.09
> 9.13244% +/- 3.19321%
> (Student's t, pooled s = 1138.88)
>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> arch/arm64/include/asm/kvm_host.h | 23 +++++++++++
> arch/arm64/kvm/Makefile | 3 +-
> arch/arm64/kvm/arm.c | 3 ++
> arch/arm64/kvm/hypercalls.c | 11 ++++++
> arch/arm64/kvm/pv-vcpu-state.c | 64 +++++++++++++++++++++++++++++++
> 5 files changed, 103 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kvm/pv-vcpu-state.c
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 41911585ae0c..e782f4d0c916 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -381,6 +381,12 @@ struct kvm_vcpu_arch {
> u64 last_steal;
> gpa_t base;
> } steal;
> +
> + /* PV state of the VCPU */
> + struct {
> + gpa_t base;
> + struct gfn_to_hva_cache ghc;
Please stay consistent with what we use of steal time accounting
(i.e. don't use gfn_to_hva_cache, but directly use a gpa with
kvm_put_guest()). If you can demonstrate that there is an unacceptable
overhead in doing so, then both steal time and preemption will be
updated at the same time.
> + } vcpu_state;
> };
>
> /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> @@ -695,6 +701,23 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
> return (vcpu_arch->steal.base != GPA_INVALID);
> }
>
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gfn_t addr);
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu);
> +
> +static inline void kvm_arm_vcpu_state_init(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + vcpu_arch->vcpu_state.base = GPA_INVALID;
> + memset(&vcpu_arch->vcpu_state.ghc, 0, sizeof(struct gfn_to_hva_cache));
Does it really need to be inlined?
> +}
> +
> +static inline bool
> +kvm_arm_is_vcpu_state_enabled(struct kvm_vcpu_arch *vcpu_arch)
> +{
> + return (vcpu_arch->vcpu_state.base != GPA_INVALID);
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted);
> +
> void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
>
> struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 989bb5dad2c8..2a3ee82c6d90 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -12,7 +12,8 @@ obj-$(CONFIG_KVM) += hyp/
>
> kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
> $(KVM)/vfio.o $(KVM)/irqchip.o $(KVM)/binary_stats.o \
> - arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
> + arm.o mmu.o mmio.o psci.o perf.o hypercalls.o \
> + pvtime.o pv-vcpu-state.o \
> inject_fault.o va_layout.o handle_exit.o \
> guest.o debug.o reset.o sys_regs.o \
> vgic-sys-reg-v3.o fpsimd.o pmu.o \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..43e995c9fddb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -332,6 +332,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> kvm_arm_reset_debug_ptr(vcpu);
>
> kvm_arm_pvtime_vcpu_init(&vcpu->arch);
> + kvm_arm_vcpu_state_init(&vcpu->arch);
>
> vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>
> @@ -429,10 +430,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (vcpu_has_ptrauth(vcpu))
> vcpu_ptrauth_disable(vcpu);
> kvm_arch_vcpu_load_debug_state_flags(vcpu);
> + kvm_update_vcpu_preempted(vcpu, false);
> }
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + kvm_update_vcpu_preempted(vcpu, true);
This doesn't look right. With this, you are now telling the guest that
a vcpu that is blocked on WFI is preempted. This really isn't the
case, as it has voluntarily entered a low-power mode while waiting for
an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
be running either.
> kvm_arch_vcpu_put_debug_state_flags(vcpu);
> kvm_arch_vcpu_put_fp(vcpu);
> if (has_vhe())
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..95bcf86e0b6f 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -110,6 +110,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> val[0] = SMCCC_RET_SUCCESS;
> break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_FEATURES:
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> }
> break;
> case ARM_SMCCC_HV_PV_TIME_FEATURES:
> @@ -139,6 +142,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_TRNG_RND32:
> case ARM_SMCCC_TRNG_RND64:
> return kvm_trng_call(vcpu);
> + case ARM_SMCCC_HV_PV_VCPU_STATE_INIT:
> + if (kvm_init_vcpu_state(vcpu, smccc_get_arg1(vcpu)) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> + case ARM_SMCCC_HV_PV_VCPU_STATE_RELEASE:
> + if (kvm_release_vcpu_state(vcpu) == 0)
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> default:
> return kvm_psci_call(vcpu);
> }
> diff --git a/arch/arm64/kvm/pv-vcpu-state.c b/arch/arm64/kvm/pv-vcpu-state.c
> new file mode 100644
> index 000000000000..8496bb2a5966
> --- /dev/null
> +++ b/arch/arm64/kvm/pv-vcpu-state.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_mmu.h>
> +#include <asm/paravirt.h>
> +
> +#include <kvm/arm_psci.h>
Why this include?
> +
> +int kvm_init_vcpu_state(struct kvm_vcpu *vcpu, gpa_t addr)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + int ret;
> + u64 idx;
> +
> + if (kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + idx = srcu_read_lock(&kvm->srcu);
> + ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
> + &vcpu->arch.vcpu_state.ghc,
> + addr,
> + sizeof(struct vcpu_state));
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> + if (!ret)
> + vcpu->arch.vcpu_state.base = addr;
> + return ret;
> +}
> +
> +int kvm_release_vcpu_state(struct kvm_vcpu *vcpu)
> +{
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return 0;
> +
> + kvm_arm_vcpu_state_init(&vcpu->arch);
> + return 0;
> +}
> +
> +void kvm_update_vcpu_preempted(struct kvm_vcpu *vcpu, bool preempted)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + u64 idx;
> +
> + if (!kvm_arm_is_vcpu_state_enabled(&vcpu->arch))
> + return;
> +
> + /*
> + * This function is called from atomic context, so we need to
> + * disable page faults. kvm_write_guest_cached() will call
> + * might_fault().
> + */
> + pagefault_disable();
> + /*
> + * Need to take the SRCU lock because kvm_write_guest_offset_cached()
> + * calls kvm_memslots();
> + */
> + idx = srcu_read_lock(&kvm->srcu);
> + kvm_write_guest_cached(kvm, &vcpu->arch.vcpu_state.ghc,
> + &preempted, sizeof(bool));
> + srcu_read_unlock(&kvm->srcu, idx);
> + pagefault_enable();
> +}
Before rushing into an implementation, this really could do with some
specification:
- where is the memory allocated?
- what is the lifetime of the memory?
- what is its expected memory type?
- what is the coherency model (what if the guest runs with caching
disabled, for example)?
- is the functionality preserved across a VM reset?
- what is the strategy w.r.t live migration?
- can userspace detect that the feature is implemented?
- can userspace prevent the feature from being used?
- where is the documentation?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2021-07-12 16:24 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-09 4:37 [PATCHv2 0/4] arm64:kvm: teach guest sched that VCPUs can be preempted Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` [PATCHv2 1/4] arm64: smccc: Add SMCCC pv-vcpu-state function call IDs Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-12 14:22 ` Marc Zyngier
2021-07-12 14:22 ` Marc Zyngier
2021-07-12 14:22 ` Marc Zyngier
2021-07-09 4:37 ` [PATCHv2 2/4] arm64: add guest pvstate support Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 7:39 ` David Edmondson
2021-07-09 7:39 ` David Edmondson
2021-07-09 7:39 ` David Edmondson
2021-07-09 7:52 ` Sergey Senozhatsky
2021-07-09 7:52 ` Sergey Senozhatsky
2021-07-09 7:52 ` Sergey Senozhatsky
2021-07-09 18:58 ` Joel Fernandes
2021-07-09 18:58 ` Joel Fernandes
2021-07-09 18:58 ` Joel Fernandes
2021-07-09 21:53 ` Sergey Senozhatsky
2021-07-09 21:53 ` Sergey Senozhatsky
2021-07-09 21:53 ` Sergey Senozhatsky
2021-07-11 16:58 ` Joel Fernandes
2021-07-11 16:58 ` Joel Fernandes
2021-07-11 16:58 ` Joel Fernandes
2021-07-12 15:42 ` Marc Zyngier
2021-07-12 15:42 ` Marc Zyngier
2021-07-12 15:42 ` Marc Zyngier
2021-07-21 2:05 ` Sergey Senozhatsky
2021-07-21 2:05 ` Sergey Senozhatsky
2021-07-21 2:05 ` Sergey Senozhatsky
2021-07-21 8:22 ` Marc Zyngier
2021-07-21 8:22 ` Marc Zyngier
2021-07-21 8:22 ` Marc Zyngier
2021-07-21 8:47 ` Sergey Senozhatsky
2021-07-21 8:47 ` Sergey Senozhatsky
2021-07-21 8:47 ` Sergey Senozhatsky
2021-07-21 10:16 ` Marc Zyngier
2021-07-21 10:16 ` Marc Zyngier
2021-07-21 10:16 ` Marc Zyngier
2021-07-09 4:37 ` [PATCHv2 3/4] arm64: do not use dummy vcpu_is_preempted() Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-12 15:47 ` Marc Zyngier
2021-07-12 15:47 ` Marc Zyngier
2021-07-12 15:47 ` Marc Zyngier
2021-07-21 2:06 ` Sergey Senozhatsky
2021-07-21 2:06 ` Sergey Senozhatsky
2021-07-21 2:06 ` Sergey Senozhatsky
2021-07-09 4:37 ` [PATCHv2 4/4] arm64: add host pv-vcpu-state support Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-09 4:37 ` Sergey Senozhatsky
2021-07-12 16:24 ` Marc Zyngier [this message]
2021-07-12 16:24 ` Marc Zyngier
2021-07-12 16:24 ` Marc Zyngier
2021-07-20 18:44 ` Joel Fernandes
2021-07-20 18:44 ` Joel Fernandes
2021-07-20 18:44 ` Joel Fernandes
2021-07-21 8:40 ` Marc Zyngier
2021-07-21 8:40 ` Marc Zyngier
2021-07-21 8:40 ` Marc Zyngier
2021-07-21 10:38 ` Sergey Senozhatsky
2021-07-21 10:38 ` Sergey Senozhatsky
2021-07-21 10:38 ` Sergey Senozhatsky
2021-07-21 11:08 ` Marc Zyngier
2021-07-21 11:08 ` Marc Zyngier
2021-07-21 11:08 ` Marc Zyngier
2021-07-21 1:15 ` Sergey Senozhatsky
2021-07-21 1:15 ` Sergey Senozhatsky
2021-07-21 1:15 ` Sergey Senozhatsky
2021-07-21 9:10 ` Marc Zyngier
2021-07-21 9:10 ` Marc Zyngier
2021-07-21 9:10 ` Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874kcz33g5.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=joelaf@google.com \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=senozhatsky@chromium.org \
--cc=suleiman@google.com \
--cc=virtualization@lists.linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.