* [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues
@ 2022-08-24 6:03 Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired Ganapatrao Kulkarni
` (3 more replies)
From: Ganapatrao Kulkarni @ 2022-08-24 6:03 UTC
To: maz
Cc: scott, kvm, catalin.marinas, keyur, gankulkarni, will, kvmarm,
linux-arm-kernel
This series contains three fixes for issues found while testing the
ARM64 Nested Virtualization patch series.
The first patch avoids restarting the hrtimer when the timer interrupt
has already been fired/forwarded to the Guest-Hypervisor.
The second patch fixes the vtimer interrupt being dropped by the
Guest-Hypervisor.
The third patch fixes the NestedVM boot hang seen when the
Guest-Hypervisor is configured with a 64K page size whereas the
Host-Hypervisor uses 4K.
These patches are rebased on the Nested Virtualization V6 patchset [1].
[1] https://www.spinics.net/lists/kvm/msg265656.html
D Scott Phillips (1):
KVM: arm64: nv: only emulate timers that have not yet fired
Ganapatrao Kulkarni (2):
KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
block size.
arch/arm64/kvm/arch_timer.c | 8 +++++++-
arch/arm64/kvm/mmu.c | 2 +-
2 files changed, 8 insertions(+), 2 deletions(-)
--
2.33.1
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
* [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired
2022-08-24 6:03 [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
@ 2022-08-24 6:03 ` Ganapatrao Kulkarni
2022-12-29 13:00 ` Marc Zyngier
2022-08-24 6:03 ` [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired Ganapatrao Kulkarni
` (2 subsequent siblings)
From: Ganapatrao Kulkarni @ 2022-08-24 6:03 UTC
To: maz
Cc: scott, kvm, catalin.marinas, keyur, gankulkarni, will, kvmarm,
linux-arm-kernel
From: D Scott Phillips <scott@os.amperecomputing.com>
The timer emulation logic goes into an infinite loop when the NestedVM (L2)
timer is being emulated.
While the CPU is executing in L1 context, the L2 timers are emulated using
a host hrtimer. When the delta between cval and the current time reaches
zero, the vtimer interrupt is fired/forwarded to L2. However, the emulation
function in the Host-Hypervisor (L0) still restarts the hrtimer with an
expiry time of "now". The hrtimer then fires immediately, again and again,
resulting in endless looping in the timer emulation.
Fix this by not restarting the hrtimer if the interrupt has already
fired.
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
arch/arm64/kvm/arch_timer.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 2371796b1ab5..27a6ec46803a 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -472,7 +472,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
return;
}
- soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
+ if (!ctx->irq.level)
+ soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
}
static void timer_save_state(struct arch_timer_context *ctx)
--
2.33.1
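A minimal C model of the loop described in this patch may help. It is a sketch under stated assumptions, not KVM code: `struct model_timer`, `model_timer_emulate()` and `model_count_expiries()` are hypothetical names, the hrtimer is reduced to an `armed` flag plus an expiry time, and the expiry path is modelled as simply re-running the emulation step. With the `!irq.level` guard the soft timer is never re-armed once the interrupt is pending; without it, every expiry re-arms a timer that is already due.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of one emulated timer context.
 * These are NOT the real KVM structures or helpers. */
struct model_timer {
	uint64_t cval;      /* compare value programmed by the guest */
	bool irq_level;     /* level already signalled to the guest */
	bool armed;         /* stands in for a queued hrtimer */
	uint64_t expiry;    /* absolute expiry time of that hrtimer */
};

/* Time left until cval, clamped at zero. */
static uint64_t model_compute_delta(const struct model_timer *t, uint64_t now)
{
	return t->cval > now ? t->cval - now : 0;
}

/* One emulation step. With guard == false this mirrors the unpatched
 * flow: the soft timer is re-armed even though the interrupt is already
 * pending, so it expires "now" again. */
static void model_timer_emulate(struct model_timer *t, uint64_t now, bool guard)
{
	bool should_fire = model_compute_delta(t, now) == 0;

	if (should_fire != t->irq_level) {
		t->irq_level = should_fire;	/* report the new level */
		return;
	}

	if (guard && t->irq_level) {
		t->armed = false;		/* already fired: do not re-arm */
		return;
	}

	t->armed = true;
	t->expiry = now + model_compute_delta(t, now);
}

/* Count soft-timer expiries at a frozen "now" (== cval), capped at
 * max_iters so that the unguarded case terminates. */
static unsigned model_count_expiries(bool guard, unsigned max_iters)
{
	struct model_timer t = { .cval = 100 };
	const uint64_t now = 100;
	unsigned expiries = 0;

	model_timer_emulate(&t, now, guard);	/* raises the interrupt */
	model_timer_emulate(&t, now, guard);	/* a later emulation pass */

	while (t.armed && t.expiry <= now && expiries < max_iters) {
		expiries++;
		t.armed = false;
		model_timer_emulate(&t, now, guard);
	}
	return expiries;
}
```

In this model, `model_count_expiries(false, 1000)` runs into the 1000-iteration cap, while `model_count_expiries(true, 1000)` returns 0.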
* [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
2022-08-24 6:03 [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired Ganapatrao Kulkarni
@ 2022-08-24 6:03 ` Ganapatrao Kulkarni
2022-12-29 13:53 ` Marc Zyngier
2022-08-24 6:03 ` [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size Ganapatrao Kulkarni
2022-10-10 5:56 ` [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2022-08-24 6:03 UTC
To: maz
Cc: scott, kvm, catalin.marinas, keyur, gankulkarni, will, kvmarm,
linux-arm-kernel
The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest if
the timer is enabled, unmasked, and the ISTATUS bit of CNTV_CTL_EL0 is
set for a loaded timer.
With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when
forwarding the emulated vtimer interrupt to the Guest-Hypervisor. As a
result, the Guest-Hypervisor drops the interrupt, whereas the
Host-Hypervisor has marked it as active and expects the Guest-Guest to
consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
stuck in the idle thread and RCU soft lockups are seen.
This issue is not seen in the NV1 case, since the trap handler for
CNTV_CTL_EL0 reads emulates the ISTATUS bit.
Set/emulate the ISTATUS bit when the emulated timers fire.
Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
arch/arm64/kvm/arch_timer.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 27a6ec46803a..0b32d943d2d5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
struct arch_timer_context *timer,
enum kvm_arch_timer_regs treg);
static bool kvm_arch_timer_get_input_level(int vintid);
+static u64 read_timer_ctl(struct arch_timer_context *timer);
static struct irq_ops arch_timer_irq_ops = {
.get_input_level = kvm_arch_timer_get_input_level,
@@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
return HRTIMER_RESTART;
}
+ /* Timer emulated, emulate ISTATUS also */
+ timer_set_ctl(ctx, read_timer_ctl(ctx));
kvm_timer_update_irq(vcpu, true, ctx);
return HRTIMER_NORESTART;
}
@@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
trace_kvm_timer_emulate(ctx, should_fire);
if (should_fire != ctx->irq.level) {
+ /* Timer emulated, emulate ISTATUS also */
+ timer_set_ctl(ctx, read_timer_ctl(ctx));
kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
return;
}
--
2.33.1
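The forwarding condition described in this patch can be sketched in C. The bit positions are architectural (ENABLE is bit 0, IMASK bit 1 and ISTATUS bit 2 of CNTV_CTL_EL0, matching Linux's `ARCH_TIMER_CTRL_*` definitions); the two functions are hypothetical models, not KVM code: `l1_forwards_timer_irq()` stands for the Guest-Hypervisor's check, and `l0_publish_istatus()` for the host updating the memory-backed CTL value under NV2.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 bit layout; names loosely mirror ARCH_TIMER_CTRL_*. */
#define CTL_ENABLE	(1u << 0)
#define CTL_IMASK	(1u << 1)
#define CTL_ISTATUS	(1u << 2)

/* Hypothetical model of the Guest-Hypervisor's (L1) check before it
 * injects the virtual timer interrupt into its own guest (L2). */
static bool l1_forwards_timer_irq(uint32_t ctl)
{
	return (ctl & CTL_ENABLE) && !(ctl & CTL_IMASK) && (ctl & CTL_ISTATUS);
}

/* Hypothetical model of the host (L0) publishing the interrupt status
 * in the CTL value that L1 reads from memory under NV2. */
static uint32_t l0_publish_istatus(uint32_t ctl, bool fired)
{
	return fired ? (ctl | CTL_ISTATUS) : (ctl & ~CTL_ISTATUS);
}
```

An enabled, unmasked timer whose ISTATUS was never published fails the L1 check, which is the dropped interrupt described above; publishing the bit makes the check pass.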
* [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size.
2022-08-24 6:03 [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired Ganapatrao Kulkarni
@ 2022-08-24 6:03 ` Ganapatrao Kulkarni
2022-12-29 17:42 ` Marc Zyngier
2022-10-10 5:56 ` [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2022-08-24 6:03 UTC
To: maz
Cc: scott, kvm, catalin.marinas, keyur, gankulkarni, will, kvmarm,
linux-arm-kernel
In the NV case, the shadow stage 2 page table is created using the host
hypervisor's page table configuration (page size, block size, etc.). The
shadow stage 2 table also uses block-level mappings if the Guest
Hypervisor IPA is backed by THP pages. However, this results in illegal
mappings of NestedVM IPAs to Host Hypervisor PAs when the Guest
Hypervisor and the Host Hypervisor are configured with different page
sizes.
Avoid block-level mappings in the stage 2 mapping if max_map_size is
smaller than the block size.
Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
arch/arm64/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6caa48da1b2e..3d4b53f153a1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1304,7 +1304,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* backed by a THP and thus use block mapping if possible.
*/
if (vma_pagesize == PAGE_SIZE &&
- !(max_map_size == PAGE_SIZE || device)) {
+ !(max_map_size < PMD_SIZE || device)) {
if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
vma_pagesize = fault_granule;
else
--
2.33.1
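A sketch of why the original check lets the 64K-on-4K case through, assuming a 4K host granule (so PMD_SIZE is 2M) and a Guest Hypervisor stage 2 that maps the faulting region with a 64K page, so that max_map_size ends up as 64K. The two predicates are hypothetical stand-ins for the condition in `user_mem_abort()`, and the `HOST_*` constants are assumptions for this configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed host (L0) configuration: 4K granule, so a level-2 block is 2M. */
#define HOST_PAGE_SIZE	(UINT64_C(1) << 12)	/* 4K */
#define HOST_PMD_SIZE	(UINT64_C(1) << 21)	/* 2M */

/* Original condition: may the fault path try to upgrade to a THP block? */
static bool thp_upgrade_orig(uint64_t vma_pagesize, uint64_t max_map_size,
			     bool device)
{
	return vma_pagesize == HOST_PAGE_SIZE &&
	       !(max_map_size == HOST_PAGE_SIZE || device);
}

/* Condition as changed by this patch: only upgrade if a whole PMD block
 * fits within max_map_size. */
static bool thp_upgrade_patched(uint64_t vma_pagesize, uint64_t max_map_size,
				bool device)
{
	return vma_pagesize == HOST_PAGE_SIZE &&
	       !(max_map_size < HOST_PMD_SIZE || device);
}
```

With max_map_size = 64K the original predicate is true (64K is not equal to 4K), so a 2M block may be installed although the Guest Hypervisor only mapped 64K; the patched predicate is false.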
* Re: [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues
2022-08-24 6:03 [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
` (2 preceding siblings ...)
2022-08-24 6:03 ` [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size Ganapatrao Kulkarni
@ 2022-10-10 5:56 ` Ganapatrao Kulkarni
2022-10-19 7:59 ` Marc Zyngier
From: Ganapatrao Kulkarni @ 2022-10-10 5:56 UTC
To: maz
Cc: scott, kvm, catalin.marinas, keyur, Darren Hart, will, kvmarm,
linux-arm-kernel
Hi Marc,
Any review comments on this series?
On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> This series contains three fixes for issues found while testing the
> ARM64 Nested Virtualization patch series.
>
> The first patch avoids restarting the hrtimer when the timer interrupt
> has already been fired/forwarded to the Guest-Hypervisor.
>
> The second patch fixes the vtimer interrupt being dropped by the
> Guest-Hypervisor.
>
> The third patch fixes the NestedVM boot hang seen when the
> Guest-Hypervisor is configured with a 64K page size whereas the
> Host-Hypervisor uses 4K.
>
> These patches are rebased on the Nested Virtualization V6 patchset [1].
>
> [1] https://www.spinics.net/lists/kvm/msg265656.html
>
> D Scott Phillips (1):
> KVM: arm64: nv: only emulate timers that have not yet fired
>
> Ganapatrao Kulkarni (2):
> KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
> KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
> block size.
>
> arch/arm64/kvm/arch_timer.c | 8 +++++++-
> arch/arm64/kvm/mmu.c | 2 +-
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
Thanks,
Ganapat
* Re: [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues
2022-10-10 5:56 ` [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
@ 2022-10-19 7:59 ` Marc Zyngier
From: Marc Zyngier @ 2022-10-19 7:59 UTC
To: Ganapatrao Kulkarni
Cc: scott, kvm, catalin.marinas, keyur, Darren Hart, will, kvmarm,
linux-arm-kernel
On Mon, 10 Oct 2022 06:56:31 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> Hi Marc,
>
> Any review comments on this series?
Not yet. So far, the NV stuff is put on ice until I can source some
actual HW to make the development less painful.
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired
2022-08-24 6:03 ` [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired Ganapatrao Kulkarni
@ 2022-12-29 13:00 ` Marc Zyngier
From: Marc Zyngier @ 2022-12-29 13:00 UTC
To: Ganapatrao Kulkarni
Cc: scott, kvm, catalin.marinas, keyur, will, kvmarm,
linux-arm-kernel
On Wed, 24 Aug 2022 07:03:02 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> From: D Scott Phillips <scott@os.amperecomputing.com>
>
> The timer emulation logic goes into an infinite loop when the NestedVM (L2)
> timer is being emulated.
>
> While the CPU is executing in L1 context, the L2 timers are emulated using
> a host hrtimer. When the delta between cval and the current time reaches
> zero, the vtimer interrupt is fired/forwarded to L2. However, the emulation
> function in the Host-Hypervisor (L0) still restarts the hrtimer with an
> expiry time of "now". The hrtimer then fires immediately, again and again,
> resulting in endless looping in the timer emulation.
>
> Fix this by not restarting the hrtimer if the interrupt has already
> fired.
>
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> ---
> arch/arm64/kvm/arch_timer.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 2371796b1ab5..27a6ec46803a 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -472,7 +472,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
> return;
> }
>
> - soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
> + if (!ctx->irq.level)
> + soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
> }
>
> static void timer_save_state(struct arch_timer_context *ctx)
I think this is a regression introduced by bee038a67487 ("KVM:
arm/arm64: Rework the timer code to use a timer_map"), and you can see
it because the comment in this function doesn't make much sense
anymore.
Does the following work for you, mostly restoring the original code?
Thanks,
M.
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index ad2a5df88810..4945c5b96f05 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -480,7 +480,7 @@ static void timer_emulate(struct arch_timer_context *ctx)
* scheduled for the future. If the timer cannot fire at all,
* then we also don't need a soft timer.
*/
- if (!kvm_timer_irq_can_fire(ctx)) {
+ if (should_fire || !kvm_timer_irq_can_fire(ctx)) {
soft_timer_cancel(&ctx->hrtimer);
return;
}
--
Without deviation from the norm, progress is not possible.
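To compare the two approaches, the branch structure of `timer_emulate()` can be sketched as a pure decision function. The enum and both functions are hypothetical; `should_fire`, `irq_level` and `can_fire` stand for `kvm_timer_should_fire()`, `ctx->irq.level` and `kvm_timer_irq_can_fire()`. Neither variant re-arms a timer whose interrupt is already pending; the suggested change additionally cancels any stale hrtimer instead of leaving it queued.

```c
#include <stdbool.h>

/* What timer_emulate() ends up doing with the soft timer. */
enum soft_timer_action {
	ACT_UPDATE_IRQ,	/* level changed: report it and return */
	ACT_CANCEL,	/* soft_timer_cancel() */
	ACT_START,	/* soft_timer_start() with the computed delta */
	ACT_NONE,	/* leave the hrtimer alone */
};

/* Hypothetical model of the flow with patch 1/3 applied. */
static enum soft_timer_action decide_patch(bool should_fire, bool irq_level,
					   bool can_fire)
{
	if (should_fire != irq_level)
		return ACT_UPDATE_IRQ;
	if (!can_fire)
		return ACT_CANCEL;
	return irq_level ? ACT_NONE : ACT_START;
}

/* Hypothetical model of the flow with the suggested change instead. */
static enum soft_timer_action decide_suggested(bool should_fire, bool irq_level,
					       bool can_fire)
{
	if (should_fire != irq_level)
		return ACT_UPDATE_IRQ;
	if (should_fire || !can_fire)
		return ACT_CANCEL;
	return ACT_START;
}
```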
* Re: [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
2022-08-24 6:03 ` [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired Ganapatrao Kulkarni
@ 2022-12-29 13:53 ` Marc Zyngier
2023-01-02 11:46 ` Marc Zyngier
From: Marc Zyngier @ 2022-12-29 13:53 UTC
To: Ganapatrao Kulkarni
Cc: scott, kvm, catalin.marinas, keyur, will, kvmarm,
linux-arm-kernel
On Wed, 24 Aug 2022 07:03:03 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest if
> the timer is enabled, unmasked, and the ISTATUS bit of CNTV_CTL_EL0 is
> set for a loaded timer.
>
> With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when
> forwarding the emulated vtimer interrupt to the Guest-Hypervisor. As a
> result, the Guest-Hypervisor drops the interrupt, whereas the
> Host-Hypervisor has marked it as active and expects the Guest-Guest to
> consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
> stuck in the idle thread and RCU soft lockups are seen.
>
> This issue is not seen in the NV1 case, since the trap handler for
> CNTV_CTL_EL0 reads emulates the ISTATUS bit.
>
> Set/emulate the ISTATUS bit when the emulated timers fire.
>
> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> ---
> arch/arm64/kvm/arch_timer.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 27a6ec46803a..0b32d943d2d5 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> struct arch_timer_context *timer,
> enum kvm_arch_timer_regs treg);
> static bool kvm_arch_timer_get_input_level(int vintid);
> +static u64 read_timer_ctl(struct arch_timer_context *timer);
>
> static struct irq_ops arch_timer_irq_ops = {
> .get_input_level = kvm_arch_timer_get_input_level,
> @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
> return HRTIMER_RESTART;
> }
>
> + /* Timer emulated, emulate ISTATUS also */
> + timer_set_ctl(ctx, read_timer_ctl(ctx));
Why should we do that for non-NV2 configurations?
> kvm_timer_update_irq(vcpu, true, ctx);
> return HRTIMER_NORESTART;
> }
> @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
> trace_kvm_timer_emulate(ctx, should_fire);
>
> if (should_fire != ctx->irq.level) {
> + /* Timer emulated, emulate ISTATUS also */
> + timer_set_ctl(ctx, read_timer_ctl(ctx));
> kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
> return;
> }
I'm not overly keen on this. Yes, we can set the status bit there. But
conversely, the bit will not get cleared when the guest reprograms the
timer, and will take a full exit/entry cycle for it to appear.
Ergo, the architecture is buggy, as memory (the VNCR page) cannot be
used to emulate something as dynamic as a timer.
It is only with FEAT_ECV that we can solve this correctly, by trapping
the counter/timer accesses and emulating them for the guest hypervisor.
I'd rather we add support for that, as I expect all the FEAT_NV2
implementations to have it (and hopefully FEAT_FGT as well).
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size.
2022-08-24 6:03 ` [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size Ganapatrao Kulkarni
@ 2022-12-29 17:42 ` Marc Zyngier
2023-01-03 4:26 ` Ganapatrao Kulkarni
From: Marc Zyngier @ 2022-12-29 17:42 UTC
To: Ganapatrao Kulkarni
Cc: scott, kvm, catalin.marinas, keyur, will, kvmarm,
linux-arm-kernel
On Wed, 24 Aug 2022 07:03:04 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> In the NV case, the shadow stage 2 page table is created using the host
> hypervisor's page table configuration (page size, block size, etc.). The
> shadow stage 2 table also uses block-level mappings if the Guest
> Hypervisor IPA is backed by THP pages. However, this results in illegal
> mappings of NestedVM IPAs to Host Hypervisor PAs when the Guest
> Hypervisor and the Host Hypervisor are configured with different page
> sizes.
>
> Avoid block-level mappings in the stage 2 mapping if max_map_size is
> smaller than the block size.
>
> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> ---
> arch/arm64/kvm/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 6caa48da1b2e..3d4b53f153a1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1304,7 +1304,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> * backed by a THP and thus use block mapping if possible.
> */
> if (vma_pagesize == PAGE_SIZE &&
> - !(max_map_size == PAGE_SIZE || device)) {
> + !(max_map_size < PMD_SIZE || device)) {
> if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
> vma_pagesize = fault_granule;
> else
That's quite a nice catch. I guess this was the main issue with
running 64kB L1 on a 4kB L0? Now, I'm not that fond of the fix itself,
and I think max_map_size should always represent something that is a
valid size *on the host*, especially when outside of NV-specific code.
How about something like this instead:
@@ -1346,6 +1346,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* table uses at least as big a mapping.
*/
max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+ if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+ max_map_size = PMD_SIZE;
+ else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+ max_map_size = PAGE_SIZE;
}
vma_pagesize = min(vma_pagesize, max_map_size);
Admittedly, this is a lot uglier than your fix. But it keeps the nested
horror localised, and doesn't risk being reverted by accident by
people who would not take NV into account (can't blame them, really).
Can you please give it a go?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
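A standalone sketch of the suggested clamping, assuming a 4K host granule. `nested_max_map_size()`, `min_u64()` and the `HOST_*` constants are hypothetical; `s2_trans_size` plays the role of `kvm_s2_trans_size(nested)` from the NV series. Under these assumptions a 64K mapping in a 64K-granule Guest Hypervisor is rounded down to 4K, and a 512M level-2 block from that guest to 2M, so the result is always a valid host mapping size.

```c
#include <stdint.h>

/* Assumed 4K host granule: valid host stage 2 mapping sizes are
 * 4K, 2M and 1G. */
#define HOST_PAGE_SIZE	(UINT64_C(1) << 12)	/* 4K */
#define HOST_PMD_SIZE	(UINT64_C(1) << 21)	/* 2M */
#define HOST_PUD_SIZE	(UINT64_C(1) << 30)	/* 1G */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Clamp to what the guest hypervisor's stage 2 allows, then round down
 * to a size the host can actually map, following the suggested hunk. */
static uint64_t nested_max_map_size(uint64_t s2_trans_size,
				    uint64_t max_map_size)
{
	max_map_size = min_u64(s2_trans_size, max_map_size);

	if (max_map_size >= HOST_PMD_SIZE && max_map_size < HOST_PUD_SIZE)
		max_map_size = HOST_PMD_SIZE;
	else if (max_map_size >= HOST_PAGE_SIZE && max_map_size < HOST_PMD_SIZE)
		max_map_size = HOST_PAGE_SIZE;

	return max_map_size;
}
```

Keeping this rounding inside the nested branch means the generic max_map_size checks in user_mem_abort() never see a size that is invalid on the host.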
* Re: [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
2022-12-29 13:53 ` Marc Zyngier
@ 2023-01-02 11:46 ` Marc Zyngier
2023-01-03 4:21 ` Ganapatrao Kulkarni
From: Marc Zyngier @ 2023-01-02 11:46 UTC
To: Ganapatrao Kulkarni
Cc: scott, kvm, catalin.marinas, keyur, will, kvmarm,
linux-arm-kernel
On Thu, 29 Dec 2022 13:53:15 +0000,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 24 Aug 2022 07:03:03 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >
> > The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest if
> > the timer is enabled, unmasked, and the ISTATUS bit of CNTV_CTL_EL0 is
> > set for a loaded timer.
> >
> > With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when
> > forwarding the emulated vtimer interrupt to the Guest-Hypervisor. As a
> > result, the Guest-Hypervisor drops the interrupt, whereas the
> > Host-Hypervisor has marked it as active and expects the Guest-Guest to
> > consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
> > stuck in the idle thread and RCU soft lockups are seen.
> >
> > This issue is not seen in the NV1 case, since the trap handler for
> > CNTV_CTL_EL0 reads emulates the ISTATUS bit.
> >
> > Set/emulate the ISTATUS bit when the emulated timers fire.
> >
> > Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> > ---
> > arch/arm64/kvm/arch_timer.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 27a6ec46803a..0b32d943d2d5 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
> > struct arch_timer_context *timer,
> > enum kvm_arch_timer_regs treg);
> > static bool kvm_arch_timer_get_input_level(int vintid);
> > +static u64 read_timer_ctl(struct arch_timer_context *timer);
> >
> > static struct irq_ops arch_timer_irq_ops = {
> > .get_input_level = kvm_arch_timer_get_input_level,
> > @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
> > return HRTIMER_RESTART;
> > }
> >
> > + /* Timer emulated, emulate ISTATUS also */
> > + timer_set_ctl(ctx, read_timer_ctl(ctx));
>
> Why should we do that for non-NV2 configurations?
>
> > kvm_timer_update_irq(vcpu, true, ctx);
> > return HRTIMER_NORESTART;
> > }
> > @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
> > trace_kvm_timer_emulate(ctx, should_fire);
> >
> > if (should_fire != ctx->irq.level) {
> > + /* Timer emulated, emulate ISTATUS also */
> > + timer_set_ctl(ctx, read_timer_ctl(ctx));
> > kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
> > return;
> > }
>
> I'm not overly keen on this. Yes, we can set the status bit there. But
> conversely, the bit will not get cleared when the guest reprograms the
> timer, and will take a full exit/entry cycle for it to appear.
>
> Ergo, the architecture is buggy, as memory (the VNCR page) cannot be
> used to emulate something as dynamic as a timer.
>
> It is only with FEAT_ECV that we can solve this correctly, by trapping
> the counter/timer accesses and emulating them for the guest hypervisor.
> I'd rather we add support for that, as I expect all the FEAT_NV2
> implementations to have it (and hopefully FEAT_FGT as well).
So I went ahead and implemented some very basic FEAT_ECV support to
correctly emulate the timers (trapping the CTL/CVAL accesses).
Performance dropped like a rock (~30% extra overhead) for L2
exit-heavy workloads that are terminated in userspace, such as virtio.
For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
as we save/restore the timer context, and this is enough to make
things visibly slower, even on a pretty fast machine.
I managed to get *some* performance back by satisfying CTL/CVAL reads
very early on the exit path (a pretty common theme with NV). Which
means we end up needing something like what you have -- only a bit
more complete. I came up with the following:
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 4945c5b96f05..a198a6211e2a 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
{
int ret;
+ /*
+ * Paper over NV2 brokenness by publishing the interrupt status
+ * bit. This still results in a poor quality of emulation (guest
+ * writes will have no effect until the next exit).
+ *
+ * But hey, it's fast, right?
+ */
+ if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
+ (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
+ u32 ctl = timer_get_ctl(timer_ctx);
+
+ if (new_level)
+ ctl |= ARCH_TIMER_CTRL_IT_STAT;
+ else
+ ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
+
+ timer_set_ctl(timer_ctx, ctl);
+ }
+
timer_ctx->irq.level = new_level;
trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
timer_ctx->irq.level);
which reports the interrupt state in all cases.
Does this work for you?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
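The caveat in the comment above (guest writes have no effect until the next exit) can be illustrated with a small model. Everything here is hypothetical: `struct nv2_timer_model` stands for the memory-backed (VNCR page) copy of the timer registers that the Guest-Hypervisor reads without trapping, and `l0_refresh_on_exit()` for the host recomputing ISTATUS as in the hunk above. ENABLE and ISTATUS use their architectural bit positions (0 and 2).

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENABLE	(1u << 0)
#define CTL_ISTATUS	(1u << 2)

/* Hypothetical model: under NV2, L1 reads and writes these fields in
 * memory, so L0 only sees or refreshes them on an exit. */
struct nv2_timer_model {
	uint64_t cval;		/* written by L1 without trapping */
	uint32_t vncr_ctl;	/* memory-backed CTL as seen by L1 */
};

/* What ISTATUS should architecturally be at time 'now'. */
static bool istatus_architectural(const struct nv2_timer_model *m, uint64_t now)
{
	return (m->vncr_ctl & CTL_ENABLE) && now >= m->cval;
}

/* Host-side refresh, only performed when L1 exits to L0. */
static void l0_refresh_on_exit(struct nv2_timer_model *m, uint64_t now)
{
	if (istatus_architectural(m, now))
		m->vncr_ctl |= CTL_ISTATUS;
	else
		m->vncr_ctl &= ~CTL_ISTATUS;
}

/* L1 reprograms CVAL without exiting: the published bit is untouched. */
static void l1_write_cval(struct nv2_timer_model *m, uint64_t cval)
{
	m->cval = cval;
}

static bool l1_reads_istatus(const struct nv2_timer_model *m)
{
	return (m->vncr_ctl & CTL_ISTATUS) != 0;
}
```

After L1 pushes CVAL into the future, it keeps reading a stale ISTATUS of 1 until the next exit lets L0 refresh the value.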
* Re: [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
2023-01-02 11:46 ` Marc Zyngier
@ 2023-01-03 4:21 ` Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2023-01-03 4:21 UTC
To: Marc Zyngier
Cc: scott, kvm, catalin.marinas, darren, will, kvmarm,
linux-arm-kernel
On 02-01-2023 05:16 pm, Marc Zyngier wrote:
> On Thu, 29 Dec 2022 13:53:15 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> On Wed, 24 Aug 2022 07:03:03 +0100,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>> The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest if
>>> the timer is enabled, unmasked, and the ISTATUS bit of CNTV_CTL_EL0 is
>>> set for a loaded timer.
>>>
>>> With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when
>>> forwarding the emulated vtimer interrupt to the Guest-Hypervisor. As a
>>> result, the Guest-Hypervisor drops the interrupt, whereas the
>>> Host-Hypervisor has marked it as active and expects the Guest-Guest to
>>> consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
>>> stuck in the idle thread and RCU soft lockups are seen.
>>>
>>> This issue is not seen in the NV1 case, since the trap handler for
>>> CNTV_CTL_EL0 reads emulates the ISTATUS bit.
>>>
>>> Set/emulate the ISTATUS bit when the emulated timers fire.
>>>
>>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>>> ---
>>> arch/arm64/kvm/arch_timer.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>>> index 27a6ec46803a..0b32d943d2d5 100644
>>> --- a/arch/arm64/kvm/arch_timer.c
>>> +++ b/arch/arm64/kvm/arch_timer.c
>>> @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>>> struct arch_timer_context *timer,
>>> enum kvm_arch_timer_regs treg);
>>> static bool kvm_arch_timer_get_input_level(int vintid);
>>> +static u64 read_timer_ctl(struct arch_timer_context *timer);
>>>
>>> static struct irq_ops arch_timer_irq_ops = {
>>> .get_input_level = kvm_arch_timer_get_input_level,
>>> @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
>>> return HRTIMER_RESTART;
>>> }
>>>
>>> + /* Timer emulated, emulate ISTATUS also */
>>> + timer_set_ctl(ctx, read_timer_ctl(ctx));
>>
>> Why should we do that for non-NV2 configurations?
>>
>>> kvm_timer_update_irq(vcpu, true, ctx);
>>> return HRTIMER_NORESTART;
>>> }
>>> @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
>>> trace_kvm_timer_emulate(ctx, should_fire);
>>>
>>> if (should_fire != ctx->irq.level) {
>>> + /* Timer emulated, emulate ISTATUS also */
>>> + timer_set_ctl(ctx, read_timer_ctl(ctx));
>>> kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
>>> return;
>>> }
>>
>> I'm not overly keen on this. Yes, we can set the status bit there. But
>> conversely, the bit will not get cleared when the guest reprograms the
>> timer, and will take a full exit/entry cycle for it to appear.
>>
>> Ergo, the architecture is buggy, as memory (the VNCR page) cannot be
>> used to emulate something as dynamic as a timer.
>>
>> It is only with FEAT_ECV that we can solve this correctly, by trapping
>> the counter/timer accesses and emulating them for the guest hypervisor.
>> I'd rather we add support for that, as I expect all the FEAT_NV2
>> implementations to have it (and hopefully FEAT_FGT as well).
>
> So I went ahead and implemented some very basic FEAT_ECV support to
> correctly emulate the timers (trapping the CTL/CVAL accesses).
>
> Performance dropped like a rock (~30% extra overhead) for L2
> exit-heavy workloads that are terminated in userspace, such as virtio.
> For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
> as we save/restore the timer context, and this is enough to make
> things visibly slower, even on a pretty fast machine.
>
> I managed to get *some* performance back by satisfying CTL/CVAL reads
> very early on the exit path (a pretty common theme with NV). Which
> means we end up needing something like what you have -- only a bit
> more complete. I came up with the following:
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 4945c5b96f05..a198a6211e2a 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> {
> int ret;
>
> + /*
> + * Paper over NV2 brokenness by publishing the interrupt status
> + * bit. This still results in a poor quality of emulation (guest
> + * writes will have no effect until the next exit).
> + *
> + * But hey, it's fast, right?
> + */
> + if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
> + (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
> + u32 ctl = timer_get_ctl(timer_ctx);
> +
> + if (new_level)
> + ctl |= ARCH_TIMER_CTRL_IT_STAT;
> + else
> + ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
> +
> + timer_set_ctl(timer_ctx, ctl);
> + }
> +
> timer_ctx->irq.level = new_level;
> trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
> timer_ctx->irq.level);
>
> which reports the interrupt state in all cases.
>
> Does this work for you?
Thanks, Marc, for the patch. I will try this and report back as soon as possible.
>
> Thanks,
>
> M.
>
Thanks,
Ganapat
* Re: [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size.
2022-12-29 17:42 ` Marc Zyngier
@ 2023-01-03 4:26 ` Ganapatrao Kulkarni
0 siblings, 0 replies; 12+ messages in thread
From: Ganapatrao Kulkarni @ 2023-01-03 4:26 UTC (permalink / raw)
To: Marc Zyngier
Cc: scott, kvm, catalin.marinas, darren, will, kvmarm,
linux-arm-kernel
On 29-12-2022 11:12 pm, Marc Zyngier wrote:
> On Wed, 24 Aug 2022 07:03:04 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>> In the NV case, the shadow stage 2 page table is created using the host
>> hypervisor's page table configuration (page size, block size, etc.). The
>> shadow stage 2 table also uses a block-level mapping if the Guest
>> Hypervisor IPA is backed by THP pages. However, this results in an
>> illegal mapping of the NestedVM IPA to the Host Hypervisor PA when the
>> Guest Hypervisor and Host Hypervisor are configured with different page
>> sizes.
>>
>> Add a fix to avoid block-level mapping in the stage 2 mapping if
>> max_map_size is smaller than the block size.
>>
>> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
>> ---
>> arch/arm64/kvm/mmu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 6caa48da1b2e..3d4b53f153a1 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1304,7 +1304,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> * backed by a THP and thus use block mapping if possible.
>> */
>> if (vma_pagesize == PAGE_SIZE &&
>> - !(max_map_size == PAGE_SIZE || device)) {
>> + !(max_map_size < PMD_SIZE || device)) {
>> if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
>> vma_pagesize = fault_granule;
>> else
>
> That's quite a nice catch. I guess this was the main issue with
> running 64kB L1 on a 4kB L0? Now, I'm not that fond of the fix itself,
> and I think max_map_size should always represent something that is a
> valid size *on the host*, especially when outside of NV-specific code.
>
Thanks, Marc. Yes, this patch fixes the issue seen with a 64K page size
in L1 and a 4K page size in L0.
> How about something like this instead:
>
> @@ -1346,6 +1346,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> * table uses at least as big a mapping.
> */
> max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +
> + if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
> + max_map_size = PMD_SIZE;
> + else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
> + max_map_size = PAGE_SIZE;
> }
>
> vma_pagesize = min(vma_pagesize, max_map_size);
>
>
> Admittedly, this is a lot uglier than your fix. But it keeps the nested
> horror localised, and doesn't risk being reverted by accident by
> people who would not take NV into account (can't blame them, really).
>
> Can you please give it a go?
Sure, I will try this and report back as soon as possible.
>
> Thanks,
>
> M.
>
Thanks,
Ganapat
Thread overview: 12+ messages
2022-08-24 6:03 [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 1/3] KVM: arm64: nv: only emulate timers that have not yet fired Ganapatrao Kulkarni
2022-12-29 13:00 ` Marc Zyngier
2022-08-24 6:03 ` [PATCH 2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired Ganapatrao Kulkarni
2022-12-29 13:53 ` Marc Zyngier
2023-01-02 11:46 ` Marc Zyngier
2023-01-03 4:21 ` Ganapatrao Kulkarni
2022-08-24 6:03 ` [PATCH 3/3] KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than block size Ganapatrao Kulkarni
2022-12-29 17:42 ` Marc Zyngier
2023-01-03 4:26 ` Ganapatrao Kulkarni
2022-10-10 5:56 ` [PATCH 0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues Ganapatrao Kulkarni
2022-10-19 7:59 ` Marc Zyngier