* [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-09 13:07 ` Fabiano Rosas
2022-03-03 5:33 ` [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry Nicholas Piggin
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
When new work is created that requires attention from the hypervisor
(e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
pull the target vcpu out of the guest if it may have been running.
Therefore the work creation side looks like this:
vcpu->arch.doorbell_request = 1;
kvmppc_fast_vcpu_kick_hv(vcpu) {
        smp_mb();
        cpu = vcpu->cpu;
        if (cpu != -1)
                send_ipi(cpu);
}
And the guest entry side *should* look like this:
vcpu->cpu = smp_processor_id();
smp_mb();
if (vcpu->arch.doorbell_request) {
        // do something (abort entry or inject doorbell etc)
}
But currently the store and load are flipped, so it is possible for the
entry to see no doorbell pending, and the doorbell creation to miss the
store to set cpu, resulting in lost work (or at least work delayed until
the next guest exit).
Fix this by reordering the entry operations and adding an smp_mb()
between them. The P8 path appears to have a similar race, which is
commented but not addressed yet.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 41 +++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 84c89f08ae9a..f8c0f1f52a1e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -225,6 +225,13 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
int cpu;
struct rcuwait *waitp;
+ /*
+ * rcuwait_wake_up contains smp_mb() which orders prior stores that
+ * create pending work vs below loads of cpu fields. The other side
+ * is the barrier in vcpu run that orders setting the cpu fields vs
+ * testing for pending work.
+ */
+
waitp = kvm_arch_vcpu_get_wait(vcpu);
if (rcuwait_wake_up(waitp))
++vcpu->stat.generic.halt_wakeup;
@@ -1089,7 +1096,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
}
tvcpu->arch.prodded = 1;
- smp_mb();
+ smp_mb(); /* This orders prodded store vs ceded load */
if (tvcpu->arch.ceded)
kvmppc_fast_vcpu_kick_hv(tvcpu);
break;
@@ -3771,6 +3778,14 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
pvc = core_info.vc[sub];
pvc->pcpu = pcpu + thr;
for_each_runnable_thread(i, vcpu, pvc) {
+ /*
+ * XXX: is kvmppc_start_thread called too late here?
+ * It updates vcpu->cpu and vcpu->arch.thread_cpu
+ * which are used by kvmppc_fast_vcpu_kick_hv(), but
+ * kick is called after new exceptions become available
+ * and exceptions are checked earlier than here, by
+ * kvmppc_core_prepare_to_enter.
+ */
kvmppc_start_thread(vcpu, pvc);
kvmppc_create_dtl_entry(vcpu, pvc);
trace_kvm_guest_enter(vcpu);
@@ -4492,6 +4507,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
if (need_resched() || !kvm->arch.mmu_ready)
goto out;
+ vcpu->cpu = pcpu;
+ vcpu->arch.thread_cpu = pcpu;
+ vc->pcpu = pcpu;
+ local_paca->kvm_hstate.kvm_vcpu = vcpu;
+ local_paca->kvm_hstate.ptid = 0;
+ local_paca->kvm_hstate.fake_suspend = 0;
+
+ /*
+ * Orders set cpu/thread_cpu vs testing for pending interrupts and
+ * doorbells below. The other side is when these fields are set vs
+ * kvmppc_fast_vcpu_kick_hv reading the cpu/thread_cpu fields to
+ * kick a vCPU to notice the pending interrupt.
+ */
+ smp_mb();
+
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
@@ -4511,13 +4541,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
tb = mftb();
- vcpu->cpu = pcpu;
- vcpu->arch.thread_cpu = pcpu;
- vc->pcpu = pcpu;
- local_paca->kvm_hstate.kvm_vcpu = vcpu;
- local_paca->kvm_hstate.ptid = 0;
- local_paca->kvm_hstate.fake_suspend = 0;
-
__kvmppc_create_dtl_entry(vcpu, pcpu, tb + vc->tb_offset, 0);
trace_kvm_guest_enter(vcpu);
@@ -4619,6 +4642,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
run->exit_reason = KVM_EXIT_INTR;
vcpu->arch.ret = -EINTR;
out:
+ vcpu->cpu = -1;
+ vcpu->arch.thread_cpu = -1;
powerpc_local_irq_pmu_restore(flags);
preempt_enable();
goto done;
--
2.23.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race
2022-03-03 5:33 ` [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race Nicholas Piggin
@ 2022-03-09 13:07 ` Fabiano Rosas
0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2022-03-09 13:07 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
Nicholas Piggin <npiggin@gmail.com> writes:
> When new work is created that requires attention from the hypervisor
> (e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
> pull the target vcpu out of the guest if it may have been running.
>
> Therefore the work creation side looks like this:
>
> vcpu->arch.doorbell_request = 1;
> kvmppc_fast_vcpu_kick_hv(vcpu) {
>         smp_mb();
>         cpu = vcpu->cpu;
>         if (cpu != -1)
>                 send_ipi(cpu);
> }
>
> And the guest entry side *should* look like this:
>
> vcpu->cpu = smp_processor_id();
> smp_mb();
> if (vcpu->arch.doorbell_request) {
>         // do something (abort entry or inject doorbell etc)
> }
>
> But currently the store and load are flipped, so it is possible for the
> entry to see no doorbell pending, and the doorbell creation to miss the
> store to set cpu, resulting in lost work (or at least work delayed until
> the next guest exit).
>
>
> Fix this by reordering the entry operations and adding an smp_mb()
> between them. The P8 path appears to have a similar race, which is
> commented but not addressed yet.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
* [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
2022-03-03 5:33 ` [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-07 23:19 ` Fabiano Rosas
2022-03-03 5:33 ` [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming Nicholas Piggin
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
If there is a pending xive interrupt, inject it at guest entry (if
MSR[EE] is enabled) rather than taking another interrupt when the guest
is entered. If xive is enabled, then LPCR[LPES] is set, so this
behaviour should be expected.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f8c0f1f52a1e..5df359053147 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4524,9 +4524,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
- if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
- &vcpu->arch.pending_exceptions))
+ if (vcpu->arch.shregs.msr & MSR_EE) {
+ if (xive_interrupt_pending(vcpu))
+ kvmppc_inject_interrupt_hv(vcpu,
+ BOOK3S_INTERRUPT_EXTERNAL, 0);
+ } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
+ &vcpu->arch.pending_exceptions)) {
lpcr |= LPCR_MER;
+ }
} else if (vcpu->arch.pending_exceptions ||
vcpu->arch.doorbell_request ||
xive_interrupt_pending(vcpu)) {
--
2.23.0
* Re: [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
2022-03-03 5:33 ` [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry Nicholas Piggin
@ 2022-03-07 23:19 ` Fabiano Rosas
0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2022-03-07 23:19 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
Nicholas Piggin <npiggin@gmail.com> writes:
> If there is a pending xive interrupt, inject it at guest entry (if
> MSR[EE] is enabled) rather than taking another interrupt when the guest
> is entered. If xive is enabled, then LPCR[LPES] is set, so this
> behaviour should be expected.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/kvm/book3s_hv.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index f8c0f1f52a1e..5df359053147 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4524,9 +4524,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>
> if (!nested) {
> kvmppc_core_prepare_to_enter(vcpu);
> - if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
> - &vcpu->arch.pending_exceptions))
> + if (vcpu->arch.shregs.msr & MSR_EE) {
> + if (xive_interrupt_pending(vcpu))
> + kvmppc_inject_interrupt_hv(vcpu,
> + BOOK3S_INTERRUPT_EXTERNAL, 0);
> + } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
> + &vcpu->arch.pending_exceptions)) {
> lpcr |= LPCR_MER;
> + }
> } else if (vcpu->arch.pending_exceptions ||
> vcpu->arch.doorbell_request ||
> xive_interrupt_pending(vcpu)) {
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
* [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
2022-03-03 5:33 ` [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race Nicholas Piggin
2022-03-03 5:33 ` [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-09 13:55 ` Cédric Le Goater
2022-03-09 14:41 ` Fabiano Rosas
2022-03-03 5:33 ` [PATCH 4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry Nicholas Piggin
` (3 subsequent siblings)
6 siblings, 2 replies; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
Move the cede abort logic out of xive escalation rearming and into
the caller to prepare for handling a similar case with nested guest
entry.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
arch/powerpc/kvm/book3s_hv.c | 10 ++++++++--
arch/powerpc/kvm/book3s_xive.c | 9 ++++++---
3 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a14dbcd1b8ce..94fa5f246657 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -671,7 +671,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
int level, bool line_status);
extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
-extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
+extern bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
{
@@ -709,7 +709,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 ir
int level, bool line_status) { return -ENODEV; }
static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
-static inline void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { }
+static inline bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { return true; }
static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5df359053147..a0b674d3a2da 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4073,10 +4073,16 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
!(vcpu->arch.shregs.msr & MSR_PR)) {
unsigned long req = kvmppc_get_gpr(vcpu, 3);
- /* H_CEDE has to be handled now, not later */
+ /* H_CEDE has to be handled now */
if (req == H_CEDE) {
kvmppc_cede(vcpu);
- kvmppc_xive_rearm_escalation(vcpu); /* may un-cede */
+ if (!kvmppc_xive_rearm_escalation(vcpu)) {
+ /*
+ * Pending escalation so abort
+ * the cede.
+ */
+ vcpu->arch.ceded = 0;
+ }
kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index e216c068075d..7b513e14cada 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -179,12 +179,13 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
-void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
+bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
{
void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
+ bool ret = true;
if (!esc_vaddr)
- return;
+ return ret;
/* we are using XIVE with single escalation */
@@ -197,7 +198,7 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
* we also don't want to set xive_esc_on to 1 here in
* case we race with xive_esc_irq().
*/
- vcpu->arch.ceded = 0;
+ ret = false;
/*
* The escalation interrupts are special as we don't EOI them.
* There is no need to use the load-after-store ordering offset
@@ -210,6 +211,8 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
__raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
}
mb();
+
+ return ret;
}
EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
--
2.23.0
* Re: [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
2022-03-03 5:33 ` [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming Nicholas Piggin
@ 2022-03-09 13:55 ` Cédric Le Goater
2022-03-09 14:41 ` Fabiano Rosas
1 sibling, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2022-03-09 13:55 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
On 3/3/22 06:33, Nicholas Piggin wrote:
> Move the cede abort logic out of xive escalation rearming and into
> the caller to prepare for handling a similar case with nested guest
> entry.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
In xive_esc_irq() :
if (vcpu->arch.ceded)
kvmppc_fast_vcpu_kick(vcpu);
which does :
vcpu->kvm->arch.kvm_ops->fast_vcpu_kick(vcpu);
That's a lot of indirection, which is costly on PPC. Maybe for this
case, since we know XIVE is only supported on KVM HV, we could use
kvmppc_fast_vcpu_kick_hv() directly.
Thanks,
C.
> ---
> arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
> arch/powerpc/kvm/book3s_hv.c | 10 ++++++++--
> arch/powerpc/kvm/book3s_xive.c | 9 ++++++---
> 3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index a14dbcd1b8ce..94fa5f246657 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -671,7 +671,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
> int level, bool line_status);
> extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
> extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> -extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
> +extern bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
>
> static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> {
> @@ -709,7 +709,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 ir
> int level, bool line_status) { return -ENODEV; }
> static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
> static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> -static inline void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { }
> +static inline bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { return true; }
>
> static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> { return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 5df359053147..a0b674d3a2da 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4073,10 +4073,16 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
> !(vcpu->arch.shregs.msr & MSR_PR)) {
> unsigned long req = kvmppc_get_gpr(vcpu, 3);
>
> - /* H_CEDE has to be handled now, not later */
> + /* H_CEDE has to be handled now */
> if (req == H_CEDE) {
> kvmppc_cede(vcpu);
> - kvmppc_xive_rearm_escalation(vcpu); /* may un-cede */
> + if (!kvmppc_xive_rearm_escalation(vcpu)) {
> + /*
> + * Pending escalation so abort
> + * the cede.
> + */
> + vcpu->arch.ceded = 0;
> + }
> kvmppc_set_gpr(vcpu, 3, 0);
> trap = 0;
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index e216c068075d..7b513e14cada 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -179,12 +179,13 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
>
> -void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> +bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> {
> void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
> + bool ret = true;
>
> if (!esc_vaddr)
> - return;
> + return ret;
>
> /* we are using XIVE with single escalation */
>
> @@ -197,7 +198,7 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> * we also don't want to set xive_esc_on to 1 here in
> * case we race with xive_esc_irq().
> */
> - vcpu->arch.ceded = 0;
> + ret = false;
> /*
> * The escalation interrupts are special as we don't EOI them.
> * There is no need to use the load-after-store ordering offset
> @@ -210,6 +211,8 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
> }
> mb();
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
>
* Re: [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
2022-03-03 5:33 ` [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming Nicholas Piggin
2022-03-09 13:55 ` Cédric Le Goater
@ 2022-03-09 14:41 ` Fabiano Rosas
1 sibling, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2022-03-09 14:41 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
Nicholas Piggin <npiggin@gmail.com> writes:
> Move the cede abort logic out of xive escalation rearming and into
> the caller to prepare for handling a similar case with nested guest
> entry.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
> arch/powerpc/kvm/book3s_hv.c | 10 ++++++++--
> arch/powerpc/kvm/book3s_xive.c | 9 ++++++---
> 3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index a14dbcd1b8ce..94fa5f246657 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -671,7 +671,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
> int level, bool line_status);
> extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
> extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> -extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
> +extern bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
>
> static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> {
> @@ -709,7 +709,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 ir
> int level, bool line_status) { return -ENODEV; }
> static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
> static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> -static inline void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { }
> +static inline bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { return true; }
>
> static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> { return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 5df359053147..a0b674d3a2da 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4073,10 +4073,16 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
> !(vcpu->arch.shregs.msr & MSR_PR)) {
> unsigned long req = kvmppc_get_gpr(vcpu, 3);
>
> - /* H_CEDE has to be handled now, not later */
> + /* H_CEDE has to be handled now */
> if (req == H_CEDE) {
> kvmppc_cede(vcpu);
> - kvmppc_xive_rearm_escalation(vcpu); /* may un-cede */
> + if (!kvmppc_xive_rearm_escalation(vcpu)) {
> + /*
> + * Pending escalation so abort
> + * the cede.
> + */
> + vcpu->arch.ceded = 0;
This was moved after the mb() in kvmppc_xive_rearm_escalation, so I
think a concurrent H_PROD might continue to see tvcpu->arch.ceded = 1
after the escalation has been set. Is this an issue?
> + }
> kvmppc_set_gpr(vcpu, 3, 0);
> trap = 0;
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index e216c068075d..7b513e14cada 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -179,12 +179,13 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
>
> -void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> +bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> {
> void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
> + bool ret = true;
>
> if (!esc_vaddr)
> - return;
> + return ret;
>
> /* we are using XIVE with single escalation */
>
> @@ -197,7 +198,7 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> * we also don't want to set xive_esc_on to 1 here in
> * case we race with xive_esc_irq().
> */
> - vcpu->arch.ceded = 0;
> + ret = false;
> /*
> * The escalation interrupts are special as we don't EOI them.
> * There is no need to use the load-after-store ordering offset
> @@ -210,6 +211,8 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
> }
> mb();
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
* [PATCH 4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
` (2 preceding siblings ...)
2022-03-03 5:33 ` [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-09 17:17 ` Fabiano Rosas
2022-03-03 5:33 ` [PATCH 5/6] KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context Nicholas Piggin
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
The differences between nested and !nested will become larger in
later changes, so split them out for readability.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a0b674d3a2da..0289d076c0a8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4034,6 +4034,8 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, uns
static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
unsigned long lpcr, u64 *tb)
{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_nested_guest *nested = vcpu->arch.nested;
u64 next_timer;
int trap;
@@ -4053,23 +4055,30 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
/* H_CEDE has to be handled now, not later */
- if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
+ if (trap == BOOK3S_INTERRUPT_SYSCALL && !nested &&
kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
kvmppc_cede(vcpu);
kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
}
- } else {
- struct kvm *kvm = vcpu->kvm;
+ } else if (nested) {
+ kvmppc_xive_push_vcpu(vcpu);
+ __this_cpu_write(cpu_in_guest, kvm);
+ trap = kvmhv_vcpu_entry_p9(vcpu, time_limit, lpcr, tb);
+ __this_cpu_write(cpu_in_guest, NULL);
+
+ kvmppc_xive_pull_vcpu(vcpu);
+
+ } else {
kvmppc_xive_push_vcpu(vcpu);
__this_cpu_write(cpu_in_guest, kvm);
trap = kvmhv_vcpu_entry_p9(vcpu, time_limit, lpcr, tb);
__this_cpu_write(cpu_in_guest, NULL);
- if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
+ if (trap == BOOK3S_INTERRUPT_SYSCALL &&
!(vcpu->arch.shregs.msr & MSR_PR)) {
unsigned long req = kvmppc_get_gpr(vcpu, 3);
--
2.23.0
* [PATCH 5/6] KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
` (3 preceding siblings ...)
2022-03-03 5:33 ` [PATCH 4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-03 5:33 ` [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting Nicholas Piggin
2022-05-24 10:51 ` [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Michael Ellerman
6 siblings, 0 replies; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
The PowerNV L0 currently pushes the OS xive context when running a vCPU,
regardless of whether it is running a nested guest. The problem is that
xive OS ring interrupts will be delivered while the L2 is running.
At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which
actually makes external interrupts go to the L0. That causes the L2 to
exit and the interrupt to be taken by, or injected into, the L1, so in
some respects this behaves like an escalation. It's not clear whether
this was deliberate; there's no comment about it, and the L1 is actually
allowed to clear LPES in the L2, so it's confusing at best.
When the L2 is running, the L1 is essentially in a ceded state with
respect to external interrupts (it can't respond to them directly and
won't get scheduled again absent some additional event). So the natural
way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to
run the L2, have it arm the escalation interrupt and don't push the L1
context while running the L2.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 26 ++++++++++++++++++++------
arch/powerpc/kvm/book3s_xive.c | 2 +-
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0289d076c0a8..77315c2c3f43 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4063,14 +4063,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
}
} else if (nested) {
- kvmppc_xive_push_vcpu(vcpu);
-
__this_cpu_write(cpu_in_guest, kvm);
trap = kvmhv_vcpu_entry_p9(vcpu, time_limit, lpcr, tb);
__this_cpu_write(cpu_in_guest, NULL);
- kvmppc_xive_pull_vcpu(vcpu);
-
} else {
kvmppc_xive_push_vcpu(vcpu);
@@ -4082,8 +4078,13 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
!(vcpu->arch.shregs.msr & MSR_PR)) {
unsigned long req = kvmppc_get_gpr(vcpu, 3);
- /* H_CEDE has to be handled now */
+ /*
+ * XIVE rearm and XICS hcalls must be handled
+ * before xive context is pulled (is this
+ * true?)
+ */
if (req == H_CEDE) {
+ /* H_CEDE has to be handled now */
kvmppc_cede(vcpu);
if (!kvmppc_xive_rearm_escalation(vcpu)) {
/*
@@ -4095,7 +4096,20 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
kvmppc_set_gpr(vcpu, 3, 0);
trap = 0;
- /* XICS hcalls must be handled before xive is pulled */
+ } else if (req == H_ENTER_NESTED) {
+ /*
+ * L2 should not run with the L1
+ * context so rearm and pull it.
+ */
+ if (!kvmppc_xive_rearm_escalation(vcpu)) {
+ /*
+ * Pending escalation so abort
+ * H_ENTER_NESTED.
+ */
+ kvmppc_set_gpr(vcpu, 3, 0);
+ trap = 0;
+ }
+
} else if (hcall_is_xics(req)) {
int ret;
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 7b513e14cada..e44e251509fe 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -241,7 +241,7 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
vcpu->arch.irq_pending = 1;
smp_mb();
- if (vcpu->arch.ceded)
+ if (vcpu->arch.ceded || vcpu->arch.nested)
kvmppc_fast_vcpu_kick(vcpu);
/* Since we have the no-EOI flag, the interrupt is effectively
--
2.23.0
* [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
` (4 preceding siblings ...)
2022-03-03 5:33 ` [PATCH 5/6] KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context Nicholas Piggin
@ 2022-03-03 5:33 ` Nicholas Piggin
2022-03-09 19:49 ` Fabiano Rosas
2022-05-24 10:51 ` [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Michael Ellerman
6 siblings, 1 reply; 14+ messages in thread
From: Nicholas Piggin @ 2022-03-03 5:33 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
The L1 should not be able to adjust LPES mode for the L2. Setting LPES
when the L0 needs it clear would cause external interrupts to be sent to
the L2 and missed by the L0.
Clearing LPES when it may be set, as typically happens with XIVE
enabled, could cause a performance issue despite the guest having no
native XIVE support, because it will cause mediated interrupts for the
L2 to be taken in HV mode, which then have to be injected.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 4 ++++
arch/powerpc/kvm/book3s_hv_nested.c | 3 +--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77315c2c3f43..acba9cb241c9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5323,6 +5323,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
kvm->arch.host_lpcr = lpcr = mfspr(SPRN_LPCR);
lpcr &= LPCR_PECE | LPCR_LPES;
} else {
+ /*
+ * The L2 LPES mode will be set by the L0 according to whether
+ * or not it needs to take external interrupts in HV mode.
+ */
lpcr = 0;
}
lpcr |= (4UL << LPCR_DPFD_SH) | LPCR_HDICE |
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 9d373f8963ee..58e05a9122ac 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -261,8 +261,7 @@ static void load_l2_hv_regs(struct kvm_vcpu *vcpu,
/*
* Don't let L1 change LPCR bits for the L2 except these:
*/
- mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
- LPCR_LPES | LPCR_MER;
+ mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD | LPCR_MER;
/*
* Additional filtering is required depending on hardware
--
2.23.0
* Re: [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
2022-03-03 5:33 ` [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting Nicholas Piggin
@ 2022-03-09 19:49 ` Fabiano Rosas
0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2022-03-09 19:49 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Cédric Le Goater, Nicholas Piggin
Nicholas Piggin <npiggin@gmail.com> writes:
> The L1 should not be able to adjust LPES mode for the L2. Setting LPES
> when the L0 needs it clear would cause external interrupts to be sent to
> the L2 and missed by the L0.
>
> Clearing LPES when it may be set, as typically happens with XIVE
> enabled, could cause a performance issue despite the guest having no
> native XIVE support, because it will cause mediated interrupts for the
> L2 to be taken in HV mode, which then have to be injected.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
* Re: [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes
2022-03-03 5:33 [PATCH 0/6] KVM: PPC: Book3S HV interrupt fixes Nicholas Piggin
` (5 preceding siblings ...)
2022-03-03 5:33 ` [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting Nicholas Piggin
@ 2022-05-24 10:51 ` Michael Ellerman
6 siblings, 0 replies; 14+ messages in thread
From: Michael Ellerman @ 2022-05-24 10:51 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Cédric Le Goater
On Thu, 3 Mar 2022 15:33:09 +1000, Nicholas Piggin wrote:
> This series fixes up a bunch of little interrupt issues which were found
> by inspection and don't seem to have caused big problems, but possibly
> could, or could cause the occasional latency spike from a temporarily
> lost interrupt.
>
> The big thing is the xive context change. Currently we run an L2 with
> its L1's xive OS context pushed. I'm proposing that we instead treat
> that as an escalation similar to cede.
>
> [...]
Patches 2-6 applied to powerpc/topic/ppc-kvm.
[2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
https://git.kernel.org/powerpc/c/026728dc5d41f830e8194fe01e432dd4eb9b3d9a
[3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
https://git.kernel.org/powerpc/c/ad5ace91c55e7bd16813617f67bcb7619d51a295
[4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry
https://git.kernel.org/powerpc/c/42b4a2b347b09e7ee4c86f7121e3b45214b63e69
[5/6] KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
https://git.kernel.org/powerpc/c/11681b79b1ab52e7625844d7ce52c4d5201a43b2
[6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
https://git.kernel.org/powerpc/c/2852ebfa10afdcefff35ec72c8da97141df9845c
cheers