* [PATCH 01/13] powerpc: Add simple cache inhibited MMIO accessors
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 02/13] KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function Paul Mackerras
` (12 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Add simple cache-inhibited accessors for memory-mapped I/O.
Unlike the accessors built from the DEF_MMIO_* macros, these
don't include any hardware memory barriers; callers need to
manage memory barriers on their own. These can only be called
in hypervisor real mode.
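As a usage sketch (illustrative only, not part of this patch; the helper
name is hypothetical and XICS_XIRR is assumed to come from asm/xics.h),
a real-mode caller is expected to supply its own ordering around these
accessors, along these lines:

	/* Hypothetical real-mode read of the XICS XIRR register; the
	 * caller provides the barrier that _lwzcix() deliberately omits. */
	static inline u32 example_rm_read_xirr(unsigned long xics_phys)
	{
		u32 xirr = _lwzcix(xics_phys + XICS_XIRR);

		/* order the MMIO load before any dependent accesses */
		smp_mb();
		return xirr;
	}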
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
[paulus@ozlabs.org - added line to comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/io.h | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 2fd1690..f6fda84 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -241,6 +241,35 @@ static inline void out_be64(volatile u64 __iomem *addr, u64 val)
#endif
#endif /* __powerpc64__ */
+
+/*
+ * Simple Cache inhibited accessors
+ * Unlike the DEF_MMIO_* macros, these don't include any h/w memory
+ * barriers, callers need to manage memory barriers on their own.
+ * These can only be used in hypervisor real mode.
+ */
+
+static inline u32 _lwzcix(unsigned long addr)
+{
+ u32 ret;
+
+ __asm__ __volatile__("lwzcix %0,0, %1"
+ : "=r" (ret) : "r" (addr) : "memory");
+ return ret;
+}
+
+static inline void _stbcix(u64 addr, u8 val)
+{
+ __asm__ __volatile__("stbcix %0,0,%1"
+ : : "r" (val), "r" (addr) : "memory");
+}
+
+static inline void _stwcix(u64 addr, u32 val)
+{
+ __asm__ __volatile__("stwcix %0,0,%1"
+ : : "r" (val), "r" (addr) : "memory");
+}
+
/*
* Low level IO stream instructions are defined out of line for now
*/
--
2.8.1
* [PATCH 02/13] KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
2016-08-19 5:35 ` [PATCH 01/13] powerpc: Add simple cache inhibited MMIO accessors Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 03/13] KVM: PPC: select IRQ_BYPASS_MANAGER Paul Mackerras
` (11 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Modify kvmppc_read_intr to make it a C function. Because it is called
from kvmppc_check_wake_reason, any assembler code that calls either
kvmppc_read_intr or kvmppc_check_wake_reason now has to assume that
the volatile registers might have been modified.
This also adds the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI. Without this, the next
device interrupt will require two trips through the host interrupt
handling code.
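For context, saved_xirr is the value the host ICP driver picks up on the
next external-interrupt exit. A simplified sketch of that consumer (an
assumption about the existing icp-native latch mechanism, not code added
by this patch; example_read_hw_xirr() is a hypothetical placeholder)
shows why leaving a stale value behind would cost an extra trip:

	/* Sketch: if KVM latched an XIRR in the PACA, use it; otherwise
	 * read the hardware ICP.  Clearing saved_xirr after fully EOIing
	 * an IPI in real mode keeps this path from seeing a stale value
	 * when the next device interrupt arrives. */
	static unsigned int example_icp_get_xirr(void)
	{
		unsigned int xirr = kvmppc_get_xics_latch();

		if (xirr)
			return xirr;
		return example_read_hw_xirr();	/* hypothetical HW read */
	}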
[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
when it is calling kvmppc_read_intr, which means we can set r12 to
the trap number (0x500) after the call to kvmppc_read_intr, instead
of using r31. Also moved the deliver_guest_interrupt label so as to
restore XER and CTR, plus other minor tweaks.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_builtin.c | 84 +++++++++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 158 +++++++++++++++-----------------
2 files changed, 158 insertions(+), 84 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 5f0380d..b476a6a 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -25,6 +25,7 @@
#include <asm/xics.h>
#include <asm/dbell.h>
#include <asm/cputhreads.h>
+#include <asm/io.h>
#define KVM_CMA_CHUNK_ORDER 18
@@ -286,3 +287,86 @@ void kvmhv_commence_exit(int trap)
struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv);
+
+/*
+ * Determine what sort of external interrupt is pending (if any).
+ * Returns:
+ * 0 if no interrupt is pending
+ * 1 if an interrupt is pending that needs to be handled by the host
+ * -1 if there was a guest wakeup IPI (which has now been cleared)
+ */
+
+long kvmppc_read_intr(void)
+{
+ unsigned long xics_phys;
+ u32 h_xirr;
+ __be32 xirr;
+ u32 xisr;
+ u8 host_ipi;
+
+ /* see if a host IPI is pending */
+ host_ipi = local_paca->kvm_hstate.host_ipi;
+ if (host_ipi)
+ return 1;
+
+ /* Now read the interrupt from the ICP */
+ xics_phys = local_paca->kvm_hstate.xics_phys;
+ if (unlikely(!xics_phys))
+ return 1;
+
+ /*
+ * Save XIRR for later. Since we get control in reverse endian
+ * on LE systems, save it byte reversed and fetch it back in
+ * host endian. Note that xirr is the value read from the
+ * XIRR register, while h_xirr is the host endian version.
+ */
+ xirr = _lwzcix(xics_phys + XICS_XIRR);
+ h_xirr = be32_to_cpu(xirr);
+ local_paca->kvm_hstate.saved_xirr = h_xirr;
+ xisr = h_xirr & 0xffffff;
+ /*
+ * Ensure that the store/load complete to guarantee all side
+ * effects of loading from XIRR has completed
+ */
+ smp_mb();
+
+ /* if nothing pending in the ICP */
+ if (!xisr)
+ return 0;
+
+ /* We found something in the ICP...
+ *
+ * If it is an IPI, clear the MFRR and EOI it.
+ */
+ if (xisr == XICS_IPI) {
+ _stbcix(xics_phys + XICS_MFRR, 0xff);
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+ /*
+ * Need to ensure side effects of above stores
+ * complete before proceeding.
+ */
+ smp_mb();
+
+ /*
+ * We need to re-check host IPI now in case it got set in the
+ * meantime. If it's clear, we bounce the interrupt to the
+ * guest
+ */
+ host_ipi = local_paca->kvm_hstate.host_ipi;
+ if (unlikely(host_ipi != 0)) {
+ /* We raced with the host,
+ * we need to resend that IPI, bummer
+ */
+ _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
+ /* Let side effects complete */
+ smp_mb();
+ return 1;
+ }
+
+ /* OK, it's an IPI for us */
+ local_paca->kvm_hstate.saved_xirr = 0;
+ return -1;
+ }
+
+ return 1;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9756555..dccfa85 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -221,6 +221,13 @@ kvmppc_primary_no_guest:
li r3, 0 /* Don't wake on privileged (OS) doorbell */
b kvm_do_nap
+/*
+ * kvm_novcpu_wakeup
+ * Entered from kvm_start_guest if kvm_hstate.napping is set
+ * to NAPPING_NOVCPU
+ * r2 = kernel TOC
+ * r13 = paca
+ */
kvm_novcpu_wakeup:
ld r1, HSTATE_HOST_R1(r13)
ld r5, HSTATE_KVM_VCORE(r13)
@@ -230,6 +237,13 @@ kvm_novcpu_wakeup:
/* check the wake reason */
bl kvmppc_check_wake_reason
+ /*
+ * Restore volatile registers since we could have called
+ * a C routine in kvmppc_check_wake_reason.
+ * r5 = VCORE
+ */
+ ld r5, HSTATE_KVM_VCORE(r13)
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi r0, 0x100
@@ -322,6 +336,11 @@ kvm_start_guest:
/* Check the wake reason in SRR1 to see why we got here */
bl kvmppc_check_wake_reason
+ /*
+ * kvmppc_check_wake_reason could invoke a C routine, but we
+ * have no volatile registers to restore when we return.
+ */
+
cmpdi r3, 0
bge kvm_no_guest
@@ -881,6 +900,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
cmpwi r3, 512 /* 1 microsecond */
blt hdec_soon
+deliver_guest_interrupt:
ld r6, VCPU_CTR(r4)
ld r7, VCPU_XER(r4)
@@ -895,7 +915,6 @@ kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */
mtspr SPRN_SRR0, r6
mtspr SPRN_SRR1, r7
-deliver_guest_interrupt:
/* r11 = vcpu->arch.msr & ~MSR_HV */
rldicl r11, r11, 63 - MSR_HV_LG, 1
rotldi r11, r11, 1 + MSR_HV_LG
@@ -1155,10 +1174,36 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* set, we know the host wants us out so let's do it now
*/
bl kvmppc_read_intr
+
+ /*
+ * Restore the active volatile registers after returning from
+ * a C function.
+ */
+ ld r9, HSTATE_KVM_VCPU(r13)
+ li r12, BOOK3S_INTERRUPT_EXTERNAL
+
+ /*
+ * kvmppc_read_intr return codes:
+ *
+ * Exit to host (r3 > 0)
+ * 1 An interrupt is pending that needs to be handled by the host
+ * Exit guest and return to host by branching to guest_exit_cont
+ *
+ * Before returning to guest, we check if any CPU is heading out
+ * to the host and if so, we head out also. If no CPUs are heading
+ * check return values <= 0.
+ *
+ * Return to guest (r3 <= 0)
+ * 0 No external interrupt is pending
+ * -1 A guest wakeup IPI (which has now been cleared)
+ * In either case, we return to guest to deliver any pending
+ * guest interrupts.
+ */
+
cmpdi r3, 0
bgt guest_exit_cont
- /* Check if any CPU is heading out to the host, if so head out too */
+ /* Return code <= 0 */
4: ld r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi r0, 0x100
@@ -2213,10 +2258,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
ld r29, VCPU_GPR(R29)(r4)
ld r30, VCPU_GPR(R30)(r4)
ld r31, VCPU_GPR(R31)(r4)
-
+
/* Check the wake reason in SRR1 to see why we got here */
bl kvmppc_check_wake_reason
+ /*
+ * Restore volatile registers since we could have called a
+ * C routine in kvmppc_check_wake_reason
+ * r4 = VCPU
+ * r3 tells us whether we need to return to host or not
+ * WARNING: it gets checked further down:
+ * should not modify r3 until this check is done.
+ */
+ ld r4, HSTATE_KVM_VCPU(r13)
+
/* clear our bit in vcore->napping_threads */
34: ld r5,HSTATE_KVM_VCORE(r13)
lbz r7,HSTATE_PTID(r13)
@@ -2230,7 +2285,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
li r0,0
stb r0,HSTATE_NAPPING(r13)
- /* See if the wake reason means we need to exit */
+ /* See if the wake reason saved in r3 means we need to exit */
stw r12, VCPU_TRAP(r4)
mr r9, r4
cmpdi r3, 0
@@ -2300,7 +2355,9 @@ machine_check_realmode:
*
* Also sets r12 to the interrupt vector for any interrupt that needs
* to be handled now by the host (0x500 for external interrupt), or zero.
- * Modifies r0, r6, r7, r8.
+ * Modifies all volatile registers (since it may call a C function).
+ * This routine calls kvmppc_read_intr, a C function, if an external
+ * interrupt is pending.
*/
kvmppc_check_wake_reason:
mfspr r6, SPRN_SRR1
@@ -2310,8 +2367,7 @@ FTR_SECTION_ELSE
rlwinm r6, r6, 45-31, 0xe /* P7 wake reason field is 3 bits */
ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
cmpwi r6, 8 /* was it an external interrupt? */
- li r12, BOOK3S_INTERRUPT_EXTERNAL
- beq kvmppc_read_intr /* if so, see what it was */
+ beq 7f /* if so, see what it was */
li r3, 0
li r12, 0
cmpwi r6, 6 /* was it the decrementer? */
@@ -2350,83 +2406,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
li r3, 1
blr
-/*
- * Determine what sort of external interrupt is pending (if any).
- * Returns:
- * 0 if no interrupt is pending
- * 1 if an interrupt is pending that needs to be handled by the host
- * -1 if there was a guest wakeup IPI (which has now been cleared)
- * Modifies r0, r6, r7, r8, returns value in r3.
- */
-kvmppc_read_intr:
- /* see if a host IPI is pending */
- li r3, 1
- lbz r0, HSTATE_HOST_IPI(r13)
- cmpwi r0, 0
- bne 1f
-
- /* Now read the interrupt from the ICP */
- ld r6, HSTATE_XICS_PHYS(r13)
- li r7, XICS_XIRR
- cmpdi r6, 0
- beq- 1f
- lwzcix r0, r6, r7
- /*
- * Save XIRR for later. Since we get in in reverse endian on LE
- * systems, save it byte reversed and fetch it back in host endian.
- */
- li r3, HSTATE_SAVED_XIRR
- STWX_BE r0, r3, r13
-#ifdef __LITTLE_ENDIAN__
- lwz r3, HSTATE_SAVED_XIRR(r13)
-#else
- mr r3, r0
-#endif
- rlwinm. r3, r3, 0, 0xffffff
- sync
- beq 1f /* if nothing pending in the ICP */
-
- /* We found something in the ICP...
- *
- * If it's not an IPI, stash it in the PACA and return to
- * the host, we don't (yet) handle directing real external
- * interrupts directly to the guest
- */
- cmpwi r3, XICS_IPI /* if there is, is it an IPI? */
- bne 42f
-
- /* It's an IPI, clear the MFRR and EOI it */
- li r3, 0xff
- li r8, XICS_MFRR
- stbcix r3, r6, r8 /* clear the IPI */
- stwcix r0, r6, r7 /* EOI it */
- sync
-
- /* We need to re-check host IPI now in case it got set in the
- * meantime. If it's clear, we bounce the interrupt to the
- * guest
- */
- lbz r0, HSTATE_HOST_IPI(r13)
- cmpwi r0, 0
- bne- 43f
-
- /* OK, it's an IPI for us */
- li r12, 0
- li r3, -1
-1: blr
-
-42: /* It's not an IPI and it's for the host. We saved a copy of XIRR in
- * the PACA earlier, it will be picked up by the host ICP driver
- */
- li r3, 1
- b 1b
-
-43: /* We raced with the host, we need to resend that IPI, bummer */
- li r0, IPI_PRIORITY
- stbcix r0, r6, r8 /* set the IPI */
- sync
- li r3, 1
- b 1b
+ /* external interrupt - create a stack frame so we can call C */
+7: mflr r0
+ std r0, PPC_LR_STKOFF(r1)
+ stdu r1, -PPC_MIN_STKFRM(r1)
+ bl kvmppc_read_intr
+ nop
+ li r12, BOOK3S_INTERRUPT_EXTERNAL
+ ld r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1)
+ addi r1, r1, PPC_MIN_STKFRM
+ mtlr r0
+ blr
/*
* Save away FP, VMX and VSX registers.
--
2.8.1
* [PATCH 03/13] KVM: PPC: select IRQ_BYPASS_MANAGER
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
2016-08-19 5:35 ` [PATCH 01/13] powerpc: Add simple cache inhibited MMIO accessors Paul Mackerras
2016-08-19 5:35 ` [PATCH 02/13] KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 04/13] KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap Paul Mackerras
` (10 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC implementations of the irq bypass add_producer and
del_producer callbacks.
[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
so booke compiles; added kvm_arch_has_irq_bypass implementation.]
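For illustration (a partial sketch only; the _hv callbacks named here are
added later in this series, and a real kvmppc_ops initializer has many
more fields), a backend opts in simply by filling the new hooks, after
which kvm_arch_has_irq_bypass() reports bypass support:

	/* Sketch: only the two new fields are shown. */
	static struct kvmppc_ops kvm_ops_hv_sketch = {
		.irq_bypass_add_producer = kvmppc_irq_bypass_add_producer_hv,
		.irq_bypass_del_producer = kvmppc_irq_bypass_del_producer_hv,
	};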
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 ++++
arch/powerpc/kvm/Kconfig | 2 ++
arch/powerpc/kvm/powerpc.c | 38 ++++++++++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2544eda..94715e2 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -287,6 +287,10 @@ struct kvmppc_ops {
long (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl,
unsigned long arg);
int (*hcall_implemented)(unsigned long hcall);
+ int (*irq_bypass_add_producer)(struct irq_bypass_consumer *,
+ struct irq_bypass_producer *);
+ void (*irq_bypass_del_producer)(struct irq_bypass_consumer *,
+ struct irq_bypass_producer *);
};
extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c2024ac..7ac0569 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -22,6 +22,8 @@ config KVM
select ANON_INODES
select HAVE_KVM_EVENTFD
select SRCU
+ select IRQ_BYPASS_MANAGER
+ select HAVE_KVM_IRQ_BYPASS
config KVM_BOOK3S_HANDLER
bool
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6ce40dd..6d51e0f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -27,6 +27,8 @@
#include <linux/slab.h>
#include <linux/file.h>
#include <linux/module.h>
+#include <linux/irqbypass.h>
+#include <linux/kvm_irqfd.h>
#include <asm/cputable.h>
#include <asm/uaccess.h>
#include <asm/kvm_ppc.h>
@@ -739,6 +741,42 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
#endif
}
+/*
+ * irq_bypass_add_producer and irq_bypass_del_producer are only
+ * useful if the architecture supports PCI passthrough.
+ * irq_bypass_stop and irq_bypass_start are not needed and so
+ * kvm_ops are not defined for them.
+ */
+bool kvm_arch_has_irq_bypass(void)
+{
+ return ((kvmppc_hv_ops && kvmppc_hv_ops->irq_bypass_add_producer) ||
+ (kvmppc_pr_ops && kvmppc_pr_ops->irq_bypass_add_producer));
+}
+
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+ struct kvm *kvm = irqfd->kvm;
+
+ if (kvm->arch.kvm_ops->irq_bypass_add_producer)
+ return kvm->arch.kvm_ops->irq_bypass_add_producer(cons, prod);
+
+ return 0;
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+ struct kvm *kvm = irqfd->kvm;
+
+ if (kvm->arch.kvm_ops->irq_bypass_del_producer)
+ kvm->arch.kvm_ops->irq_bypass_del_producer(cons, prod);
+}
+
static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu,
struct kvm_run *run)
{
--
2.8.1
* [PATCH 04/13] KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (2 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 03/13] KVM: PPC: select IRQ_BYPASS_MANAGER Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 05/13] powerpc/powernv: Provide facilities for EOI, usable from real mode Paul Mackerras
` (9 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure, that is used to map the real
hardware IRQ in the host to the virtual hardware IRQ (gsi) that
is injected into a guest by KVM for passthrough adapters.
Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping array
containing all defined real<->virtual IRQs.
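To make the intended use concrete, finding the entry for a given host
hardware IRQ is just a scan of mapped[] (a sketch with a hypothetical
helper name; the real-mode lookup, with the required memory barriers, is
added later in this series):

	/* Sketch: linear scan of the per-VM passthrough IRQ map. */
	static struct kvmppc_irq_map *example_find_map(
			struct kvmppc_passthru_irqmap *pimap, u32 r_hwirq)
	{
		int i;

		for (i = 0; i < pimap->n_mapped; i++)
			if (pimap->mapped[i].r_hwirq == r_hwirq)
				return &pimap->mapped[i];
		return NULL;
	}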
[paulus@ozlabs.org - removed irq_chip field from struct
kvmppc_passthru_irqmap; changed parameter for
kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
kvm *, removed small cached array.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_host.h | 17 +++++++++++++++++
arch/powerpc/include/asm/kvm_ppc.h | 14 ++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 13 +++++++++++++
3 files changed, 44 insertions(+)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec35af3..3eb5092 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -197,6 +197,8 @@ struct kvmppc_spapr_tce_table {
struct kvmppc_xics;
struct kvmppc_icp;
+struct kvmppc_passthru_irqmap;
+
/*
* The reverse mapping array has one entry for each HPTE,
* which stores the guest's view of the second word of the HPTE
@@ -267,6 +269,7 @@ struct kvm_arch {
#endif
#ifdef CONFIG_KVM_XICS
struct kvmppc_xics *xics;
+ struct kvmppc_passthru_irqmap *pimap;
#endif
struct kvmppc_ops *kvm_ops;
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -397,6 +400,20 @@ struct kvmhv_tb_accumulator {
u64 tb_max; /* max time */
};
+#ifdef CONFIG_PPC_BOOK3S_64
+struct kvmppc_irq_map {
+ u32 r_hwirq;
+ u32 v_hwirq;
+ struct irq_desc *desc;
+};
+
+#define KVMPPC_PIRQ_MAPPED 1024
+struct kvmppc_passthru_irqmap {
+ int n_mapped;
+ struct kvmppc_irq_map mapped[KVMPPC_PIRQ_MAPPED];
+};
+#endif
+
# ifdef CONFIG_PPC_FSL_BOOK3E
#define KVMPPC_BOOKE_IAC_NUM 2
#define KVMPPC_BOOKE_DAC_NUM 2
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 94715e2..4ca2ba3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -457,8 +457,18 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{
return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
}
+
+static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
+ struct kvm *kvm)
+{
+ if (kvm)
+ return kvm->arch.pimap;
+ return NULL;
+}
+
extern void kvmppc_alloc_host_rm_ops(void);
extern void kvmppc_free_host_rm_ops(void);
+extern void kvmppc_free_pimap(struct kvm *kvm);
extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
@@ -470,8 +480,12 @@ extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
extern void kvmppc_xics_ipi_action(void);
extern int h_ipi_redirect;
#else
+static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
+ struct kvm *kvm)
+ { return NULL; }
static inline void kvmppc_alloc_host_rm_ops(void) {};
static inline void kvmppc_free_host_rm_ops(void) {};
+static inline void kvmppc_free_pimap(struct kvm *kvm) {};
static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2fd5580..413b5c2f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3282,6 +3282,19 @@ static int kvmppc_core_check_processor_compat_hv(void)
return 0;
}
+#ifdef CONFIG_KVM_XICS
+
+void kvmppc_free_pimap(struct kvm *kvm)
+{
+ kfree(kvm->arch.pimap);
+}
+
+struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(void)
+{
+ return kzalloc(sizeof(struct kvmppc_passthru_irqmap), GFP_KERNEL);
+}
+#endif
+
static long kvm_arch_vm_ioctl_hv(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
--
2.8.1
* [PATCH 05/13] powerpc/powernv: Provide facilities for EOI, usable from real mode
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (3 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 04/13] KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 06/13] KVM: PPC: Book3S HV: Enable IRQ bypass Paul Mackerras
` (8 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
This adds a new function pnv_opal_pci_msi_eoi() which performs the
part of end-of-interrupt (EOI) handling of an MSI that involves doing
an OPAL call. This function can be called in real mode. This doesn't
just export pnv_ioda2_msi_eoi() because that calls icp_native_eoi(),
which does not work in real mode.
This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.
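The intended calling pattern in KVM (used later in this series; shown
here only as a sketch with a hypothetical wrapper name) is to test the
irq_chip with is_pnv_opal_msi() and then do only the OPAL part of the
EOI, leaving the XIRR write to the caller:

	/* Sketch: do the OPAL portion of an MSI EOI, but only for chips
	 * that this facility covers; the store to XICS_XIRR that
	 * completes the EOI is left to the caller. */
	static int64_t example_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
	{
		if (!is_pnv_opal_msi(chip))
			return -1;	/* not an interrupt we can EOI this way */
		return pnv_opal_pci_msi_eoi(chip, hw_irq);
	}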
[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/pnv-pci.h | 3 +++
arch/powerpc/platforms/powernv/pci-ioda.c | 24 ++++++++++++++++++++----
2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 0cbd813..1b46b52 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -12,6 +12,7 @@
#include <linux/pci.h>
#include <linux/pci_hotplug.h>
+#include <linux/irq.h>
#include <misc/cxl-base.h>
#include <asm/opal-api.h>
@@ -33,6 +34,8 @@ int pnv_cxl_alloc_hwirqs(struct pci_dev *dev, int num);
void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num);
int pnv_cxl_get_irq_count(struct pci_dev *dev);
struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev);
+int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq);
+bool is_pnv_opal_msi(struct irq_chip *chip);
#ifdef CONFIG_CXL_BASE
int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fd9444f..9ce48ae 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2710,15 +2710,21 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
#ifdef CONFIG_PCI_MSI
-static void pnv_ioda2_msi_eoi(struct irq_data *d)
+int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq)
{
- unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
- struct irq_chip *chip = irq_data_get_irq_chip(d);
struct pnv_phb *phb = container_of(chip, struct pnv_phb,
ioda.irq_chip);
+
+ return opal_pci_msi_eoi(phb->opal_id, hw_irq);
+}
+
+static void pnv_ioda2_msi_eoi(struct irq_data *d)
+{
int64_t rc;
+ unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
+ struct irq_chip *chip = irq_data_get_irq_chip(d);
- rc = opal_pci_msi_eoi(phb->opal_id, hw_irq);
+ rc = pnv_opal_pci_msi_eoi(chip, hw_irq);
WARN_ON_ONCE(rc);
icp_native_eoi(d);
@@ -2748,6 +2754,16 @@ void pnv_set_msi_irq_chip(struct pnv_phb *phb, unsigned int virq)
irq_set_chip(virq, &phb->ioda.irq_chip);
}
+/*
+ * Returns true iff chip is something that we could call
+ * pnv_opal_pci_msi_eoi for.
+ */
+bool is_pnv_opal_msi(struct irq_chip *chip)
+{
+ return chip->irq_eoi == pnv_ioda2_msi_eoi;
+}
+EXPORT_SYMBOL_GPL(is_pnv_opal_msi);
+
static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
unsigned int hwirq, unsigned int virq,
unsigned int is_64, struct msi_msg *msg)
--
2.8.1
* [PATCH 06/13] KVM: PPC: Book3S HV: Enable IRQ bypass
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (4 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 05/13] powerpc/powernv: Provide facilities for EOI, usable from real mode Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 07/13] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest Paul Mackerras
` (7 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.
Add the following helper functions to manage the
passthrough IRQ map.
kvmppc_set_passthru_irq()
Creates a mapping in the passthrough IRQ map that maps a host
IRQ to a guest GSI. It allocates the structure (one per guest VM)
the first time it is called.
kvmppc_clr_passthru_irq()
Removes the passthrough IRQ map entry given a guest GSI.
The passthrough IRQ map structure is not freed even when the
number of mapped entries goes to zero. It is only freed when
the VM is destroyed.
[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
requiring all passed-through interrupts to use the same irq_chip;
changed deletion so it zeroes out the r_hwirq field rather than
copying the last entry down and decrementing the number of entries.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv.c | 160 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 159 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 413b5c2f..aa11647 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -53,10 +53,13 @@
#include <asm/smp.h>
#include <asm/dbell.h>
#include <asm/hmi.h>
+#include <asm/pnv-pci.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>
#include <linux/hugetlb.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
#include <linux/module.h>
#include "book3s.h"
@@ -3247,6 +3250,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
kvmppc_free_vcores(kvm);
kvmppc_free_hpt(kvm);
+
+ kvmppc_free_pimap(kvm);
}
/* We don't need to emulate any privileged instructions or dcbz */
@@ -3289,10 +3294,159 @@ void kvmppc_free_pimap(struct kvm *kvm)
kfree(kvm->arch.pimap);
}
-struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(void)
+static struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(void)
{
return kzalloc(sizeof(struct kvmppc_passthru_irqmap), GFP_KERNEL);
}
+
+static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
+{
+ struct irq_desc *desc;
+ struct kvmppc_irq_map *irq_map;
+ struct kvmppc_passthru_irqmap *pimap;
+ struct irq_chip *chip;
+ int i;
+
+ desc = irq_to_desc(host_irq);
+ if (!desc)
+ return -EIO;
+
+ mutex_lock(&kvm->lock);
+
+ pimap = kvm->arch.pimap;
+ if (pimap == NULL) {
+ /* First call, allocate structure to hold IRQ map */
+ pimap = kvmppc_alloc_pimap();
+ if (pimap == NULL) {
+ mutex_unlock(&kvm->lock);
+ return -ENOMEM;
+ }
+ kvm->arch.pimap = pimap;
+ }
+
+ /*
+ * For now, we only support interrupts for which the EOI operation
+ * is an OPAL call followed by a write to XIRR, since that's
+ * what our real-mode EOI code does.
+ */
+ chip = irq_data_get_irq_chip(&desc->irq_data);
+ if (!chip || !is_pnv_opal_msi(chip)) {
+ pr_warn("kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (%d,%d)\n",
+ host_irq, guest_gsi);
+ mutex_unlock(&kvm->lock);
+ return -ENOENT;
+ }
+
+ /*
+ * See if we already have an entry for this guest IRQ number.
+ * If it's mapped to a hardware IRQ number, that's an error,
+ * otherwise re-use this entry.
+ */
+ for (i = 0; i < pimap->n_mapped; i++) {
+ if (guest_gsi == pimap->mapped[i].v_hwirq) {
+ if (pimap->mapped[i].r_hwirq) {
+ mutex_unlock(&kvm->lock);
+ return -EINVAL;
+ }
+ break;
+ }
+ }
+
+ if (i == KVMPPC_PIRQ_MAPPED) {
+ mutex_unlock(&kvm->lock);
+ return -EAGAIN; /* table is full */
+ }
+
+ irq_map = &pimap->mapped[i];
+
+ irq_map->v_hwirq = guest_gsi;
+ irq_map->r_hwirq = desc->irq_data.hwirq;
+ irq_map->desc = desc;
+
+ if (i == pimap->n_mapped)
+ pimap->n_mapped++;
+
+ mutex_unlock(&kvm->lock);
+
+ return 0;
+}
+
+static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
+{
+ struct irq_desc *desc;
+ struct kvmppc_passthru_irqmap *pimap;
+ int i;
+
+ desc = irq_to_desc(host_irq);
+ if (!desc)
+ return -EIO;
+
+ mutex_lock(&kvm->lock);
+
+ if (kvm->arch.pimap == NULL) {
+ mutex_unlock(&kvm->lock);
+ return 0;
+ }
+ pimap = kvm->arch.pimap;
+
+ for (i = 0; i < pimap->n_mapped; i++) {
+ if (guest_gsi == pimap->mapped[i].v_hwirq)
+ break;
+ }
+
+ if (i == pimap->n_mapped) {
+ mutex_unlock(&kvm->lock);
+ return -ENODEV;
+ }
+
+ /* invalidate the entry */
+ pimap->mapped[i].r_hwirq = 0;
+
+ /*
+ * We don't free this structure even when the count goes to
+ * zero. The structure is freed when we destroy the VM.
+ */
+
+ mutex_unlock(&kvm->lock);
+ return 0;
+}
+
+static int kvmppc_irq_bypass_add_producer_hv(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ int ret = 0;
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = prod;
+
+ ret = kvmppc_set_passthru_irq(irqfd->kvm, prod->irq, irqfd->gsi);
+ if (ret)
+ pr_info("kvmppc_set_passthru_irq (irq %d, gsi %d) fails: %d\n",
+ prod->irq, irqfd->gsi, ret);
+
+ return ret;
+}
+
+static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ int ret;
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = NULL;
+
+ /*
+ * When producer of consumer is unregistered, we change back to
+ * default external interrupt handling mode - KVM real mode
+ * will switch back to host.
+ */
+ ret = kvmppc_clr_passthru_irq(irqfd->kvm, prod->irq, irqfd->gsi);
+ if (ret)
+ pr_warn("kvmppc_clr_passthru_irq (irq %d, gsi %d) fails: %d\n",
+ prod->irq, irqfd->gsi, ret);
+}
#endif
static long kvm_arch_vm_ioctl_hv(struct file *filp,
@@ -3413,6 +3567,10 @@ static struct kvmppc_ops kvm_ops_hv = {
.fast_vcpu_kick = kvmppc_fast_vcpu_kick_hv,
.arch_vm_ioctl = kvm_arch_vm_ioctl_hv,
.hcall_implemented = kvmppc_hcall_impl_hv,
+#ifdef CONFIG_KVM_XICS
+ .irq_bypass_add_producer = kvmppc_irq_bypass_add_producer_hv,
+ .irq_bypass_del_producer = kvmppc_irq_bypass_del_producer_hv,
+#endif
};
static int kvm_init_subcore_bitmap(void)
--
2.8.1
* [PATCH 07/13] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (5 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 06/13] KVM: PPC: Book3S HV: Enable IRQ bypass Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-09-12 0:53 ` [PATCH v2 " Paul Mackerras
2016-08-19 5:35 ` [PATCH 08/13] KVM: PPC: Book3S HV: Complete passthrough interrupt in host Paul Mackerras
` (6 subsequent siblings)
13 siblings, 1 reply; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.
In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.
The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.
The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock. We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.
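A condensed sketch of that ordering contract (illustrative helper names
only; the real code is in the hunks below):

	/* Writer: fill in the entry, then publish it by setting r_hwirq. */
	static void example_publish(struct kvmppc_irq_map *map, u32 gsi,
				    struct irq_desc *desc, u32 hwirq)
	{
		map->v_hwirq = gsi;
		map->desc = desc;
		smp_wmb();		/* order the stores above ... */
		map->r_hwirq = hwirq;	/* ... before the publishing store */
	}

	/* Reader: match on r_hwirq first, then read the other fields. */
	static bool example_matches(struct kvmppc_irq_map *map, u32 xisr)
	{
		if (map->r_hwirq != xisr)
			return false;
		smp_rmb();		/* pairs with smp_wmb() in the writer */
		return true;		/* v_hwirq and desc are now valid */
	}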
[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
pulled out powernv eoi change into a separate patch, made
kvmppc_read_intr get the vcpu from the paca rather than being
passed in, rewrote the logic at the end of kvmppc_read_intr to
avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
since we were always restoring SRR0/1 anyway, get rid of the cached
array (just use the mapped array), removed the kick_all_cpus_sync()
call, clear saved_xirr PACA field when we handle the interrupt in
real mode.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 3 ++
arch/powerpc/kvm/book3s_hv.c | 8 ++++-
arch/powerpc/kvm/book3s_hv_builtin.c | 58 ++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/book3s_hv_rm_xics.c | 44 +++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++
5 files changed, 117 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4ca2ba3..4299a1f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -478,6 +478,9 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
extern void kvmppc_xics_ipi_action(void);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap);
extern int h_ipi_redirect;
#else
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index aa11647..175bdab 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3360,9 +3360,15 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
irq_map = &pimap->mapped[i];
irq_map->v_hwirq = guest_gsi;
- irq_map->r_hwirq = desc->irq_data.hwirq;
irq_map->desc = desc;
+ /*
+ * Order the above two stores before the next to serialize with
+ * the KVM real mode handler.
+ */
+ smp_wmb();
+ irq_map->r_hwirq = desc->irq_data.hwirq;
+
if (i == pimap->n_mapped)
pimap->n_mapped++;
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index b476a6a..fdb8aef 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -288,12 +288,41 @@ void kvmhv_commence_exit(int trap)
struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv);
+static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
+ u32 xisr)
+{
+ int i;
+
+ /*
+ * We access the mapped array here without a lock. That
+ * is safe because we never reduce the number of entries
+ * in the array and we never change the v_hwirq field of
+ * an entry once it is set.
+ *
+ * We have also carefully ordered the stores in the writer
+ * and the loads here in the reader, so that if we find a matching
+ * hwirq here, the associated GSI and irq_desc fields are valid.
+ */
+ for (i = 0; i < pimap->n_mapped; i++) {
+ if (xisr == pimap->mapped[i].r_hwirq) {
+ /*
+ * Order subsequent reads in the caller to serialize
+ * with the writer.
+ */
+ smp_rmb();
+ return &pimap->mapped[i];
+ }
+ }
+ return NULL;
+}
+
/*
* Determine what sort of external interrupt is pending (if any).
* Returns:
* 0 if no interrupt is pending
* 1 if an interrupt is pending that needs to be handled by the host
* -1 if there was a guest wakeup IPI (which has now been cleared)
+ * -2 if there is PCI passthrough external interrupt that was handled
*/
long kvmppc_read_intr(void)
@@ -302,6 +331,9 @@ long kvmppc_read_intr(void)
u32 h_xirr;
__be32 xirr;
u32 xisr;
+ struct kvmppc_passthru_irqmap *pimap;
+ struct kvmppc_irq_map *irq_map;
+ struct kvm_vcpu *vcpu;
u8 host_ipi;
/* see if a host IPI is pending */
@@ -368,5 +400,29 @@ long kvmppc_read_intr(void)
return -1;
}
- return 1;
+ /*
+ * If it's not an IPI, check if we have a passthrough adapter and
+ * if so, check if this external interrupt is for the adapter.
+ * We will attempt to deliver the IRQ directly to the target VCPU's
+ * ICP, the virtual ICP (based on affinity - the xive value in ICS).
+ *
+ * If the delivery fails or if this is not for a passthrough adapter,
+ * return to the host to handle this interrupt. We earlier
+ * saved a copy of the XIRR in the PACA, it will be picked up by
+ * the host ICP driver
+ */
+ vcpu = local_paca->kvm_hstate.kvm_vcpu;
+ if (!vcpu)
+ return 1;
+ pimap = kvmppc_get_passthru_irqmap(vcpu->kvm);
+ if (!pimap)
+ return 1;
+ irq_map = get_irqmap(pimap, xisr);
+ if (!irq_map)
+ return 1;
+
+ /* We're handling this interrupt, generic code doesn't need to */
+ local_paca->kvm_hstate.saved_xirr = 0;
+
+ return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 980d8a6..17f5b85 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -19,6 +19,7 @@
#include <asm/synch.h>
#include <asm/cputhreads.h>
#include <asm/ppc-opcode.h>
+#include <asm/pnv-pci.h>
#include "book3s_xics.h"
@@ -712,6 +713,49 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
return check_too_hard(xics, icp);
}
+unsigned long eoi_rc;
+
+static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+{
+ unsigned long xics_phys;
+ int64_t rc;
+
+ rc = pnv_opal_pci_msi_eoi(c, hwirq);
+
+ if (rc)
+ eoi_rc = rc;
+
+ iosync();
+
+ /* EOI it */
+ xics_phys = local_paca->kvm_hstate.xics_phys;
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+}
+
+long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
+ u32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap)
+{
+ struct kvmppc_xics *xics;
+ struct kvmppc_icp *icp;
+ u32 irq;
+
+ irq = irq_map->v_hwirq;
+ xics = vcpu->kvm->arch.xics;
+ icp = vcpu->arch.icp;
+
+ icp_rm_deliver_irq(xics, icp, irq);
+
+ /* EOI the interrupt */
+ icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+
+ if (check_too_hard(xics, icp) == H_TOO_HARD)
+ return 1;
+ else
+ return -2;
+}
+
/* --- Non-real mode XICS-related built-in routines --- */
/**
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dccfa85..12fb2af 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1198,6 +1198,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* -1 A guest wakeup IPI (which has now been cleared)
* In either case, we return to guest to deliver any pending
* guest interrupts.
+ *
+ * -2 A PCI passthrough external interrupt was handled
+ * (interrupt was delivered directly to guest)
+ * Return to guest to deliver any pending guest interrupts.
*/
cmpdi r3, 0
@@ -2352,6 +2356,8 @@ machine_check_realmode:
* 0 if nothing needs to be done
* 1 if something happened that needs to be handled by the host
* -1 if there was a guest wakeup (IPI or msgsnd)
+ * -2 if we handled a PCI passthrough interrupt (returned by
+ * kvmppc_read_intr only)
*
* Also sets r12 to the interrupt vector for any interrupt that needs
* to be handled now by the host (0x500 for external interrupt), or zero.
--
2.8.1
* [PATCH v2 07/13] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
2016-08-19 5:35 ` [PATCH 07/13] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest Paul Mackerras
@ 2016-09-12 0:53 ` Paul Mackerras
0 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-09-12 0:53 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.
In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.
The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.
The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock. We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.
[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
pulled out powernv eoi change into a separate patch, made
kvmppc_read_intr get the vcpu from the paca rather than being
passed in, rewrote the logic at the end of kvmppc_read_intr to
avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
since we were always restoring SRR0/1 anyway, get rid of the cached
array (just use the mapped array), removed the kick_all_cpus_sync()
call, clear saved_xirr PACA field when we handle the interrupt in
real mode, fix compilation with CONFIG_KVM_XICS=n.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
v2 - fix compilation with CONFIG_KVM_XICS=n
arch/powerpc/include/asm/kvm_ppc.h | 3 ++
arch/powerpc/kvm/book3s_hv.c | 8 +++-
arch/powerpc/kvm/book3s_hv_builtin.c | 73 ++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/book3s_hv_rm_xics.c | 44 ++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 +++
5 files changed, 132 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4ca2ba3..4299a1f 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -478,6 +478,9 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
extern void kvmppc_xics_ipi_action(void);
+extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap);
extern int h_ipi_redirect;
#else
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7fe178..a393c2b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3490,9 +3490,15 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
irq_map = &pimap->mapped[i];
irq_map->v_hwirq = guest_gsi;
- irq_map->r_hwirq = desc->irq_data.hwirq;
irq_map->desc = desc;
+ /*
+ * Order the above two stores before the next to serialize with
+ * the KVM real mode handler.
+ */
+ smp_wmb();
+ irq_map->r_hwirq = desc->irq_data.hwirq;
+
if (i == pimap->n_mapped)
pimap->n_mapped++;
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index b476a6a..7531e29 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -288,12 +288,83 @@ void kvmhv_commence_exit(int trap)
struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv);
+#ifdef CONFIG_KVM_XICS
+static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
+ u32 xisr)
+{
+ int i;
+
+ /*
+ * We access the mapped array here without a lock. That
+ * is safe because we never reduce the number of entries
+ * in the array and we never change the v_hwirq field of
+ * an entry once it is set.
+ *
+ * We have also carefully ordered the stores in the writer
+ * and the loads here in the reader, so that if we find a matching
+ * hwirq here, the associated GSI and irq_desc fields are valid.
+ */
+ for (i = 0; i < pimap->n_mapped; i++) {
+ if (xisr == pimap->mapped[i].r_hwirq) {
+ /*
+ * Order subsequent reads in the caller to serialize
+ * with the writer.
+ */
+ smp_rmb();
+ return &pimap->mapped[i];
+ }
+ }
+ return NULL;
+}
+
+/*
+ * If we have an interrupt that's not an IPI, check if we have a
+ * passthrough adapter and if so, check if this external interrupt
+ * is for the adapter.
+ * We will attempt to deliver the IRQ directly to the target VCPU's
+ * ICP, the virtual ICP (based on affinity - the xive value in ICS).
+ *
+ * If the delivery fails or if this is not for a passthrough adapter,
+ * return to the host to handle this interrupt. We earlier
+ * saved a copy of the XIRR in the PACA, it will be picked up by
+ * the host ICP driver.
+ */
+static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+{
+ struct kvmppc_passthru_irqmap *pimap;
+ struct kvmppc_irq_map *irq_map;
+ struct kvm_vcpu *vcpu;
+
+ vcpu = local_paca->kvm_hstate.kvm_vcpu;
+ if (!vcpu)
+ return 1;
+ pimap = kvmppc_get_passthru_irqmap(vcpu->kvm);
+ if (!pimap)
+ return 1;
+ irq_map = get_irqmap(pimap, xisr);
+ if (!irq_map)
+ return 1;
+
+ /* We're handling this interrupt, generic code doesn't need to */
+ local_paca->kvm_hstate.saved_xirr = 0;
+
+ return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
+}
+
+#else
+static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
+{
+ return 1;
+}
+#endif
+
/*
* Determine what sort of external interrupt is pending (if any).
* Returns:
* 0 if no interrupt is pending
* 1 if an interrupt is pending that needs to be handled by the host
* -1 if there was a guest wakeup IPI (which has now been cleared)
+ * -2 if there is PCI passthrough external interrupt that was handled
*/
long kvmppc_read_intr(void)
@@ -368,5 +439,5 @@ long kvmppc_read_intr(void)
return -1;
}
- return 1;
+ return kvmppc_check_passthru(xisr, xirr);
}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 980d8a6..17f5b85 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -19,6 +19,7 @@
#include <asm/synch.h>
#include <asm/cputhreads.h>
#include <asm/ppc-opcode.h>
+#include <asm/pnv-pci.h>
#include "book3s_xics.h"
@@ -712,6 +713,49 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
return check_too_hard(xics, icp);
}
+unsigned long eoi_rc;
+
+static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
+{
+ unsigned long xics_phys;
+ int64_t rc;
+
+ rc = pnv_opal_pci_msi_eoi(c, hwirq);
+
+ if (rc)
+ eoi_rc = rc;
+
+ iosync();
+
+ /* EOI it */
+ xics_phys = local_paca->kvm_hstate.xics_phys;
+ _stwcix(xics_phys + XICS_XIRR, xirr);
+}
+
+long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
+ u32 xirr,
+ struct kvmppc_irq_map *irq_map,
+ struct kvmppc_passthru_irqmap *pimap)
+{
+ struct kvmppc_xics *xics;
+ struct kvmppc_icp *icp;
+ u32 irq;
+
+ irq = irq_map->v_hwirq;
+ xics = vcpu->kvm->arch.xics;
+ icp = vcpu->arch.icp;
+
+ icp_rm_deliver_irq(xics, icp, irq);
+
+ /* EOI the interrupt */
+ icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
+
+ if (check_too_hard(xics, icp) == H_TOO_HARD)
+ return 1;
+ else
+ return -2;
+}
+
/* --- Non-real mode XICS-related built-in routines --- */
/**
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dccfa85..12fb2af 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1198,6 +1198,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* -1 A guest wakeup IPI (which has now been cleared)
* In either case, we return to guest to deliver any pending
* guest interrupts.
+ *
+ * -2 A PCI passthrough external interrupt was handled
+ * (interrupt was delivered directly to guest)
+ * Return to guest to deliver any pending guest interrupts.
*/
cmpdi r3, 0
@@ -2352,6 +2356,8 @@ machine_check_realmode:
* 0 if nothing needs to be done
* 1 if something happened that needs to be handled by the host
* -1 if there was a guest wakeup (IPI or msgsnd)
+ * -2 if we handled a PCI passthrough interrupt (returned by
+ * kvmppc_read_intr only)
*
* Also sets r12 to the interrupt vector for any interrupt that needs
* to be handled now by the host (0x500 for external interrupt), or zero.
--
2.7.4
* [PATCH 08/13] KVM: PPC: Book3S HV: Complete passthrough interrupt in host
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (6 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 07/13] KVM: PPC: Book3S HV: Handle passthrough interrupts in guest Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 09/13] KVM: PPC: Book3S HV: Dump irqmap in debugfs Paul Mackerras
` (5 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, for instance when a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs to switch back to the host, where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning to the host to do it ourselves.
Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.
This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into kvmppc_xics_rm_complete(), which is made an
exported function for this reason.
[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_asm.h | 10 ++++++++++
arch/powerpc/include/asm/kvm_ppc.h | 3 +++
arch/powerpc/kvm/book3s_hv.c | 8 +++++++-
arch/powerpc/kvm/book3s_hv_builtin.c | 1 +
arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 +-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++++++++++++++
arch/powerpc/kvm/book3s_xics.c | 3 ++-
7 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 5bca220..05cabed 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -105,6 +105,15 @@
#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
#define BOOK3S_INTERRUPT_H_FAC_UNAVAIL 0xf80
+/* book3s_hv */
+
+/*
+ * Special trap used to indicate to host that this is a
+ * passthrough interrupt that could not be handled
+ * completely in the guest.
+ */
+#define BOOK3S_INTERRUPT_HV_RM_HARD 0x5555
+
#define BOOK3S_IRQPRIO_SYSTEM_RESET 0
#define BOOK3S_IRQPRIO_DATA_SEGMENT 1
#define BOOK3S_IRQPRIO_INST_SEGMENT 2
@@ -136,6 +145,7 @@
#define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */
#define RESUME_FLAG_HOST (1<<1) /* Resume host? */
#define RESUME_FLAG_ARCH1 (1<<2)
+#define RESUME_FLAG_ARCH2 (1<<3)
#define RESUME_GUEST 0
#define RESUME_GUEST_NV RESUME_FLAG_NV
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4299a1f..e0ada31 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -469,6 +469,7 @@ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
extern void kvmppc_alloc_host_rm_ops(void);
extern void kvmppc_free_host_rm_ops(void);
extern void kvmppc_free_pimap(struct kvm *kvm);
+extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
@@ -489,6 +490,8 @@ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
static inline void kvmppc_alloc_host_rm_ops(void) {};
static inline void kvmppc_free_host_rm_ops(void) {};
static inline void kvmppc_free_pimap(struct kvm *kvm) {};
+static inline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
+ { return 0; }
static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 175bdab..cfddafa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -73,6 +73,8 @@
/* Used to indicate that a guest page fault needs to be handled */
#define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_FLAG_ARCH1)
+/* Used to indicate that a guest passthrough interrupt needs to be handled */
+#define RESUME_PASSTHROUGH (RESUME_GUEST | RESUME_FLAG_ARCH2)
/* Used as a "null" value for timebase values */
#define TB_NIL (~(u64)0)
@@ -994,6 +996,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
r = RESUME_GUEST;
break;
+ case BOOK3S_INTERRUPT_HV_RM_HARD:
+ r = RESUME_PASSTHROUGH;
+ break;
default:
kvmppc_dump_regs(vcpu);
printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
@@ -2821,7 +2826,8 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
r = kvmppc_book3s_hv_page_fault(run, vcpu,
vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
- }
+ } else if (r == RESUME_PASSTHROUGH)
+ r = kvmppc_xics_rm_complete(vcpu, 0);
} while (is_kvmppc_resume_guest(r));
out:
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index fdb8aef..a262957 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -321,6 +321,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
* Returns:
* 0 if no interrupt is pending
* 1 if an interrupt is pending that needs to be handled by the host
+ * 2 Passthrough that needs completion in the host
* -1 if there was a guest wakeup IPI (which has now been cleared)
* -2 if there is PCI passthrough external interrupt that was handled
*/
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 17f5b85..3b8d7ac 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -751,7 +751,7 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
if (check_too_hard(xics, icp) == H_TOO_HARD)
- return 1;
+ return 2;
else
return -2;
}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 12fb2af..7cc924b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1189,6 +1189,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* 1 An interrupt is pending that needs to be handled by the host
* Exit guest and return to host by branching to guest_exit_cont
*
+ * 2 Passthrough that needs completion in the host
+ * Exit guest and return to host by branching to guest_exit_cont
+ * However, we also set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
+ * to indicate to the host to complete handling the interrupt
+ *
* Before returning to guest, we check if any CPU is heading out
* to the host and if so, we head out also. If no CPUs are heading
* check return values <= 0.
@@ -1204,6 +1209,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
* Return to guest to deliver any pending guest interrupts.
*/
+ cmpdi r3, 1
+ ble 1f
+
+ /* Return code = 2 */
+ li r12, BOOK3S_INTERRUPT_HV_RM_HARD
+ stw r12, VCPU_TRAP(r9)
+ b guest_exit_cont
+
+1: /* Return code <= 1 */
cmpdi r3, 0
bgt guest_exit_cont
@@ -2419,6 +2433,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
bl kvmppc_read_intr
nop
li r12, BOOK3S_INTERRUPT_EXTERNAL
+ cmpdi r3, 1
+ ble 1f
+
+ /*
+ * Return code of 2 means PCI passthrough interrupt, but
+ * we need to return back to host to complete handling the
+ * interrupt. Trap reason is expected in r12 by guest
+ * exit code.
+ */
+ li r12, BOOK3S_INTERRUPT_HV_RM_HARD
+1:
ld r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1)
addi r1, r1, PPC_MIN_STKFRM
mtlr r0
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 05aa113..d528d22 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -812,7 +812,7 @@ static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
return H_SUCCESS;
}
-static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
+int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
{
struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
struct kvmppc_icp *icp = vcpu->arch.icp;
@@ -841,6 +841,7 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
return H_SUCCESS;
}
+EXPORT_SYMBOL_GPL(kvmppc_xics_rm_complete);
int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
{
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 09/13] KVM: PPC: Book3S HV: Dump irqmap in debugfs
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (7 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 08/13] KVM: PPC: Book3S HV: Complete passthrough interrupt in host Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 10/13] KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass Paul Mackerras
` (4 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
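For reference, going by the seq_printf format strings in the patch, the
new section of the output should look roughly like this (the interrupt
numbers here are invented):
========
PIRQ mappings: 2 maps
===========
r_hwirq=11c, v_hwirq=1001
r_hwirq=11d, v_hwirq=1002
It is emitted ahead of the existing ICP state dump.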
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_xics.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index d528d22..b41f1d3 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -893,6 +893,21 @@ EXPORT_SYMBOL_GPL(kvmppc_xics_hcall);
/* -- Initialisation code etc. -- */
+static void xics_debugfs_irqmap(struct seq_file *m,
+ struct kvmppc_passthru_irqmap *pimap)
+{
+ int i;
+
+ if (!pimap)
+ return;
+ seq_printf(m, "========\nPIRQ mappings: %d maps\n===========\n",
+ pimap->n_mapped);
+ for (i = 0; i < pimap->n_mapped; i++) {
+ seq_printf(m, "r_hwirq=%x, v_hwirq=%x\n",
+ pimap->mapped[i].r_hwirq, pimap->mapped[i].v_hwirq);
+ }
+}
+
static int xics_debug_show(struct seq_file *m, void *private)
{
struct kvmppc_xics *xics = m->private;
@@ -914,6 +929,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
t_check_resend = 0;
t_reject = 0;
+ xics_debugfs_irqmap(m, kvm->arch.pimap);
+
seq_printf(m, "=========\nICP state\n=========\n");
kvm_for_each_vcpu(i, vcpu, kvm) {
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 10/13] KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (8 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 09/13] KVM: PPC: Book3S HV: Dump irqmap in debugfs Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 11/13] KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode Paul Mackerras
` (3 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Add a module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1, i.e. the feature is enabled.
Since the tunable is used by built-in kernel code, the
module_param_cb macro is used to register it.
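Assuming the usual sysfs layout for writable module parameters (this is
not spelled out in the patch), the optimization could then be disabled
either at load time, e.g. modprobe kvm_hv kvm_irq_bypass=0, or at run
time by writing 0 to /sys/module/kvm_hv/parameters/kvm_irq_bypass.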
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 1 +
arch/powerpc/include/asm/kvm_ppc.h | 2 +-
arch/powerpc/kvm/book3s_hv.c | 10 ++++++++++
arch/powerpc/kvm/book3s_hv_rm_xics.c | 2 ++
4 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 8f39796..8e5fac6 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -191,6 +191,7 @@ extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
struct kvm_vcpu *vcpu);
extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
struct kvmppc_book3s_shadow_vcpu *svcpu);
+extern int kvm_irq_bypass;
static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
{
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e0ada31..97b9bad 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -461,7 +461,7 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
struct kvm *kvm)
{
- if (kvm)
+ if (kvm && kvm_irq_bypass)
return kvm->arch.pimap;
return NULL;
}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cfddafa..2e71518 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -94,6 +94,10 @@ static struct kernel_param_ops module_param_ops = {
.get = param_get_int,
};
+module_param_cb(kvm_irq_bypass, &module_param_ops, &kvm_irq_bypass,
+ S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(kvm_irq_bypass, "Bypass passthrough interrupt optimization");
+
module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
@@ -3313,6 +3317,9 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
struct irq_chip *chip;
int i;
+ if (!kvm_irq_bypass)
+ return 1;
+
desc = irq_to_desc(host_irq);
if (!desc)
return -EIO;
@@ -3389,6 +3396,9 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
struct kvmppc_passthru_irqmap *pimap;
int i;
+ if (!kvm_irq_bypass)
+ return 0;
+
desc = irq_to_desc(host_irq);
if (!desc)
return -EIO;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 3b8d7ac..00b9dfde 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -27,6 +27,8 @@
int h_ipi_redirect = 1;
EXPORT_SYMBOL(h_ipi_redirect);
+int kvm_irq_bypass = 1;
+EXPORT_SYMBOL(kvm_irq_bypass);
static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
u32 new_irq);
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 11/13] KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (9 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 10/13] KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 12/13] KVM: PPC: Book3S HV: Set server for passed-through interrupts Paul Mackerras
` (2 subsequent siblings)
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
When a passthrough IRQ is handled completely within KVM real
mode code, it also has to update the IRQ stats, since it does
not go through the generic IRQ handling code.
However, the per-CPU kstat_irqs field is dynamically allocated
(not static) and so cannot safely be accessed directly in real mode.
The function this_cpu_inc_rm() is introduced to safely increment
dynamically allocated per-CPU fields (currently only unsigned
integers), which may therefore be vmalloc'ed.
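A practical consequence, inferred from the use of desc->kstat_irqs and
kstat.irqs_sum rather than stated explicitly here: the per-CPU counts
reported in /proc/interrupts for a passed-through source should keep
advancing even when the interrupt is consumed entirely in real mode.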
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/kvm/book3s_hv_rm_xics.c | 50 ++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 00b9dfde..554cdfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -10,6 +10,7 @@
#include <linux/kernel.h>
#include <linux/kvm_host.h>
#include <linux/err.h>
+#include <linux/kernel_stat.h>
#include <asm/kvm_book3s.h>
#include <asm/kvm_ppc.h>
@@ -18,6 +19,7 @@
#include <asm/debug.h>
#include <asm/synch.h>
#include <asm/cputhreads.h>
+#include <asm/pgtable.h>
#include <asm/ppc-opcode.h>
#include <asm/pnv-pci.h>
@@ -734,6 +736,53 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
_stwcix(xics_phys + XICS_XIRR, xirr);
}
+/*
+ * Increment a per-CPU 32-bit unsigned integer variable.
+ * Safe to call in real-mode. Handles vmalloc'ed addresses
+ *
+ * ToDo: Make this work for any integral type
+ */
+
+static inline void this_cpu_inc_rm(unsigned int __percpu *addr)
+{
+ unsigned long l;
+ unsigned int *raddr;
+ int cpu = smp_processor_id();
+
+ raddr = per_cpu_ptr(addr, cpu);
+ l = (unsigned long)raddr;
+
+ if (REGION_ID(l) == VMALLOC_REGION_ID) {
+ l = vmalloc_to_phys(raddr);
+ raddr = (unsigned int *)l;
+ }
+ ++*raddr;
+}
+
+/*
+ * We don't try to update the flags in the irq_desc 'istate' field in
+ * here as would happen in the normal IRQ handling path for several reasons:
+ * - state flags represent internal IRQ state and are not expected to be
+ * updated outside the IRQ subsystem
+ * - more importantly, these are useful for edge triggered interrupts,
+ * IRQ probing, etc., but we are only handling MSI/MSIx interrupts here
+ * and these states shouldn't apply to us.
+ *
+ * However, we do update irq_stats - we somewhat duplicate the code in
+ * kstat_incr_irqs_this_cpu() for this since this function is defined
+ * in irq/internal.h which we don't want to include here.
+ * The only difference is that desc->kstat_irqs is an allocated per CPU
+ * variable and could have been vmalloc'ed, so we can't directly
+ * call __this_cpu_inc() on it. The kstat structure is a static
+ * per CPU variable and it should be accessible by real-mode KVM.
+ *
+ */
+static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
+{
+ this_cpu_inc_rm(desc->kstat_irqs);
+ __this_cpu_inc(kstat.irqs_sum);
+}
+
long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
u32 xirr,
struct kvmppc_irq_map *irq_map,
@@ -747,6 +796,7 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
xics = vcpu->kvm->arch.xics;
icp = vcpu->arch.icp;
+ kvmppc_rm_handle_irq_desc(irq_map->desc);
icp_rm_deliver_irq(xics, icp, irq);
/* EOI the interrupt */
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 12/13] KVM: PPC: Book3S HV: Set server for passed-through interrupts
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (10 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 11/13] KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-08-19 5:35 ` [PATCH 13/13] KVM: PPC: Book3S HV: Counters for passthrough IRQ stats Paul Mackerras
2016-09-12 0:55 ` [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU. In fact the
physical interrupt might arrive on any host CPU, where it is then
queued for the target VCPU in the emulated XICS (guest interrupt
controller) and eventually delivered to that VCPU.
Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same (sub)core as the one the
target VCPU is running on. In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).
This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running. We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.
We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask. This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.
This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
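In terms of the code below: ics_deliver_irq() records in intr_cpu the
host CPU on which a passed-through interrupt arrived, and the real-mode
H_EOI handler compares that with the first thread of the core where the
H_EOI is being handled; if they differ, it uses opal_rm_set_xive() to
move the interrupt's server to that first thread.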
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 +++
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 4 +++
arch/powerpc/kvm/book3s_hv_rm_xics.c | 16 ++++++++++++
arch/powerpc/kvm/book3s_xics.c | 35 ++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_xics.h | 2 ++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
7 files changed, 63 insertions(+)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 97b9bad..f6e4964 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -479,6 +479,10 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
extern void kvmppc_xics_ipi_action(void);
+extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
+ unsigned long host_irq);
+extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
+ unsigned long host_irq);
extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
struct kvmppc_irq_map *irq_map,
struct kvmppc_passthru_irqmap *pimap);
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ee05bd2..e958b70 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -67,6 +67,7 @@ int64_t opal_pci_config_write_half_word(uint64_t phb_id, uint64_t bus_dev_func,
int64_t opal_pci_config_write_word(uint64_t phb_id, uint64_t bus_dev_func,
uint64_t offset, uint32_t data);
int64_t opal_set_xive(uint32_t isn, uint16_t server, uint8_t priority);
+int64_t opal_rm_set_xive(uint32_t isn, uint16_t server, uint8_t priority);
int64_t opal_get_xive(uint32_t isn, __be16 *server, uint8_t *priority);
int64_t opal_register_exception_handler(uint64_t opal_exception,
uint64_t handler_address,
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2e71518..b969abc 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3385,6 +3385,8 @@ static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
if (i == pimap->n_mapped)
pimap->n_mapped++;
+ kvmppc_xics_set_mapped(kvm, guest_gsi, desc->irq_data.hwirq);
+
mutex_unlock(&kvm->lock);
return 0;
@@ -3421,6 +3423,8 @@ static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi)
return -ENODEV;
}
+ kvmppc_xics_clr_mapped(kvm, guest_gsi, pimap->mapped[i].r_hwirq);
+
/* invalidate the entry */
pimap->mapped[i].r_hwirq = 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 554cdfa..5f7527e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -22,6 +22,7 @@
#include <asm/pgtable.h>
#include <asm/ppc-opcode.h>
#include <asm/pnv-pci.h>
+#include <asm/opal.h>
#include "book3s_xics.h"
@@ -34,6 +35,7 @@ EXPORT_SYMBOL(kvm_irq_bypass);
static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
u32 new_irq);
+static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu);
/* -- ICS routines -- */
static void ics_rm_check_resend(struct kvmppc_xics *xics,
@@ -713,6 +715,13 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
icp->rm_action |= XICS_RM_NOTIFY_EOI;
icp->rm_eoied_irq = irq;
}
+
+ if (state->host_irq && state->intr_cpu != -1) {
+ int pcpu = cpu_first_thread_sibling(raw_smp_processor_id());
+ if (state->intr_cpu != pcpu)
+ xics_opal_rm_set_server(state->host_irq, pcpu);
+ state->intr_cpu = -1;
+ }
bail:
return check_too_hard(xics, icp);
}
@@ -736,6 +745,13 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
_stwcix(xics_phys + XICS_XIRR, xirr);
}
+static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
+{
+ unsigned int mangle_cpu = get_hard_smp_processor_id(server_cpu) << 2;
+
+ return opal_rm_set_xive(hw_irq, mangle_cpu, DEFAULT_PRIORITY);
+}
+
/*
* Increment a per-CPU 32-bit unsigned integer variable.
* Safe to call in real-mode. Handles vmalloc'ed addresses
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index b41f1d3..222eaf9 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -99,6 +99,10 @@ static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
return 0;
}
+ /* Record which CPU this arrived on for passed-through interrupts */
+ if (state->host_irq)
+ state->intr_cpu = raw_smp_processor_id();
+
/* Attempt delivery */
icp_deliver_irq(xics, NULL, irq);
@@ -1436,3 +1440,34 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
{
return pin;
}
+
+void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long irq,
+ unsigned long host_irq)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ u16 idx;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &idx);
+ if (!ics)
+ return;
+
+ ics->irq_state[idx].host_irq = host_irq;
+ ics->irq_state[idx].intr_cpu = -1;
+}
+EXPORT_SYMBOL_GPL(kvmppc_xics_set_mapped);
+
+void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq,
+ unsigned long host_irq)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ u16 idx;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &idx);
+ if (!ics)
+ return;
+
+ ics->irq_state[idx].host_irq = 0;
+}
+EXPORT_SYMBOL_GPL(kvmppc_xics_clr_mapped);
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index a46b954..2a50320 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -42,6 +42,8 @@ struct ics_irq_state {
u8 lsi; /* level-sensitive interrupt */
u8 asserted; /* Only for LSI */
u8 exists;
+ int intr_cpu;
+ u32 host_irq;
};
/* Atomic ICP state, updated with a single compare & swap */
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 3d29d40..44d2d84 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -208,6 +208,7 @@ OPAL_CALL(opal_pci_config_write_byte, OPAL_PCI_CONFIG_WRITE_BYTE);
OPAL_CALL(opal_pci_config_write_half_word, OPAL_PCI_CONFIG_WRITE_HALF_WORD);
OPAL_CALL(opal_pci_config_write_word, OPAL_PCI_CONFIG_WRITE_WORD);
OPAL_CALL(opal_set_xive, OPAL_SET_XIVE);
+OPAL_CALL_REAL(opal_rm_set_xive, OPAL_SET_XIVE);
OPAL_CALL(opal_get_xive, OPAL_GET_XIVE);
OPAL_CALL(opal_register_exception_handler, OPAL_REGISTER_OPAL_EXCEPTION_HANDLER);
OPAL_CALL(opal_pci_eeh_freeze_status, OPAL_PCI_EEH_FREEZE_STATUS);
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 13/13] KVM: PPC: Book3S HV: Counters for passthrough IRQ stats
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (11 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 12/13] KVM: PPC: Book3S HV: Set server for passed-through interrupts Paul Mackerras
@ 2016-08-19 5:35 ` Paul Mackerras
2016-09-12 0:55 ` [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-08-19 5:35 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
From: Suresh Warrier <warrier@linux.vnet.ibm.com>
Add VCPU stat counters to track affinity for passthrough
interrupts.
pthru_all: Counts all passthrough interrupts whose IRQ mappings are
in the kvmppc_passthru_irqmap structure.
pthru_host: Counts all cached passthrough interrupts that were injected
from the host through kvm_set_irq (i.e. not handled in
real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
bad affinity (the receiving CPU is not running the VCPU that is
the target of the virtual interrupt in the guest).
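Since these are added as VCPU_STAT entries in debugfs_entries[], they
should show up alongside the existing Book3S VCPU statistics under the
kvm debugfs directory (typically /sys/kernel/debug/kvm/); the exact
path is an assumption rather than something stated in this patch.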
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
arch/powerpc/include/asm/kvm_host.h | 3 +++
arch/powerpc/kvm/book3s.c | 3 +++
arch/powerpc/kvm/book3s_hv_rm_xics.c | 18 +++++++++++++-----
3 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3eb5092..f371a23 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -131,6 +131,9 @@ struct kvm_vcpu_stat {
u32 ld_slow;
u32 st_slow;
#endif
+ u32 pthru_all;
+ u32 pthru_host;
+ u32 pthru_bad_aff;
};
enum kvm_exit_types {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 47018fc..6d0c45b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -64,6 +64,9 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "ld_slow", VCPU_STAT(ld_slow) },
{ "st", VCPU_STAT(st) },
{ "st_slow", VCPU_STAT(st_slow) },
+ { "pthru_all", VCPU_STAT(pthru_all) },
+ { "pthru_host", VCPU_STAT(pthru_host) },
+ { "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) },
{ NULL }
};
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 5f7527e..82ff5de 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -716,11 +716,19 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
icp->rm_eoied_irq = irq;
}
- if (state->host_irq && state->intr_cpu != -1) {
- int pcpu = cpu_first_thread_sibling(raw_smp_processor_id());
- if (state->intr_cpu != pcpu)
- xics_opal_rm_set_server(state->host_irq, pcpu);
- state->intr_cpu = -1;
+ if (state->host_irq) {
+ ++vcpu->stat.pthru_all;
+ if (state->intr_cpu != -1) {
+ int pcpu = raw_smp_processor_id();
+
+ pcpu = cpu_first_thread_sibling(pcpu);
+ ++vcpu->stat.pthru_host;
+ if (state->intr_cpu != pcpu) {
+ ++vcpu->stat.pthru_bad_aff;
+ xics_opal_rm_set_server(state->host_irq, pcpu);
+ }
+ state->intr_cpu = -1;
+ }
}
bail:
return check_too_hard(xics, icp);
--
2.8.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM
2016-08-19 5:35 [PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM Paul Mackerras
` (12 preceding siblings ...)
2016-08-19 5:35 ` [PATCH 13/13] KVM: PPC: Book3S HV: Counters for passthrough IRQ stats Paul Mackerras
@ 2016-09-12 0:55 ` Paul Mackerras
13 siblings, 0 replies; 16+ messages in thread
From: Paul Mackerras @ 2016-09-12 0:55 UTC (permalink / raw)
To: kvm, linuxppc-dev; +Cc: kvm-ppc
On Fri, Aug 19, 2016 at 03:35:44PM +1000, Paul Mackerras wrote:
> This patch set reduces the latency for presenting interrupts from PCI
> pass-through devices to a Book3S HV guest. Currently, if an interrupt
> arrives from a PCI pass-through device while a guest is running, it
> causes an exit of all threads on the core to the host, where the
> interrupt is handled by making an interrupt pending in the virtual
> XICS interrupt controller for the guest that owns the device.
> Furthermore, there is currently no attempt to direct PCI pass-through
> device interrupts to the physical core where the VCPU that they are
> directed to is running, so they often land on a different core and
> require an IPI to interrupt the VCPU.
>
> With this patch set, if the interrupt arrives on a core where the
> correct guest is running, it can be handled in hypervisor real mode
> without needing an exit to host context. If the destination VCPU is
> on the same core, then we can interrupt it using at most a msgsnd
> (message send) instruction, which is considerably faster than an IPI.
>
> Further, if an interrupt arrives on a different core, we then change
> the destination for the interrupt in the physical interrupt controller
> to point to the core where the VCPU is running. For now, we always
> direct the interrupt to thread 0 of the core because the other threads
> are offline from the point of view of the host, and the offline loop
> (which is where those other threads run when thread 0 is in host
> context) doesn't handle device interrupts.
>
> This patch set is based on a patch set from Suresh Warrier, with
> considerable revision by me. The data structure for mapping host
> interrupt numbers to guest interrupt numbers is just a flat array that
> is searched linearly, which works and is simple but could perform
> poorly with large numbers of interrupt sources. It would be simple to
> replace this mapping array with a more sophisticated data structure in
> future.
>
> To test the performance of this patch set, I used a network one-byte
> ping-pong test between a guest with a Mellanox CX-3 passed through to
> it, connected over 10Gb ethernet to another POWER8 system running
> bare-metal with a Chelsio 10Gb ethernet adapter. (The guest was
> running Ubuntu 16.04.1 under QEMU v2.7-rc2 on a POWER8.) Without this
> patchset, the round-trip latency was 43us, and with it the latency was
> 41us, a saving of 2us per round-trip.
Series applied to my kvm-ppc-next branch.
Paul.
^ permalink raw reply [flat|nested] 16+ messages in thread