* [PATCH 1/8] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
@ 2013-04-11 5:53 ` Paul Mackerras
2013-04-11 5:54 ` [PATCH 2/8] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:53 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
From: Michael Ellerman <michael@ellerman.id.au>
For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself. This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names. Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Documentation/virtual/kvm/api.txt | 19 ++++
arch/powerpc/include/asm/hvcall.h | 3 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/kvm_ppc.h | 4 +
arch/powerpc/include/uapi/asm/kvm.h | 6 ++
arch/powerpc/kvm/Makefile | 1 +
arch/powerpc/kvm/book3s_hv.c | 18 +++-
arch/powerpc/kvm/book3s_pr.c | 1 +
arch/powerpc/kvm/book3s_pr_papr.c | 7 ++
arch/powerpc/kvm/book3s_rtas.c | 182 +++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/powerpc.c | 8 ++
include/uapi/linux/kvm.h | 3 +
12 files changed, 252 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/kvm/book3s_rtas.c
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4c326ae..4247d65 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2325,6 +2325,25 @@ and distributor interface, the ioctl must be called after calling
KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs. Calling
this ioctl twice for any of the base addresses will return -EEXIST.
+4.82 KVM_PPC_RTAS_DEFINE_TOKEN
+
+Capability: KVM_CAP_PPC_RTAS
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_rtas_token_args
+Returns: 0 on success, -1 on error
+
+Defines a token value for a RTAS (Run Time Abstraction Services)
+service in order to allow it to be handled in the kernel. The
+argument struct gives the name of the service, which must be the name
+of a service that has a kernel-side implementation. If the token
+value is non-zero, it will be associated with that service, and
+subsequent RTAS calls by the guest specifying that token will be
+handled by the kernel. If the token value is 0, then any token
+associated with the service will be forgotten, and subsequent RTAS
+calls by the guest for that service will be passed to userspace to be
+handled.
+
5. The kvm_run structure
------------------------
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 4bc2c3d..cf4df8e 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -270,6 +270,9 @@
#define H_SET_MODE 0x31C
#define MAX_HCALL_OPCODE H_SET_MODE
+/* Platform specific hcalls, used by KVM */
+#define H_RTAS 0xf000
+
#ifndef __ASSEMBLY__
/**
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2a2e235..8fe8ef5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ struct kvm_arch {
#endif /* CONFIG_KVM_BOOK3S_64_HV */
#ifdef CONFIG_PPC_BOOK3S_64
struct list_head spapr_tce_tables;
+ struct list_head rtas_tokens;
#endif
};
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f54707f..f4e66c4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -166,6 +166,10 @@ extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
+extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
+extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
+
/*
* Cuts out inst bits with ordering according to spec.
* That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index ef072b1..a599ea5 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -299,6 +299,12 @@ struct kvm_allocate_rma {
__u64 rma_size;
};
+/* for KVM_CAP_PPC_RTAS */
+struct kvm_rtas_token_args {
+ char name[120];
+ __u64 token; /* Use a token of 0 to undefine a mapping */
+};
+
struct kvm_book3e_206_tlb_entry {
__u32 mas8;
__u32 mas1;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 4a2277a..d2c8a88 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -86,6 +86,7 @@ kvm-book3s_64-module-objs := \
emulate.o \
book3s.o \
book3s_64_vio.o \
+ book3s_rtas.o \
$(kvm-book3s_64-objs-y)
kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1e521ba..c445a3d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -479,7 +479,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
unsigned long req = kvmppc_get_gpr(vcpu, 3);
unsigned long target, ret = H_SUCCESS;
struct kvm_vcpu *tvcpu;
- int idx;
+ int idx, rc;
switch (req) {
case H_ENTER:
@@ -515,6 +515,19 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
kvmppc_get_gpr(vcpu, 5),
kvmppc_get_gpr(vcpu, 6));
break;
+ case H_RTAS:
+ if (list_empty(&vcpu->kvm->arch.rtas_tokens))
+ return RESUME_HOST;
+
+ rc = kvmppc_rtas_hcall(vcpu);
+
+ if (rc == -ENOENT)
+ return RESUME_HOST;
+ else if (rc == 0)
+ break;
+
+ /* Send the error out to userspace via KVM_RUN */
+ return rc;
default:
return RESUME_HOST;
}
@@ -1821,6 +1834,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
cpumask_setall(&kvm->arch.need_tlb_flush);
INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+ INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
kvm->arch.rma = NULL;
@@ -1866,6 +1880,8 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
kvm->arch.rma = NULL;
}
+ kvmppc_rtas_tokens_free(kvm);
+
kvmppc_free_hpt(kvm);
WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
}
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 286e23e..88790b8 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1298,6 +1298,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
{
#ifdef CONFIG_PPC64
INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+ INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
#endif
if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index ee02b30..4efa4a4 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -246,6 +246,13 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
vcpu->stat.halt_wakeup++;
return EMULATE_DONE;
+ case H_RTAS:
+ if (list_empty(&vcpu->kvm->arch.rtas_tokens))
+ return RESUME_HOST;
+ if (kvmppc_rtas_hcall(vcpu))
+ break;
+ kvmppc_set_gpr(vcpu, 3, 0);
+ return EMULATE_DONE;
}
return EMULATE_FAIL;
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
new file mode 100644
index 0000000..8a324e8
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/err.h>
+
+#include <asm/uaccess.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/rtas.h>
+
+
+struct rtas_handler {
+ void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
+ char *name;
+};
+
+static struct rtas_handler rtas_handlers[] = { };
+
+struct rtas_token_definition {
+ struct list_head list;
+ struct rtas_handler *handler;
+ u64 token;
+};
+
+static int rtas_name_matches(char *s1, char *s2)
+{
+ struct kvm_rtas_token_args args;
+ return !strncmp(s1, s2, sizeof(args.name));
+}
+
+static int rtas_token_undefine(struct kvm *kvm, char *name)
+{
+ struct rtas_token_definition *d, *tmp;
+
+ lockdep_assert_held(&kvm->lock);
+
+ list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
+ if (rtas_name_matches(d->handler->name, name)) {
+ list_del(&d->list);
+ kfree(d);
+ return 0;
+ }
+ }
+
+ /* It's not an error to undefine an undefined token */
+ return 0;
+}
+
+static int rtas_token_define(struct kvm *kvm, char *name, u64 token)
+{
+ struct rtas_token_definition *d;
+ struct rtas_handler *h;
+ bool found;
+ int i;
+
+ lockdep_assert_held(&kvm->lock);
+
+ list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
+ if (d->token == token)
+ return -EEXIST;
+ }
+
+ found = false;
+ for (i = 0; i < ARRAY_SIZE(rtas_handlers); i++) {
+ h = &rtas_handlers[i];
+ if (rtas_name_matches(h->name, name)) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found)
+ return -ENOENT;
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d)
+ return -ENOMEM;
+
+ d->handler = h;
+ d->token = token;
+
+ list_add_tail(&d->list, &kvm->arch.rtas_tokens);
+
+ return 0;
+}
+
+int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
+{
+ struct kvm_rtas_token_args args;
+ int rc;
+
+ if (copy_from_user(&args, argp, sizeof(args)))
+ return -EFAULT;
+
+ mutex_lock(&kvm->lock);
+
+ if (args.token)
+ rc = rtas_token_define(kvm, args.name, args.token);
+ else
+ rc = rtas_token_undefine(kvm, args.name);
+
+ mutex_unlock(&kvm->lock);
+
+ return rc;
+}
+
+int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
+{
+ struct rtas_token_definition *d;
+ struct rtas_args args;
+ rtas_arg_t *orig_rets;
+ gpa_t args_phys;
+ int rc;
+
+ /* r4 contains the guest physical address of the RTAS args */
+ args_phys = kvmppc_get_gpr(vcpu, 4);
+
+ rc = kvm_read_guest(vcpu->kvm, args_phys, &args, sizeof(args));
+ if (rc)
+ goto fail;
+
+ /*
+ * args->rets is a pointer into args->args. Now that we've
+ * copied args we need to fix it up to point into our copy,
+ * not the guest args. We also need to save the original
+ * value so we can restore it on the way out.
+ */
+ orig_rets = args.rets;
+ args.rets = &args.args[args.nargs];
+
+ mutex_lock(&vcpu->kvm->lock);
+
+ rc = -ENOENT;
+ list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) {
+ if (d->token == args.token) {
+ d->handler->handler(vcpu, &args);
+ rc = 0;
+ break;
+ }
+ }
+
+ mutex_unlock(&vcpu->kvm->lock);
+
+ if (rc == 0) {
+ args.rets = orig_rets;
+ rc = kvm_write_guest(vcpu->kvm, args_phys, &args, sizeof(args));
+ if (rc)
+ goto fail;
+ }
+
+ return rc;
+
+fail:
+ /*
+ * We only get here if the guest has called RTAS with a bogus
+ * args pointer. That means we can't get to the args, and so we
+ * can't fail the RTAS call. So fail right out to userspace,
+ * which should kill the guest.
+ */
+ return rc;
+}
+
+void kvmppc_rtas_tokens_free(struct kvm *kvm)
+{
+ struct rtas_token_definition *d, *tmp;
+
+ lockdep_assert_held(&kvm->lock);
+
+ list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
+ list_del(&d->list);
+ kfree(d);
+ }
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5306ca5..5cde31d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -340,6 +340,7 @@ int kvm_dev_ioctl_check_extension(long ext)
#ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_PPC_ALLOC_HTAB:
+ case KVM_CAP_PPC_RTAS:
r = 1;
break;
#endif /* CONFIG_PPC_BOOK3S_64 */
@@ -980,6 +981,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
#ifdef CONFIG_KVM_BOOK3S_64_HV
case KVM_ALLOCATE_RMA: {
struct kvm_allocate_rma rma;
+ struct kvm *kvm = filp->private_data;
r = kvm_vm_ioctl_allocate_rma(kvm, &rma);
if (r >= 0 && copy_to_user(argp, &rma, sizeof(rma)))
@@ -1024,6 +1026,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = -EFAULT;
break;
}
+ case KVM_PPC_RTAS_DEFINE_TOKEN: {
+ struct kvm *kvm = filp->private_data;
+
+ r = kvm_vm_ioctl_rtas_define_token(kvm, argp);
+ break;
+ }
#endif /* CONFIG_PPC_BOOK3S_64 */
default:
r = -ENOTTY;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 22fce7b..373f4aa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -670,6 +670,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_ARM_SET_DEVICE_ADDR 88
#define KVM_CAP_DEVICE_CTRL 89
#define KVM_CAP_IRQ_MPIC 90
+#define KVM_CAP_PPC_RTAS 91
#ifdef KVM_CAP_IRQ_ROUTING
@@ -909,6 +910,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_PPC_GET_HTAB_FD _IOW(KVMIO, 0xaa, struct kvm_get_htab_fd)
/* Available with KVM_CAP_ARM_SET_DEVICE_ADDR */
#define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
+/* Available with KVM_CAP_PPC_RTAS */
+#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
/*
* Device control API, available with KVM_CAP_DEVICE_CTRL
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 2/8] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
2013-04-11 5:53 ` [PATCH 1/8] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
@ 2013-04-11 5:54 ` Paul Mackerras
2013-04-11 5:54 ` [PATCH 3/8] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:54 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.
The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs. Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.
Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.
This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.
This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 9 +
arch/powerpc/include/asm/kvm_ppc.h | 27 +
arch/powerpc/include/uapi/asm/kvm.h | 1 +
arch/powerpc/kvm/Kconfig | 8 +
arch/powerpc/kvm/Makefile | 3 +
arch/powerpc/kvm/book3s.c | 2 +-
arch/powerpc/kvm/book3s_hv.c | 9 +
arch/powerpc/kvm/book3s_pr_papr.c | 16 +
arch/powerpc/kvm/book3s_rtas.c | 51 +-
arch/powerpc/kvm/book3s_xics.c | 942 +++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_xics.h | 113 ++++
arch/powerpc/kvm/powerpc.c | 5 +
13 files changed, 1185 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/kvm/book3s_xics.c
create mode 100644 arch/powerpc/kvm/book3s_xics.h
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 5a56e1c..17c9a15 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -142,6 +142,7 @@ extern int kvmppc_mmu_hv_init(void);
extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+extern void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
extern void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags);
extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
bool upper, u32 val);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8fe8ef5..c7497dd 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -188,6 +188,10 @@ struct kvmppc_linear_info {
int type;
};
+/* XICS components, defined in boo3s_xics.c */
+struct kvmppc_xics;
+struct kvmppc_icp;
+
/*
* The reverse mapping array has one entry for each HPTE,
* which stores the guest's view of the second word of the HPTE
@@ -256,6 +260,7 @@ struct kvm_arch {
#ifdef CONFIG_PPC_BOOK3S_64
struct list_head spapr_tce_tables;
struct list_head rtas_tokens;
+ struct kvmppc_xics *xics;
#endif
};
@@ -378,6 +383,7 @@ struct kvmppc_booke_debug_reg {
#define KVMPPC_IRQ_DEFAULT 0
#define KVMPPC_IRQ_MPIC 1
+#define KVMPPC_IRQ_XICS 2
struct openpic;
@@ -562,6 +568,9 @@ struct kvm_vcpu_arch {
int irq_type; /* one of KVM_IRQ_* */
struct openpic *mpic; /* KVM_IRQ_MPIC */
+#ifdef CONFIG_KVM_XICS
+ struct kvmppc_icp *icp; /* XICS presentation controller */
+#endif
#ifdef CONFIG_KVM_BOOK3S_64_HV
struct kvm_vcpu_arch_shared shregs;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index f4e66c4..5f5821b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -130,6 +130,8 @@ extern long kvmppc_prepare_vrma(struct kvm *kvm,
extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
struct kvm_memory_slot *memslot, unsigned long porder);
extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+
extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
@@ -169,6 +171,8 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
+extern int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority);
+extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority);
/*
* Cuts out inst bits with ordering according to spec.
@@ -268,6 +272,27 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
static inline void kvm_linear_init(void)
{}
+
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
+}
+extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
+extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
+#else
+static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
+ { return 0; }
+static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
+static inline int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu,
+ unsigned long server)
+ { return -EINVAL; }
+static inline int kvm_vm_ioctl_xics_irq(struct kvm *kvm,
+ struct kvm_irq_level *args)
+ { return -ENOTTY; }
#endif
static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
@@ -282,6 +307,8 @@ static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu);
int kvmppc_mpic_connect_vcpu(struct file *mpic_filp, struct kvm_vcpu *vcpu,
u32 cpu);
+int kvmppc_xics_connect_vcpu(struct file *xics_filp, struct kvm_vcpu *vcpu,
+ u32 cpu);
int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
struct kvm_config_tlb *cfg);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index a599ea5..6beb876 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -428,4 +428,5 @@ struct kvm_get_htab_header {
#define KVM_REG_PPC_CLEAR_TSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x88)
#define KVM_REG_PPC_TCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x89)
#define KVM_REG_PPC_TSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x8a)
+
#endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index a87139b..c036c26 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -155,6 +155,14 @@ config KVM_MPIC
bool "KVM in-kernel MPIC emulation"
depends on KVM
+config KVM_XICS
+ bool "KVM in-kernel XICS emulation"
+ depends on KVM_BOOK3S_64
+ ---help---
+ Include support for the XICS (eXternal Interrupt Controller
+ Specification) interrupt controller architecture used on
+ IBM POWER (pSeries) servers.
+
source drivers/vhost/Kconfig
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index d2c8a88..cccd85f 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -79,6 +79,9 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv_ras.o \
book3s_hv_builtin.o
+kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
+ book3s_xics.o
+
kvm-book3s_64-module-objs := \
../../../virt/kvm/kvm_main.o \
../../../virt/kvm/eventfd.o \
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6548445..c5a4478 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -104,7 +104,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
return prio;
}
-static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
+void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
unsigned int vec)
{
unsigned long old_pending = vcpu->arch.pending_exceptions;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c445a3d..506b5ea 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -528,6 +528,15 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
/* Send the error out to userspace via KVM_RUN */
return rc;
+
+ case H_XIRR:
+ case H_CPPR:
+ case H_EOI:
+ case H_IPI:
+ if (kvmppc_xics_enabled(vcpu)) {
+ ret = kvmppc_xics_hcall(vcpu, req);
+ break;
+ } /* fallthrough */
default:
return RESUME_HOST;
}
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 4efa4a4..9802245 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -227,6 +227,15 @@ static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
}
+static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
+{
+ long rc = kvmppc_xics_hcall(vcpu, cmd);
+ if (rc == H_TOO_HARD)
+ return EMULATE_FAIL;
+ kvmppc_set_gpr(vcpu, 3, rc);
+ return EMULATE_DONE;
+}
+
int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
{
switch (cmd) {
@@ -246,6 +255,13 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
vcpu->stat.halt_wakeup++;
return EMULATE_DONE;
+ case H_XIRR:
+ case H_CPPR:
+ case H_EOI:
+ case H_IPI:
+ if (kvmppc_xics_enabled(vcpu))
+ return kvmppc_h_pr_xics_hcall(vcpu, cmd);
+ break;
case H_RTAS:
if (list_empty(&vcpu->kvm->arch.rtas_tokens))
return RESUME_HOST;
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 8a324e8..6a6c1fe 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -18,12 +18,61 @@
#include <asm/rtas.h>
+static void kvm_rtas_set_xive(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+ u32 irq, server, priority;
+ int rc;
+
+ if (args->nargs != 3 || args->nret != 1) {
+ rc = -3;
+ goto out;
+ }
+
+ irq = args->args[0];
+ server = args->args[1];
+ priority = args->args[2];
+
+ rc = kvmppc_xics_set_xive(vcpu->kvm, irq, server, priority);
+ if (rc)
+ rc = -3;
+out:
+ args->rets[0] = rc;
+}
+
+static void kvm_rtas_get_xive(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+ u32 irq, server, priority;
+ int rc;
+
+ if (args->nargs != 1 || args->nret != 3) {
+ rc = -3;
+ goto out;
+ }
+
+ irq = args->args[0];
+
+ server = priority = 0;
+ rc = kvmppc_xics_get_xive(vcpu->kvm, irq, &server, &priority);
+ if (rc) {
+ rc = -3;
+ goto out;
+ }
+
+ args->rets[1] = server;
+ args->rets[2] = priority;
+out:
+ args->rets[0] = rc;
+}
+
struct rtas_handler {
void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
char *name;
};
-static struct rtas_handler rtas_handlers[] = { };
+static struct rtas_handler rtas_handlers[] = {
+ { .name = "ibm,set-xive", .handler = kvm_rtas_set_xive },
+ { .name = "ibm,get-xive", .handler = kvm_rtas_get_xive },
+};
struct rtas_token_definition {
struct list_head list;
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
new file mode 100644
index 0000000..175a3c2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -0,0 +1,942 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+#include <linux/gfp.h>
+
+#include <asm/uaccess.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/debug.h>
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#include "book3s_xics.h"
+
+#define XICS_DBG(fmt...) do { } while (0)
+//#define XICS_DBG(fmt...) do { trace_printk(fmt); } while (0)
+
+/*
+ * LOCKING
+ * =======
+ *
+ * Each ICS has a mutex protecting the information about the IRQ
+ * sources and avoiding simultaneous deliveries if the same interrupt.
+ *
+ * ICP operations are done via a single compare & swap transaction
+ * (most ICP state fits in the union kvmppc_icp_state)
+ */
+
+/*
+ * TODO
+ * ====
+ *
+ * - To speed up resends, keep a bitmap of "resend" set bits in the
+ * ICS
+ *
+ * - Speed up server# -> ICP lookup (array ? hash table ?)
+ *
+ * - Make ICS lockless as well, or at least a per-interrupt lock or hashed
+ * locks array to improve scalability
+ *
+ * - ioctl's to save/restore the entire state for snapshot & migration
+ */
+
+/* -- ICS routines -- */
+
+static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+ u32 new_irq);
+
+static int ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
+{
+ struct ics_irq_state *state;
+ struct kvmppc_ics *ics;
+ u16 src;
+
+ XICS_DBG("ics deliver %#x (level: %d)\n", irq, level);
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics) {
+ XICS_DBG("ics_deliver_irq: IRQ 0x%06x not found !\n", irq);
+ return -EINVAL;
+ }
+ state = &ics->irq_state[src];
+ if (!state->exists)
+ return -EINVAL;
+
+ /*
+ * We set state->asserted locklessly. This should be fine as
+ * we are the only setter, thus concurrent access is undefined
+ * to begin with.
+ */
+ if (level == KVM_INTERRUPT_SET_LEVEL)
+ state->asserted = 1;
+ else if (level == KVM_INTERRUPT_UNSET) {
+ state->asserted = 0;
+ return 0;
+ }
+
+ /* Attempt delivery */
+ icp_deliver_irq(xics, NULL, irq);
+
+ return 0;
+}
+
+static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
+ struct kvmppc_icp *icp)
+{
+ int i;
+
+ mutex_lock(&ics->lock);
+
+ for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+ struct ics_irq_state *state = &ics->irq_state[i];
+
+ if (!state->resend)
+ continue;
+
+ XICS_DBG("resend %#x prio %#x\n", state->number,
+ state->priority);
+
+ mutex_unlock(&ics->lock);
+ icp_deliver_irq(xics, icp, state->number);
+ mutex_lock(&ics->lock);
+ }
+
+ mutex_unlock(&ics->lock);
+}
+
+int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_icp *icp;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u16 src;
+ bool deliver;
+
+ if (!xics)
+ return -ENODEV;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ return -EINVAL;
+ state = &ics->irq_state[src];
+
+ icp = kvmppc_xics_find_server(kvm, server);
+ if (!icp)
+ return -EINVAL;
+
+ mutex_lock(&ics->lock);
+
+ XICS_DBG("set_xive %#x server %#x prio %#x MP:%d RS:%d\n",
+ irq, server, priority,
+ state->masked_pending, state->resend);
+
+ state->server = server;
+ state->priority = priority;
+ deliver = false;
+ if ((state->masked_pending || state->resend) && priority != MASKED) {
+ state->masked_pending = 0;
+ deliver = true;
+ }
+
+ mutex_unlock(&ics->lock);
+
+ if (deliver)
+ icp_deliver_irq(xics, icp, irq);
+
+ return 0;
+}
+
+int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u16 src;
+
+ if (!xics)
+ return -ENODEV;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ return -EINVAL;
+ state = &ics->irq_state[src];
+
+ mutex_lock(&ics->lock);
+ *server = state->server;
+ *priority = state->priority;
+ mutex_unlock(&ics->lock);
+
+ return 0;
+}
+
+/* -- ICP routines, including hcalls -- */
+
+static inline bool icp_try_update(struct kvmppc_icp *icp,
+ union kvmppc_icp_state old,
+ union kvmppc_icp_state new,
+ bool change_self)
+{
+ bool success;
+
+ /* Calculate new output value */
+ new.out_ee = (new.xisr && (new.pending_pri < new.cppr));
+
+ /* Attempt atomic update */
+ success = cmpxchg64(&icp->state.raw, old.raw, new.raw) == old.raw;
+ if (!success)
+ goto bail;
+
+ XICS_DBG("UPD [%04x] - C:%02x M:%02x PP: %02x PI:%06x R:%d O:%d\n",
+ icp->vcpu->vcpu_id,
+ old.cppr, old.mfrr, old.pending_pri, old.xisr,
+ old.need_resend, old.out_ee);
+ XICS_DBG("UPD - C:%02x M:%02x PP: %02x PI:%06x R:%d O:%d\n",
+ new.cppr, new.mfrr, new.pending_pri, new.xisr,
+ new.need_resend, new.out_ee);
+ /*
+ * Check for output state update
+ *
+ * Note that this is racy since another processor could be updating
+ * the state already. This is why we never clear the interrupt output
+ * here, we only ever set it. The clear only happens prior to doing
+ * an update and only by the processor itself. Currently we do it
+ * in Accept (H_XIRR) and Up_Cppr (H_XPPR).
+ *
+ * We also do not try to figure out whether the EE state has changed,
+ * we unconditionally set it if the new state calls for it for the
+ * same reason.
+ */
+ if (new.out_ee) {
+ kvmppc_book3s_queue_irqprio(icp->vcpu,
+ BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+ if (!change_self)
+ kvm_vcpu_kick(icp->vcpu);
+ }
+ bail:
+ return success;
+}
+
+static void icp_check_resend(struct kvmppc_xics *xics,
+ struct kvmppc_icp *icp)
+{
+ u32 icsid;
+
+ /* Order this load with the test for need_resend in the caller */
+ smp_rmb();
+ for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+ struct kvmppc_ics *ics = xics->ics[icsid];
+
+ if (!test_and_clear_bit(icsid, icp->resend_map))
+ continue;
+ if (!ics)
+ continue;
+ ics_check_resend(xics, ics, icp);
+ }
+}
+
+static bool icp_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+ u32 *reject)
+{
+ union kvmppc_icp_state old_state, new_state;
+ bool success;
+
+ XICS_DBG("try deliver %#x(P:%#x) to server %#x\n", irq, priority,
+ icp->vcpu->vcpu_id);
+
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ *reject = 0;
+
+ /* See if we can deliver */
+ success = new_state.cppr > priority &&
+ new_state.mfrr > priority &&
+ new_state.pending_pri > priority;
+
+ /*
+ * If we can, check for a rejection and perform the
+ * delivery
+ */
+ if (success) {
+ *reject = new_state.xisr;
+ new_state.xisr = irq;
+ new_state.pending_pri = priority;
+ } else {
+ /*
+ * If we failed to deliver we set need_resend
+ * so a subsequent CPPR state change causes us
+ * to try a new delivery.
+ */
+ new_state.need_resend = true;
+ }
+
+ } while (!icp_try_update(icp, old_state, new_state, false));
+
+ return success;
+}
+
+static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+ u32 new_irq)
+{
+ struct ics_irq_state *state;
+ struct kvmppc_ics *ics;
+ u32 reject;
+ u16 src;
+
+ /*
+ * This is used both for initial delivery of an interrupt and
+ * for subsequent rejection.
+ *
+ * Rejection can be racy vs. resends. We have evaluated the
+ * rejection in an atomic ICP transaction which is now complete,
+ * so potentially the ICP can already accept the interrupt again.
+ *
+ * So we need to retry the delivery. Essentially the reject path
+ * boils down to a failed delivery. Always.
+ *
+ * Now the interrupt could also have moved to a different target,
+ * thus we may need to re-do the ICP lookup as well
+ */
+
+ again:
+ /* Get the ICS state and lock it */
+ ics = kvmppc_xics_find_ics(xics, new_irq, &src);
+ if (!ics) {
+ XICS_DBG("icp_deliver_irq: IRQ 0x%06x not found !\n", new_irq);
+ return;
+ }
+ state = &ics->irq_state[src];
+
+ /* Get a lock on the ICS */
+ mutex_lock(&ics->lock);
+
+ /* Get our server */
+ if (!icp || state->server != icp->vcpu->vcpu_id) {
+ icp = kvmppc_xics_find_server(xics->kvm, state->server);
+ if (!icp) {
+ pr_warning("icp_deliver_irq: IRQ 0x%06x server 0x%x"
+ " not found !\n", new_irq, state->server);
+ goto out;
+ }
+ }
+
+ /* Clear the resend bit of that interrupt */
+ state->resend = 0;
+
+ /*
+ * If masked, bail out
+ *
+ * Note: PAPR doesn't mention anything about masked pending
+ * when doing a resend, only when doing a delivery.
+ *
+ * However that would have the effect of losing a masked
+ * interrupt that was rejected and isn't consistent with
+ * the whole masked_pending business which is about not
+ * losing interrupts that occur while masked.
+ *
+ * I don't differenciate normal deliveries and resends, this
+ * implementation will differ from PAPR and not lose such
+ * interrupts.
+ */
+ if (state->priority == MASKED) {
+ XICS_DBG("irq %#x masked pending\n", new_irq);
+ state->masked_pending = 1;
+ goto out;
+ }
+
+ /*
+ * Try the delivery, this will set the need_resend flag
+ * in the ICP as part of the atomic transaction if the
+ * delivery is not possible.
+ *
+ * Note that if successful, the new delivery might have itself
+ * rejected an interrupt that was "delivered" before we took the
+ * icp mutex.
+ *
+ * In this case we do the whole sequence all over again for the
+ * new guy. We cannot assume that the rejected interrupt is less
+ * favored than the new one, and thus doesn't need to be delivered,
+ * because by the time we exit icp_try_to_deliver() the target
+ * processor may well have alrady consumed & completed it, and thus
+ * the rejected interrupt might actually be already acceptable.
+ */
+ if (icp_try_to_deliver(icp, new_irq, state->priority, &reject)) {
+ /*
+ * Delivery was successful, did we reject somebody else ?
+ */
+ if (reject && reject != XICS_IPI) {
+ mutex_unlock(&ics->lock);
+ new_irq = reject;
+ goto again;
+ }
+ } else {
+ /*
+ * We failed to deliver the interrupt we need to set the
+ * resend map bit and mark the ICS state as needing a resend
+ */
+ set_bit(ics->icsid, icp->resend_map);
+ state->resend = 1;
+
+ /*
+ * If the need_resend flag got cleared in the ICP some time
+ * between icp_try_to_deliver() atomic update and now, then
+ * we know it might have missed the resend_map bit. So we
+ * retry
+ */
+ smp_mb();
+ if (!icp->state.need_resend) {
+ mutex_unlock(&ics->lock);
+ goto again;
+ }
+ }
+ out:
+ mutex_unlock(&ics->lock);
+}
+
+static void icp_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+ u8 new_cppr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ bool resend;
+
+ /*
+ * This handles several related states in one operation:
+ *
+ * ICP State: Down_CPPR
+ *
+ * Load CPPR with new value and if the XISR is 0
+ * then check for resends:
+ *
+ * ICP State: Resend
+ *
+ * If MFRR is more favored than CPPR, check for IPIs
+ * and notify ICS of a potential resend. This is done
+ * asynchronously (when used in real mode, we will have
+ * to exit here).
+ *
+ * We do not handle the complete Check_IPI as documented
+ * here. In the PAPR, this state will be used for both
+ * Set_MFRR and Down_CPPR. However, we know that we aren't
+ * changing the MFRR state here so we don't need to handle
+ * the case of an MFRR causing a reject of a pending irq,
+ * this will have been handled when the MFRR was set in the
+ * first place.
+ *
+ * Thus we don't have to handle rejects, only resends.
+ *
+ * When implementing real mode for HV KVM, resend will lead to
+ * a H_TOO_HARD return and the whole transaction will be handled
+ * in virtual mode.
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ /* Down_CPPR */
+ new_state.cppr = new_cppr;
+
+ /*
+ * Cut down Resend / Check_IPI / IPI
+ *
+ * The logic is that we cannot have a pending interrupt
+ * trumped by an IPI at this point (see above), so we
+ * know that either the pending interrupt is already an
+ * IPI (in which case we don't care to override it) or
+ * it's either more favored than us or non existent
+ */
+ if (new_state.mfrr < new_cppr &&
+ new_state.mfrr <= new_state.pending_pri) {
+ WARN_ON(new_state.xisr != XICS_IPI &&
+ new_state.xisr != 0);
+ new_state.pending_pri = new_state.mfrr;
+ new_state.xisr = XICS_IPI;
+ }
+
+ /* Latch/clear resend bit */
+ resend = new_state.need_resend;
+ new_state.need_resend = 0;
+
+ } while (!icp_try_update(icp, old_state, new_state, true));
+
+ /*
+ * Now handle resend checks. Those are asynchronous to the ICP
+ * state update in HW (ie bus transactions) so we can handle them
+ * separately here too
+ */
+ if (resend)
+ icp_check_resend(xics, icp);
+}
+
+static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 xirr;
+
+ /* First, remove EE from the processor */
+ kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+ BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+ /*
+ * ICP State: Accept_Interrupt
+ *
+ * Return the pending interrupt (if any) along with the
+ * current CPPR, then clear the XISR & set CPPR to the
+ * pending priority
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ xirr = old_state.xisr | (((u32)old_state.cppr) << 24);
+ if (!old_state.xisr)
+ break;
+ new_state.cppr = new_state.pending_pri;
+ new_state.pending_pri = 0xff;
+ new_state.xisr = 0;
+
+ } while (!icp_try_update(icp, old_state, new_state, true));
+
+ XICS_DBG("h_xirr vcpu %d xirr %#x\n", vcpu->vcpu_id, xirr);
+
+ return xirr;
+}
+
+static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
+ unsigned long mfrr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp;
+ u32 reject;
+ bool resend;
+ bool local;
+
+ XICS_DBG("h_ipi vcpu %d to server %lu mfrr %#lx\n",
+ vcpu->vcpu_id, server, mfrr);
+
+ local = vcpu->vcpu_id == server;
+ if (local)
+ icp = vcpu->arch.icp;
+ else
+ icp = kvmppc_xics_find_server(vcpu->kvm, server);
+ if (!icp)
+ return H_PARAMETER;
+
+ /*
+ * ICP state: Set_MFRR
+ *
+ * If the CPPR is more favored than the new MFRR, then
+ * nothing needs to be rejected as there can be no XISR to
+ * reject. If the MFRR is being made less favored then
+ * there might be a previously-rejected interrupt needing
+ * to be resent.
+ *
+ * If the CPPR is less favored, then we might be replacing
+ * an interrupt, and thus need to possibly reject it as in
+ *
+ * ICP state: Check_IPI
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ /* Set_MFRR */
+ new_state.mfrr = mfrr;
+
+ /* Check_IPI */
+ reject = 0;
+ resend = false;
+ if (mfrr < new_state.cppr) {
+ /* Reject a pending interrupt if not an IPI */
+ if (mfrr <= new_state.pending_pri)
+ reject = new_state.xisr;
+ new_state.pending_pri = mfrr;
+ new_state.xisr = XICS_IPI;
+ }
+
+ if (mfrr > old_state.mfrr && mfrr > new_state.cppr) {
+ resend = new_state.need_resend;
+ new_state.need_resend = 0;
+ }
+ } while (!icp_try_update(icp, old_state, new_state, local));
+
+ /* Handle reject */
+ if (reject && reject != XICS_IPI)
+ icp_deliver_irq(xics, icp, reject);
+
+ /* Handle resend */
+ if (resend)
+ icp_check_resend(xics, icp);
+
+ return H_SUCCESS;
+}
+
+static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 reject;
+
+ XICS_DBG("h_cppr vcpu %d cppr %#lx\n", vcpu->vcpu_id, cppr);
+
+ /*
+ * ICP State: Set_CPPR
+ *
+ * We can safely compare the new value with the current
+ * value outside of the transaction as the CPPR is only
+ * ever changed by the processor on itself
+ */
+ if (cppr > icp->state.cppr)
+ icp_down_cppr(xics, icp, cppr);
+ else if (cppr == icp->state.cppr)
+ return;
+
+ /*
+ * ICP State: Up_CPPR
+ *
+ * The processor is raising its priority, this can result
+ * in a rejection of a pending interrupt:
+ *
+ * ICP State: Reject_Current
+ *
+ * We can remove EE from the current processor, the update
+ * transaction will set it again if needed
+ */
+ kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+ BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ reject = 0;
+ new_state.cppr = cppr;
+
+ if (cppr <= new_state.pending_pri) {
+ reject = new_state.xisr;
+ new_state.xisr = 0;
+ new_state.pending_pri = 0xff;
+ }
+
+ } while (!icp_try_update(icp, old_state, new_state, true));
+
+ /*
+ * Check for rejects. They are handled by doing a new delivery
+ * attempt (see comments in icp_deliver_irq).
+ */
+ if (reject && reject != XICS_IPI)
+ icp_deliver_irq(xics, icp, reject);
+}
+
+static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u32 irq = xirr & 0x00ffffff;
+ u16 src;
+
+ XICS_DBG("h_eoi vcpu %d eoi %#lx\n", vcpu->vcpu_id, xirr);
+
+ /*
+ * ICP State: EOI
+ *
+ * Note: If EOI is incorrectly used by SW to lower the CPPR
+ * value (ie more favored), we do not check for rejection of
+ * a pending interrupt, this is a SW error and PAPR sepcifies
+ * that we don't have to deal with it.
+ *
+ * The sending of an EOI to the ICS is handled after the
+ * CPPR update
+ *
+ * ICP State: Down_CPPR which we handle
+ * in a separate function as it's shared with H_CPPR.
+ */
+ icp_down_cppr(xics, icp, xirr >> 24);
+
+ /* IPIs have no EOI */
+ if (irq == XICS_IPI)
+ return H_SUCCESS;
+ /*
+ * EOI handling: If the interrupt is still asserted, we need to
+ * resend it. We can take a lockless "peek" at the ICS state here.
+ *
+ * "Message" interrupts will never have "asserted" set
+ */
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics) {
+ XICS_DBG("h_eoi: IRQ 0x%06x not found !\n", irq);
+ return H_PARAMETER;
+ }
+ state = &ics->irq_state[src];
+
+ /* Still asserted, resend it */
+ if (state->asserted)
+ icp_deliver_irq(xics, icp, irq);
+
+ return H_SUCCESS;
+}
+
+int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
+{
+ unsigned long res;
+ int rc = H_SUCCESS;
+
+ /* Check if we have an ICP */
+ if (!vcpu->arch.icp || !vcpu->kvm->arch.xics)
+ return H_HARDWARE;
+
+ switch (req) {
+ case H_XIRR:
+ res = h_xirr(vcpu);
+ kvmppc_set_gpr(vcpu, 4, res);
+ break;
+ case H_CPPR:
+ h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
+ break;
+ case H_EOI:
+ rc = h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
+ break;
+ case H_IPI:
+ rc = h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5));
+ break;
+ }
+
+ return rc;
+}
+
+
+/* -- Initialisation code etc. -- */
+
+static int xics_debug_show(struct seq_file *m, void *private)
+{
+ struct kvmppc_xics *xics = m->private;
+ struct kvm *kvm = xics->kvm;
+ struct kvm_vcpu *vcpu;
+ int icsid, i;
+
+ if (!kvm)
+ return 0;
+
+ seq_printf(m, "=========\nICP state\n=========\n");
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ union kvmppc_icp_state state;
+
+ if (!icp)
+ continue;
+
+ state.raw = ACCESS_ONCE(icp->state.raw);
+ seq_printf(m, "cpu server %#x XIRR:%#x PPRI:%#x CPPR:%#x "
+ "MFRR:%#x OUT:%d NR:%d\n", vcpu->vcpu_id, state.xisr,
+ state.pending_pri, state.cppr, state.mfrr,
+ state.out_ee, state.need_resend);
+ }
+
+ for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
+ struct kvmppc_ics *ics = xics->ics[icsid];
+
+ if (!ics)
+ continue;
+
+ seq_printf(m, "=========\nICS state for ICS 0x%x\n=========\n",
+ icsid);
+
+ mutex_lock(&ics->lock);
+
+ for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+ struct ics_irq_state *irq = &ics->irq_state[i];
+
+ seq_printf(m, "irq 0x%06x: server %#x prio %#x save"
+ " prio %#x asserted %d resend %d masked"
+ " pending %d\n",
+ irq->number, irq->server, irq->priority,
+ irq->saved_priority, irq->asserted,
+ irq->resend, irq->masked_pending);
+
+ }
+ mutex_unlock(&ics->lock);
+ }
+ return 0;
+}
+
+static int xics_debug_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, xics_debug_show, inode->i_private);
+}
+
+static const struct file_operations xics_debug_fops = {
+ .open = xics_debug_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static void xics_debugfs_init(struct kvmppc_xics *xics)
+{
+ char *name;
+
+ name = kasprintf(GFP_KERNEL, "kvm-xics-%p", xics);
+ if (!name) {
+ pr_err("%s: no memory for name\n", __func__);
+ return;
+ }
+
+ xics->dentry = debugfs_create_file(name, S_IRUGO, powerpc_debugfs_root,
+ xics, &xics_debug_fops);
+
+ pr_debug("%s: created %s\n", __func__, name);
+ kfree(name);
+}
+
+struct kvmppc_ics *kvmppc_xics_create_ics(struct kvmppc_xics *xics,
+ int irq)
+{
+ struct kvmppc_ics *ics;
+ int i, icsid;
+
+ icsid = irq >> KVMPPC_XICS_ICS_SHIFT;
+
+ mutex_lock(&xics->kvm->lock);
+
+ /* ICS already exists - somebody else got here first */
+ if (xics->ics[icsid])
+ goto out;
+
+ /* Create the ICS */
+ ics = kzalloc(sizeof(struct kvmppc_ics), GFP_KERNEL);
+ if (!ics)
+ goto out;
+
+ mutex_init(&ics->lock);
+ ics->icsid = icsid;
+
+ for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+ ics->irq_state[i].number = (icsid << KVMPPC_XICS_ICS_SHIFT) | i;
+ ics->irq_state[i].priority = MASKED;
+ ics->irq_state[i].saved_priority = MASKED;
+ }
+ smp_wmb();
+ xics->ics[icsid] = ics;
+
+ if (icsid > xics->max_icsid)
+ xics->max_icsid = icsid;
+
+ out:
+ mutex_unlock(&xics->kvm->lock);
+ return xics->ics[icsid];
+}
+
+int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server_num)
+{
+ struct kvmppc_icp *icp;
+
+ if (!vcpu->kvm->arch.xics)
+ return -ENODEV;
+
+ if (kvmppc_xics_find_server(vcpu->kvm, server_num))
+ return -EEXIST;
+
+ icp = kzalloc(sizeof(struct kvmppc_icp), GFP_KERNEL);
+ if (!icp)
+ return -ENOMEM;
+
+ icp->vcpu = vcpu;
+ icp->server_num = server_num;
+ icp->state.mfrr = MASKED;
+ icp->state.pending_pri = MASKED;
+ vcpu->arch.icp = icp;
+
+ XICS_DBG("created server for vcpu %d\n", vcpu->vcpu_id);
+
+ return 0;
+}
+
+/* -- ioctls -- */
+
+int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args)
+{
+ struct kvmppc_xics *xics;
+ int r;
+
+ /* locking against multiple callers? */
+
+ xics = kvm->arch.xics;
+ if (!xics)
+ return -ENODEV;
+
+ switch (args->level) {
+ case KVM_INTERRUPT_SET:
+ case KVM_INTERRUPT_SET_LEVEL:
+ case KVM_INTERRUPT_UNSET:
+ r = ics_deliver_irq(xics, args->irq, args->level);
+ break;
+ default:
+ r = -EINVAL;
+ }
+
+ return r;
+}
+
+void kvmppc_xics_free(struct kvmppc_xics *xics)
+{
+ int i;
+
+ debugfs_remove(xics->dentry);
+
+ if (xics->kvm) {
+ kvm_put_kvm(xics->kvm);
+ xics->kvm->arch.xics = NULL;
+ xics->kvm = NULL;
+ }
+
+ for (i = 0; i <= xics->max_icsid; i++) {
+ if (xics->ics[i])
+ kfree(xics->ics[i]);
+ }
+ kfree(xics);
+}
+
+int kvm_create_xics(struct kvm *kvm, u32 type)
+{
+ struct kvmppc_xics *xics;
+
+ /* Already there ? */
+ if (kvm->arch.xics)
+ return -EEXIST;
+
+ xics = kzalloc(sizeof(*xics), GFP_KERNEL);
+ if (!xics)
+ return -ENOMEM;
+
+ xics->kvm = kvm;
+ kvm->arch.xics = xics;
+ xics_debugfs_init(xics);
+
+ kvm_get_kvm(kvm);
+ return 0;
+}
+
+void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu->arch.icp)
+ return;
+ kfree(vcpu->arch.icp);
+ vcpu->arch.icp = NULL;
+ vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+}
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
new file mode 100644
index 0000000..c44034f
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _KVM_PPC_BOOK3S_XICS_H
+#define _KVM_PPC_BOOK3S_XICS_H
+
+/*
+ * We use a two-level tree to store interrupt source information.
+ * There are up to 1024 ICS nodes, each of which can represent
+ * 1024 sources.
+ */
+#define KVMPPC_XICS_MAX_ICS_ID 1023
+#define KVMPPC_XICS_ICS_SHIFT 10
+#define KVMPPC_XICS_IRQ_PER_ICS (1 << KVMPPC_XICS_ICS_SHIFT)
+#define KVMPPC_XICS_SRC_MASK (KVMPPC_XICS_IRQ_PER_ICS - 1)
+
+/*
+ * Interrupt source numbers below this are reserved, for example
+ * 0 is "no interrupt", and 2 is used for IPIs.
+ */
+#define KVMPPC_XICS_FIRST_IRQ 16
+#define KVMPPC_XICS_NR_IRQS ((KVMPPC_XICS_MAX_ICS_ID + 1) * KVMPPC_XICS_IRQ_PER_ICS)
+
+/* Priority value to use for disabling an interrupt */
+#define MASKED 0xff
+
+/* State for one irq source */
+struct ics_irq_state {
+ u32 number;
+ u32 server;
+ u8 priority;
+ u8 saved_priority; /* currently unused */
+ u8 resend;
+ u8 masked_pending;
+ u8 asserted; /* Only for LSI */
+ u8 exists;
+};
+
+/* Atomic ICP state, updated with a single compare & swap */
+union kvmppc_icp_state {
+ unsigned long raw;
+ struct {
+ u8 out_ee : 1;
+ u8 need_resend : 1;
+ u8 cppr;
+ u8 mfrr;
+ u8 pending_pri;
+ u32 xisr;
+ };
+};
+
+/* One bit per ICS */
+#define ICP_RESEND_MAP_SIZE (KVMPPC_XICS_MAX_ICS_ID / BITS_PER_LONG + 1)
+
+struct kvmppc_icp {
+ struct kvm_vcpu *vcpu;
+ unsigned long server_num;
+ union kvmppc_icp_state state;
+ unsigned long resend_map[ICP_RESEND_MAP_SIZE];
+};
+
+struct kvmppc_ics {
+ struct mutex lock;
+ u16 icsid;
+ struct ics_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
+};
+
+struct kvmppc_xics {
+ struct kvm *kvm;
+ struct dentry *dentry;
+ u32 max_icsid;
+ atomic_t usecount;
+ struct kvmppc_ics *ics[KVMPPC_XICS_MAX_ICS_ID + 1];
+};
+
+static inline struct kvmppc_icp *kvmppc_xics_find_server(struct kvm *kvm,
+ u32 nr)
+{
+ struct kvm_vcpu *vcpu = NULL;
+ int i;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->arch.icp && nr == vcpu->arch.icp->server_num)
+ return vcpu->arch.icp;
+ }
+ return NULL;
+}
+
+static inline struct kvmppc_ics *kvmppc_xics_find_ics(struct kvmppc_xics *xics,
+ u32 irq, u16 *source)
+{
+ u32 icsid = irq >> KVMPPC_XICS_ICS_SHIFT;
+ u16 src = irq & KVMPPC_XICS_SRC_MASK;
+ struct kvmppc_ics *ics;
+
+ if (source)
+ *source = src;
+ if (icsid > KVMPPC_XICS_MAX_ICS_ID)
+ return NULL;
+ ics = xics->ics[icsid];
+ if (!ics)
+ return NULL;
+ return ics;
+}
+
+
+#endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5cde31d..bc6d7cf 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -472,6 +472,11 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvmppc_mpic_put(vcpu->arch.mpic);
break;
#endif
+#ifdef CONFIG_KVM_XICS
+ case KVMPPC_IRQ_XICS:
+ kvmppc_xics_free_icp(vcpu);
+ break;
+#endif
default:
break;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 3/8] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
2013-04-11 5:53 ` [PATCH 1/8] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
2013-04-11 5:54 ` [PATCH 2/8] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
@ 2013-04-11 5:54 ` Paul Mackerras
2013-04-11 5:55 ` [PATCH 4/8] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
` (5 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:54 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.
This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.
We then use this new facility in the in-kernel XICS code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_book3s_asm.h | 8 ++-
arch/powerpc/include/asm/kvm_ppc.h | 29 ++++++++
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kvm/book3s_hv.c | 26 +++++++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 102 ++++++++++++++++++++++++-----
arch/powerpc/kvm/book3s_xics.c | 2 +-
arch/powerpc/sysdev/xics/icp-native.c | 8 +++
7 files changed, 158 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index cdc3d27..9039d3c 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -20,6 +20,11 @@
#ifndef __ASM_KVM_BOOK3S_ASM_H__
#define __ASM_KVM_BOOK3S_ASM_H__
+/* XICS ICP register offsets */
+#define XICS_XIRR 4
+#define XICS_MFRR 0xc
+#define XICS_IPI 2 /* interrupt source # for IPIs */
+
#ifdef __ASSEMBLY__
#ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -81,10 +86,11 @@ struct kvmppc_host_state {
#ifdef CONFIG_KVM_BOOK3S_64_HV
u8 hwthread_req;
u8 hwthread_state;
-
+ u8 host_ipi;
struct kvm_vcpu *kvm_vcpu;
struct kvmppc_vcore *kvm_vcore;
unsigned long xics_phys;
+ u32 saved_xirr;
u64 dabr;
u64 host_mmcr[3];
u32 host_pmc[8];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5f5821b..1f7f5f6 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -264,6 +264,21 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
paca[cpu].kvm_hstate.xics_phys = addr;
}
+static inline u32 kvmppc_get_xics_latch(void)
+{
+ u32 xirr = get_paca()->kvm_hstate.saved_xirr;
+
+ get_paca()->kvm_hstate.saved_xirr = 0;
+
+ return xirr;
+}
+
+static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+{
+ paca[cpu].kvm_hstate.host_ipi = host_ipi;
+}
+
+extern void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu);
extern void kvm_linear_init(void);
#else
@@ -273,6 +288,18 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
static inline void kvm_linear_init(void)
{}
+static inline u32 kvmppc_get_xics_latch(void)
+{
+ return 0;
+}
+
+static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+{}
+
+static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+ kvm_vcpu_kick(vcpu);
+}
#endif
#ifdef CONFIG_PPC_BOOK3S_64
@@ -363,4 +390,6 @@ static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu *vcpu, int ra, int rb)
return ea;
}
+extern void xics_wake_cpu(int cpu);
+
#endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d87c908..75e31ac 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -573,6 +573,8 @@ int main(void)
HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
+ HSTATE_FIELD(HSTATE_SAVED_XIRR, saved_xirr);
+ HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
HSTATE_FIELD(HSTATE_MMCR, host_mmcr);
HSTATE_FIELD(HSTATE_PMC, host_pmc);
HSTATE_FIELD(HSTATE_PURR, host_purr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 506b5ea..ceb3d81 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -66,6 +66,31 @@
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
+void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+ int me;
+ int cpu = vcpu->cpu;
+ wait_queue_head_t *wqp;
+
+ wqp = kvm_arch_vcpu_wq(vcpu);
+ if (waitqueue_active(wqp)) {
+ wake_up_interruptible(wqp);
+ ++vcpu->stat.halt_wakeup;
+ }
+
+ me = get_cpu();
+
+ /* CPU points to the first thread of the core */
+ if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
+ int real_cpu = cpu + vcpu->arch.ptid;
+ if (paca[real_cpu].kvm_hstate.xics_phys)
+ xics_wake_cpu(real_cpu);
+ else if (cpu_online(cpu))
+ smp_send_reschedule(cpu);
+ }
+ put_cpu();
+}
+
/*
* We use the vcpu_load/put functions to measure stolen time.
* Stolen time is counted as time when either the vcpu is able to
@@ -977,7 +1002,6 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
}
extern int __kvmppc_vcore_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
-extern void xics_wake_cpu(int cpu);
static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index e33d11f..291e772 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -79,10 +79,6 @@ _GLOBAL(kvmppc_hv_entry_trampoline)
* *
*****************************************************************************/
-#define XICS_XIRR 4
-#define XICS_QIRR 0xc
-#define XICS_IPI 2 /* interrupt source # for IPIs */
-
/*
* We come in here when wakened from nap mode on a secondary hw thread.
* Relocation is off and most register values are lost.
@@ -122,7 +118,7 @@ kvm_start_guest:
beq 27f
25: ld r5,HSTATE_XICS_PHYS(r13)
li r0,0xff
- li r6,XICS_QIRR
+ li r6,XICS_MFRR
li r7,XICS_XIRR
lwzcix r8,r5,r7 /* get and ack the interrupt */
sync
@@ -676,17 +672,91 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
cmpwi r12,BOOK3S_INTERRUPT_SYSCALL
beq hcall_try_real_mode
- /* Check for mediated interrupts (could be done earlier really ...) */
+ /* Only handle external interrupts here on arch 206 and later */
BEGIN_FTR_SECTION
- cmpwi r12,BOOK3S_INTERRUPT_EXTERNAL
- bne+ 1f
- andi. r0,r11,MSR_EE
- beq 1f
- mfspr r5,SPRN_LPCR
- andi. r0,r5,LPCR_MER
+ b ext_interrupt_to_host
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
+
+ /* External interrupt ? */
+ cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
+ bne+ ext_interrupt_to_host
+
+ /* External interrupt, first check for host_ipi. If this is
+ * set, we know the host wants us out so let's do it now
+ */
+ lbz r0, HSTATE_HOST_IPI(r13)
+ cmpwi r0, 0
+ bne ext_interrupt_to_host
+
+ /* Now read the interrupt from the ICP */
+ ld r5, HSTATE_XICS_PHYS(r13)
+ li r7, XICS_XIRR
+ cmpdi r5, 0
+ beq- ext_interrupt_to_host
+ lwzcix r3, r5, r7
+ rlwinm. r0, r3, 0, 0xffffff
+ sync
+ bne 1f
+
+ /* Nothing pending in the ICP, check for mediated interrupts
+ * and bounce it to the guest
+ */
+ andi. r0, r11, MSR_EE
+ beq ext_interrupt_to_host /* shouldn't happen ?? */
+ mfspr r5, SPRN_LPCR
+ andi. r0, r5, LPCR_MER
bne bounce_ext_interrupt
-1:
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
+ b ext_interrupt_to_host /* shouldn't happen ?? */
+
+1: /* We found something in the ICP...
+ *
+ * If it's not an IPI, stash it in the PACA and return to
+ * the host, we don't (yet) handle directing real external
+ * interrupts directly to the guest
+ */
+ cmpwi r0, XICS_IPI
+ bne ext_stash_for_host
+
+ /* It's an IPI, clear the MFRR and EOI it */
+ li r0, 0xff
+ li r6, XICS_MFRR
+ stbcix r0, r5, r6 /* clear the IPI */
+ stwcix r3, r5, r7 /* EOI it */
+ sync
+
+ /* We need to re-check host IPI now in case it got set in the
+ * meantime. If it's clear, we bounce the interrupt to the
+ * guest
+ */
+ lbz r0, HSTATE_HOST_IPI(r13)
+ cmpwi r0, 0
+ bne- 1f
+
+ /* Allright, looks like an IPI for the guest, we need to set MER */
+ mfspr r8,SPRN_LPCR
+ ori r8,r8,LPCR_MER
+ mtspr SPRN_LPCR,r8
+
+ /* And if the guest EE is set, we can deliver immediately, else
+ * we return to the guest with MER set
+ */
+ andi. r0, r11, MSR_EE
+ bne bounce_ext_interrupt
+ mr r4, r9
+ b fast_guest_return
+
+ /* We raced with the host, we need to resend that IPI, bummer */
+1: li r0, IPI_PRIORITY
+ stbcix r0, r5, r6 /* set the IPI */
+ sync
+ b ext_interrupt_to_host
+
+ext_stash_for_host:
+ /* It's not an IPI and it's for the host, stash it in the PACA
+ * before exit, it will be picked up by the host ICP driver
+ */
+ stw r3, HSTATE_SAVED_XIRR(r13)
+ext_interrupt_to_host:
guest_exit_cont: /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save DEC */
@@ -829,7 +899,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
beq 44f
ld r8,HSTATE_XICS_PHYS(r6) /* get thread's XICS reg addr */
li r0,IPI_PRIORITY
- li r7,XICS_QIRR
+ li r7,XICS_MFRR
stbcix r0,r7,r8 /* trigger the IPI */
44: srdi. r3,r3,1
addi r6,r6,PACA_SIZE
@@ -1626,7 +1696,7 @@ secondary_nap:
beq 37f
sync
li r0, 0xff
- li r6, XICS_QIRR
+ li r6, XICS_MFRR
stbcix r0, r5, r6 /* clear the IPI */
stwcix r3, r5, r7 /* EOI it */
37: sync
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 175a3c2..029ea74 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -224,7 +224,7 @@ static inline bool icp_try_update(struct kvmppc_icp *icp,
kvmppc_book3s_queue_irqprio(icp->vcpu,
BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
if (!change_self)
- kvm_vcpu_kick(icp->vcpu);
+ kvmppc_fast_vcpu_kick(icp->vcpu);
}
bail:
return success;
diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c
index 48861d3..20b328b 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -51,6 +51,12 @@ static struct icp_ipl __iomem *icp_native_regs[NR_CPUS];
static inline unsigned int icp_native_get_xirr(void)
{
int cpu = smp_processor_id();
+ unsigned int xirr;
+
+ /* Handled an interrupt latched by KVM */
+ xirr = kvmppc_get_xics_latch();
+ if (xirr)
+ return xirr;
return in_be32(&icp_native_regs[cpu]->xirr.word);
}
@@ -138,6 +144,7 @@ static unsigned int icp_native_get_irq(void)
static void icp_native_cause_ipi(int cpu, unsigned long data)
{
+ kvmppc_set_host_ipi(cpu, 1);
icp_native_set_qirr(cpu, IPI_PRIORITY);
}
@@ -151,6 +158,7 @@ static irqreturn_t icp_native_ipi_action(int irq, void *dev_id)
{
int cpu = smp_processor_id();
+ kvmppc_set_host_ipi(cpu, 0);
icp_native_set_qirr(cpu, 0xff);
return smp_ipi_demux();
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 4/8] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (2 preceding siblings ...)
2013-04-11 5:54 ` [PATCH 3/8] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
@ 2013-04-11 5:55 ` Paul Mackerras
2013-04-11 5:55 ` [PATCH 5/8] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:55 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/Makefile | 1 +
arch/powerpc/kvm/book3s_hv_rm_xics.c | 402 +++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +-
arch/powerpc/kvm/book3s_xics.c | 64 ++++-
arch/powerpc/kvm/book3s_xics.h | 16 ++
5 files changed, 475 insertions(+), 18 deletions(-)
create mode 100644 arch/powerpc/kvm/book3s_hv_rm_xics.c
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index cccd85f..24a2896 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -77,6 +77,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
book3s_hv_rm_mmu.o \
book3s_64_vio_hv.o \
book3s_hv_ras.o \
+ book3s_hv_rm_xics.o \
book3s_hv_builtin.o
kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
new file mode 100644
index 0000000..4cb7df8
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -0,0 +1,402 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/debug.h>
+#include <asm/synch.h>
+#include <asm/ppc-opcode.h>
+
+#include "book3s_xics.h"
+
+#define DEBUG_PASSUP
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+ __asm__ __volatile__("sync; stbcix %0,0,%1"
+ : : "r" (val), "r" (paddr) : "memory");
+}
+
+static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu)
+{
+ struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
+ unsigned long xics_phys;
+ int cpu;
+
+ /* Mark the target VCPU as having an interrupt pending */
+ vcpu->stat.queue_intr++;
+ set_bit(BOOK3S_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
+
+ /* Kick self ? Just set MER and return */
+ if (vcpu == this_vcpu) {
+ mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_MER);
+ return;
+ }
+
+ /* Check if the core is loaded, if not, too hard */
+ cpu = vcpu->cpu;
+ if (cpu < 0 || cpu >= nr_cpu_ids) {
+ this_icp->rm_action |= XICS_RM_KICK_VCPU;
+ this_icp->rm_kick_target = vcpu;
+ return;
+ }
+ /* In SMT cpu will always point to thread 0, we adjust it */
+ cpu += vcpu->arch.ptid;
+
+ /* Not too hard, then poke the target */
+ xics_phys = paca[cpu].kvm_hstate.xics_phys;
+ rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+ /* Note: Only called on self ! */
+ clear_bit(BOOK3S_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
+ mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_MER);
+}
+
+static inline bool icp_rm_try_update(struct kvmppc_icp *icp,
+ union kvmppc_icp_state old,
+ union kvmppc_icp_state new)
+{
+ struct kvm_vcpu *this_vcpu = local_paca->kvm_hstate.kvm_vcpu;
+ bool success;
+
+ /* Calculate new output value */
+ new.out_ee = (new.xisr && (new.pending_pri < new.cppr));
+
+ /* Attempt atomic update */
+ success = cmpxchg64(&icp->state.raw, old.raw, new.raw) == old.raw;
+ if (!success)
+ goto bail;
+
+ /*
+ * Check for output state update
+ *
+ * Note that this is racy since another processor could be updating
+ * the state already. This is why we never clear the interrupt output
+ * here, we only ever set it. The clear only happens prior to doing
+ * an update and only by the processor itself. Currently we do it
+ * in Accept (H_XIRR) and Up_Cppr (H_XPPR).
+ *
+ * We also do not try to figure out whether the EE state has changed,
+ * we unconditionally set it if the new state calls for it. The reason
+ * for that is that we opportunistically remove the pending interrupt
+ * flag when raising CPPR, so we need to set it back here if an
+ * interrupt is still pending.
+ */
+ if (new.out_ee)
+ icp_rm_set_vcpu_irq(icp->vcpu, this_vcpu);
+
+ /* Expose the state change for debug purposes */
+ this_vcpu->arch.icp->rm_dbgstate = new;
+ this_vcpu->arch.icp->rm_dbgtgt = icp->vcpu;
+
+ bail:
+ return success;
+}
+
+static inline int check_too_hard(struct kvmppc_xics *xics, struct kvmppc_icp *icp)
+{
+ return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
+}
+
+static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+ u8 new_cppr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ bool resend;
+
+ /*
+ * This handles several related states in one operation:
+ *
+ * ICP State: Down_CPPR
+ *
+ * Load CPPR with new value and if the XISR is 0
+ * then check for resends:
+ *
+ * ICP State: Resend
+ *
+ * If MFRR is more favored than CPPR, check for IPIs
+ * and notify ICS of a potential resend. This is done
+ * asynchronously (when used in real mode, we will have
+ * to exit here).
+ *
+ * We do not handle the complete Check_IPI as documented
+ * here. In the PAPR, this state will be used for both
+ * Set_MFRR and Down_CPPR. However, we know that we aren't
+ * changing the MFRR state here so we don't need to handle
+ * the case of an MFRR causing a reject of a pending irq,
+ * this will have been handled when the MFRR was set in the
+ * first place.
+ *
+ * Thus we don't have to handle rejects, only resends.
+ *
+ * When implementing real mode for HV KVM, resend will lead to
+ * a H_TOO_HARD return and the whole transaction will be handled
+ * in virtual mode.
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ /* Down_CPPR */
+ new_state.cppr = new_cppr;
+
+ /*
+ * Cut down Resend / Check_IPI / IPI
+ *
+ * The logic is that we cannot have a pending interrupt
+ * trumped by an IPI at this point (see above), so we
+ * know that either the pending interrupt is already an
+ * IPI (in which case we don't care to override it) or
+ * it's either more favored than us or non existent
+ */
+ if (new_state.mfrr < new_cppr &&
+ new_state.mfrr <= new_state.pending_pri) {
+ new_state.pending_pri = new_state.mfrr;
+ new_state.xisr = XICS_IPI;
+ }
+
+ /* Latch/clear resend bit */
+ resend = new_state.need_resend;
+ new_state.need_resend = 0;
+
+ } while (!icp_rm_try_update(icp, old_state, new_state));
+
+ /*
+ * Now handle resend checks. Those are asynchronous to the ICP
+ * state update in HW (ie bus transactions) so we can handle them
+ * separately here as well.
+ */
+ if (resend)
+ icp->rm_action |= XICS_RM_CHECK_RESEND;
+}
+
+
+unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 xirr;
+
+ if (!xics || !xics->real_mode)
+ return H_TOO_HARD;
+
+ /* First clear the interrupt */
+ icp_rm_clr_vcpu_irq(icp->vcpu);
+
+ /*
+ * ICP State: Accept_Interrupt
+ *
+ * Return the pending interrupt (if any) along with the
+ * current CPPR, then clear the XISR & set CPPR to the
+ * pending priority
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ xirr = old_state.xisr | (((u32)old_state.cppr) << 24);
+ if (!old_state.xisr)
+ break;
+ new_state.cppr = new_state.pending_pri;
+ new_state.pending_pri = 0xff;
+ new_state.xisr = 0;
+
+ } while (!icp_rm_try_update(icp, old_state, new_state));
+
+ /* Return the result in GPR4 */
+ vcpu->arch.gpr[4] = xirr;
+
+ return check_too_hard(xics, icp);
+}
+
+int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, unsigned long mfrr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp, *this_icp = vcpu->arch.icp;
+ u32 reject;
+ bool resend;
+ bool local;
+
+ if (!xics || !xics->real_mode)
+ return H_TOO_HARD;
+
+ local = vcpu->vcpu_id == server;
+ if (local)
+ icp = this_icp;
+ else
+ icp = kvmppc_xics_find_server(vcpu->kvm, server);
+ if (!icp)
+ return H_PARAMETER;
+
+ /*
+ * ICP state: Set_MFRR
+ *
+ * If the CPPR is more favored than the new MFRR, then
+ * nothing needs to be done as there can be no XISR to
+ * reject.
+ *
+ * If the CPPR is less favored, then we might be replacing
+ * an interrupt, and thus need to possibly reject it as in
+ *
+ * ICP state: Check_IPI
+ */
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ /* Set_MFRR */
+ new_state.mfrr = mfrr;
+
+ /* Check_IPI */
+ reject = 0;
+ resend = false;
+ if (mfrr < new_state.cppr) {
+ /* Reject a pending interrupt if not an IPI */
+ if (mfrr <= new_state.pending_pri)
+ reject = new_state.xisr;
+ new_state.pending_pri = mfrr;
+ new_state.xisr = XICS_IPI;
+ }
+
+ if (mfrr > old_state.mfrr && mfrr > new_state.cppr) {
+ resend = new_state.need_resend;
+ new_state.need_resend = 0;
+ }
+ } while (!icp_rm_try_update(icp, old_state, new_state));
+
+ /* Pass rejects to virtual mode */
+ if (reject && reject != XICS_IPI) {
+ this_icp->rm_action |= XICS_RM_REJECT;
+ this_icp->rm_reject = reject;
+ }
+
+ /* Pass resends to virtual mode */
+ if (resend)
+ this_icp->rm_action |= XICS_RM_CHECK_RESEND;
+
+ return check_too_hard(xics, this_icp);
+}
+
+int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+{
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ u32 reject;
+
+ if (!xics || !xics->real_mode)
+ return H_TOO_HARD;
+
+ /*
+ * ICP State: Set_CPPR
+ *
+ * We can safely compare the new value with the current
+ * value outside of the transaction as the CPPR is only
+ * ever changed by the processor on itself
+ */
+ if (cppr > icp->state.cppr) {
+ icp_rm_down_cppr(xics, icp, cppr);
+ goto bail;
+ } else if (cppr == icp->state.cppr)
+ return H_SUCCESS;
+
+ /*
+ * ICP State: Up_CPPR
+ *
+ * The processor is raising its priority, this can result
+ * in a rejection of a pending interrupt:
+ *
+ * ICP State: Reject_Current
+ *
+ * We can remove EE from the current processor, the update
+ * transaction will set it again if needed
+ */
+ icp_rm_clr_vcpu_irq(icp->vcpu);
+
+ do {
+ old_state = new_state = ACCESS_ONCE(icp->state);
+
+ reject = 0;
+ new_state.cppr = cppr;
+
+ if (cppr <= new_state.pending_pri) {
+ reject = new_state.xisr;
+ new_state.xisr = 0;
+ new_state.pending_pri = 0xff;
+ }
+
+ } while (!icp_rm_try_update(icp, old_state, new_state));
+
+ /* Pass rejects to virtual mode */
+ if (reject && reject != XICS_IPI) {
+ icp->rm_action |= XICS_RM_REJECT;
+ icp->rm_reject = reject;
+ }
+ bail:
+ return check_too_hard(xics, icp);
+}
+
+int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u32 irq = xirr & 0x00ffffff;
+ u16 src;
+
+ if (!xics || !xics->real_mode)
+ return H_TOO_HARD;
+
+ /*
+ * ICP State: EOI
+ *
+ * Note: If EOI is incorrectly used by SW to lower the CPPR
+ * value (ie more favored), we do not check for rejection of
+ * a pending interrupt, this is a SW error and PAPR sepcifies
+ * that we don't have to deal with it.
+ *
+ * The sending of an EOI to the ICS is handled after the
+ * CPPR update
+ *
+ * ICP State: Down_CPPR which we handle
+ * in a separate function as it's shared with H_CPPR.
+ */
+ icp_rm_down_cppr(xics, icp, xirr >> 24);
+
+ /* IPIs have no EOI */
+ if (irq == XICS_IPI)
+ goto bail;
+ /*
+ * EOI handling: If the interrupt is still asserted, we need to
+ * resend it. We can take a lockless "peek" at the ICS state here.
+ *
+ * "Message" interrupts will never have "asserted" set
+ */
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ goto bail;
+ state = &ics->irq_state[src];
+
+ /* Still asserted, resend it, we make it look like a reject */
+ if (state->asserted) {
+ icp->rm_action |= XICS_RM_REJECT;
+ icp->rm_reject = irq;
+ }
+ bail:
+ return check_too_hard(xics, icp);
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 291e772..4fa187f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1420,11 +1420,11 @@ hcall_real_table:
.long 0 /* 0x58 */
.long 0 /* 0x5c */
.long 0 /* 0x60 */
- .long 0 /* 0x64 */
- .long 0 /* 0x68 */
- .long 0 /* 0x6c */
- .long 0 /* 0x70 */
- .long 0 /* 0x74 */
+ .long .kvmppc_rm_h_eoi - hcall_real_table
+ .long .kvmppc_rm_h_cppr - hcall_real_table
+ .long .kvmppc_rm_h_ipi - hcall_real_table
+ .long 0 /* 0x70 - H_IPOLL */
+ .long .kvmppc_rm_h_xirr - hcall_real_table
.long 0 /* 0x78 */
.long 0 /* 0x7c */
.long 0 /* 0x80 */
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 029ea74..278eecc 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -27,6 +27,9 @@
#define XICS_DBG(fmt...) do { } while (0)
//#define XICS_DBG(fmt...) do { trace_printk(fmt); } while (0)
+#define ENABLE_REALMODE true
+#define DEBUG_REALMODE false
+
/*
* LOCKING
* =======
@@ -217,8 +220,10 @@ static inline bool icp_try_update(struct kvmppc_icp *icp,
* in Accept (H_XIRR) and Up_Cppr (H_XPPR).
*
* We also do not try to figure out whether the EE state has changed,
- * we unconditionally set it if the new state calls for it for the
- * same reason.
+ * we unconditionally set it if the new state calls for it. The reason
+ * for that is that we opportunistically remove the pending interrupt
+ * flag when raising CPPR, so we need to set it back here if an
+ * interrupt is still pending.
*/
if (new.out_ee) {
kvmppc_book3s_queue_irqprio(icp->vcpu,
@@ -480,7 +485,7 @@ static void icp_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
icp_check_resend(xics, icp);
}
-static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
+static noinline unsigned long kvmppc_h_xirr(struct kvm_vcpu *vcpu)
{
union kvmppc_icp_state old_state, new_state;
struct kvmppc_icp *icp = vcpu->arch.icp;
@@ -514,8 +519,8 @@ static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
return xirr;
}
-static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
- unsigned long mfrr)
+static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
+ unsigned long mfrr)
{
union kvmppc_icp_state old_state, new_state;
struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
@@ -583,7 +588,7 @@ static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
return H_SUCCESS;
}
-static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+static noinline void kvmppc_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
{
union kvmppc_icp_state old_state, new_state;
struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
@@ -640,7 +645,7 @@ static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
icp_deliver_irq(xics, icp, reject);
}
-static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
{
struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
struct kvmppc_icp *icp = vcpu->arch.icp;
@@ -690,29 +695,54 @@ static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
return H_SUCCESS;
}
+static int noinline kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
+{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+
+ XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
+ hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
+
+ if (icp->rm_action & XICS_RM_KICK_VCPU)
+ kvmppc_fast_vcpu_kick(icp->rm_kick_target);
+ if (icp->rm_action & XICS_RM_CHECK_RESEND)
+ icp_check_resend(xics, icp);
+ if (icp->rm_action & XICS_RM_REJECT)
+ icp_deliver_irq(xics, icp, icp->rm_reject);
+
+ icp->rm_action = 0;
+
+ return H_SUCCESS;
+}
+
int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
{
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
unsigned long res;
int rc = H_SUCCESS;
/* Check if we have an ICP */
- if (!vcpu->arch.icp || !vcpu->kvm->arch.xics)
+ if (!xics || !vcpu->arch.icp)
return H_HARDWARE;
+ /* Check for real mode returning too hard */
+ if (xics->real_mode)
+ return kvmppc_xics_rm_complete(vcpu, req);
+
switch (req) {
case H_XIRR:
- res = h_xirr(vcpu);
+ res = kvmppc_h_xirr(vcpu);
kvmppc_set_gpr(vcpu, 4, res);
break;
case H_CPPR:
- h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
+ kvmppc_h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
break;
case H_EOI:
- rc = h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
+ rc = kvmppc_h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
break;
case H_IPI:
- rc = h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
- kvmppc_get_gpr(vcpu, 5));
+ rc = kvmppc_h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5));
break;
}
@@ -928,6 +958,14 @@ int kvm_create_xics(struct kvm *kvm, u32 type)
kvm->arch.xics = xics;
xics_debugfs_init(xics);
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+ /* Enable real mode support */
+ xics->real_mode = ENABLE_REALMODE;
+ xics->real_mode_dbg = DEBUG_REALMODE;
+ }
+#endif /* CONFIG_KVM_BOOK3S_64_HV */
+
kvm_get_kvm(kvm);
return 0;
}
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index c44034f..6121f43 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -63,6 +63,20 @@ struct kvmppc_icp {
unsigned long server_num;
union kvmppc_icp_state state;
unsigned long resend_map[ICP_RESEND_MAP_SIZE];
+
+ /* Real mode might find something too hard, here's the action
+ * it might request from virtual mode
+ */
+#define XICS_RM_KICK_VCPU 0x1
+#define XICS_RM_CHECK_RESEND 0x2
+#define XICS_RM_REJECT 0x4
+ u32 rm_action;
+ struct kvm_vcpu *rm_kick_target;
+ u32 rm_reject;
+
+ /* Debug stuff for real mode */
+ union kvmppc_icp_state rm_dbgstate;
+ struct kvm_vcpu *rm_dbgtgt;
};
struct kvmppc_ics {
@@ -76,6 +90,8 @@ struct kvmppc_xics {
struct dentry *dentry;
u32 max_icsid;
atomic_t usecount;
+ bool real_mode;
+ bool real_mode_dbg;
struct kvmppc_ics *ics[KVMPPC_XICS_MAX_ICS_ID + 1];
};
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 5/8] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (3 preceding siblings ...)
2013-04-11 5:55 ` [PATCH 4/8] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
@ 2013-04-11 5:55 ` Paul Mackerras
2013-04-11 5:56 ` [PATCH 6/8] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:55 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
This streamlines our handling of external interrupts that come in
while we're in the guest. First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too. Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).
This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 5 +-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 140 +++++++++++++++++--------------
3 files changed, 81 insertions(+), 65 deletions(-)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c9c67fc..7993224 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -290,6 +290,7 @@
#define LPCR_PECE1 0x00002000 /* decrementer can cause exit */
#define LPCR_PECE2 0x00001000 /* machine check etc can cause exit */
#define LPCR_MER 0x00000800 /* Mediated External Exception */
+#define LPCR_MER_SH 11
#define LPCR_LPES 0x0000000c
#define LPCR_LPES0 0x00000008 /* LPAR Env selector 0 */
#define LPCR_LPES1 0x00000004 /* LPAR Env selector 1 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ceb3d81..c066b77 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1376,9 +1376,12 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
break;
vc->runner = vcpu;
n_ceded = 0;
- list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
+ list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
if (!v->arch.pending_exceptions)
n_ceded += v->arch.ceded;
+ else
+ v->arch.ceded = 0;
+ }
if (n_ceded == vc->n_runnable)
kvmppc_vcore_blocked(vc);
else
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4fa187f..3835963 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -97,50 +97,51 @@ kvm_start_guest:
li r0,1
stb r0,PACA_NAPSTATELOST(r13)
- /* get vcpu pointer, NULL if we have no vcpu to run */
- ld r4,HSTATE_KVM_VCPU(r13)
- cmpdi cr1,r4,0
+ /* were we napping due to cede? */
+ lbz r0,HSTATE_NAPPING(r13)
+ cmpwi r0,0
+ bne kvm_end_cede
+
+ /*
+ * We weren't napping due to cede, so this must be a secondary
+ * thread being woken up to run a guest, or being woken up due
+ * to a stray IPI. (Or due to some machine check or hypervisor
+ * maintenance interrupt while the core is in KVM.)
+ */
/* Check the wake reason in SRR1 to see why we got here */
mfspr r3,SPRN_SRR1
rlwinm r3,r3,44-31,0x7 /* extract wake reason field */
cmpwi r3,4 /* was it an external interrupt? */
- bne 27f
-
- /*
- * External interrupt - for now assume it is an IPI, since we
- * should never get any other interrupts sent to offline threads.
- * Only do this for secondary threads.
- */
- beq cr1,25f
- lwz r3,VCPU_PTID(r4)
- cmpwi r3,0
- beq 27f
-25: ld r5,HSTATE_XICS_PHYS(r13)
- li r0,0xff
- li r6,XICS_MFRR
- li r7,XICS_XIRR
+ bne 27f /* if not */
+ ld r5,HSTATE_XICS_PHYS(r13)
+ li r7,XICS_XIRR /* if it was an external interrupt, */
lwzcix r8,r5,r7 /* get and ack the interrupt */
sync
clrldi. r9,r8,40 /* get interrupt source ID. */
- beq 27f /* none there? */
- cmpwi r9,XICS_IPI
- bne 26f
+ beq 28f /* none there? */
+ cmpwi r9,XICS_IPI /* was it an IPI? */
+ bne 29f
+ li r0,0xff
+ li r6,XICS_MFRR
stbcix r0,r5,r6 /* clear IPI */
-26: stwcix r8,r5,r7 /* EOI the interrupt */
-
-27: /* XXX should handle hypervisor maintenance interrupts etc. here */
+ stwcix r8,r5,r7 /* EOI the interrupt */
+ sync /* order loading of vcpu after that */
- /* reload vcpu pointer after clearing the IPI */
+ /* get vcpu pointer, NULL if we have no vcpu to run */
ld r4,HSTATE_KVM_VCPU(r13)
cmpdi r4,0
/* if we have no vcpu to run, go back to sleep */
beq kvm_no_guest
+ b kvmppc_hv_entry
- /* were we napping due to cede? */
- lbz r0,HSTATE_NAPPING(r13)
- cmpwi r0,0
- bne kvm_end_cede
+27: /* XXX should handle hypervisor maintenance interrupts etc. here */
+ b kvm_no_guest
+28: /* SRR1 said external but ICP said nope?? */
+ b kvm_no_guest
+29: /* External non-IPI interrupt to offline secondary thread? help?? */
+ stw r8,HSTATE_SAVED_XIRR(r13)
+ b kvm_no_guest
.global kvmppc_hv_entry
kvmppc_hv_entry:
@@ -481,20 +482,20 @@ toc_tlbie_lock:
mtctr r6
mtxer r7
+ ld r10, VCPU_PC(r4)
+ ld r11, VCPU_MSR(r4)
kvmppc_cede_reentry: /* r4 = vcpu, r13 = paca */
ld r6, VCPU_SRR0(r4)
ld r7, VCPU_SRR1(r4)
- ld r10, VCPU_PC(r4)
- ld r11, VCPU_MSR(r4) /* r11 = vcpu->arch.msr & ~MSR_HV */
+ /* r11 = vcpu->arch.msr & ~MSR_HV */
rldicl r11, r11, 63 - MSR_HV_LG, 1
rotldi r11, r11, 1 + MSR_HV_LG
ori r11, r11, MSR_ME
/* Check if we can deliver an external or decrementer interrupt now */
ld r0,VCPU_PENDING_EXC(r4)
- li r8,(1 << BOOK3S_IRQPRIO_EXTERNAL)
- oris r8,r8,(1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
+ lis r8,(1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
and r0,r0,r8
cmpdi cr1,r0,0
andi. r0,r11,MSR_EE
@@ -522,10 +523,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
/* Move SRR0 and SRR1 into the respective regs */
5: mtspr SPRN_SRR0, r6
mtspr SPRN_SRR1, r7
- li r0,0
- stb r0,VCPU_CEDED(r4) /* cancel cede */
fast_guest_return:
+ li r0,0
+ stb r0,VCPU_CEDED(r4) /* cancel cede */
mtspr SPRN_HSRR0,r10
mtspr SPRN_HSRR1,r11
@@ -684,6 +685,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
/* External interrupt, first check for host_ipi. If this is
* set, we know the host wants us out so let's do it now
*/
+do_ext_interrupt:
lbz r0, HSTATE_HOST_IPI(r13)
cmpwi r0, 0
bne ext_interrupt_to_host
@@ -696,19 +698,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
lwzcix r3, r5, r7
rlwinm. r0, r3, 0, 0xffffff
sync
- bne 1f
-
- /* Nothing pending in the ICP, check for mediated interrupts
- * and bounce it to the guest
- */
- andi. r0, r11, MSR_EE
- beq ext_interrupt_to_host /* shouldn't happen ?? */
- mfspr r5, SPRN_LPCR
- andi. r0, r5, LPCR_MER
- bne bounce_ext_interrupt
- b ext_interrupt_to_host /* shouldn't happen ?? */
+ beq 3f /* if nothing pending in the ICP */
-1: /* We found something in the ICP...
+ /* We found something in the ICP...
*
* If it's not an IPI, stash it in the PACA and return to
* the host, we don't (yet) handle directing real external
@@ -733,18 +725,35 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
bne- 1f
/* Allright, looks like an IPI for the guest, we need to set MER */
- mfspr r8,SPRN_LPCR
- ori r8,r8,LPCR_MER
- mtspr SPRN_LPCR,r8
+3:
+ /* Check if any CPU is heading out to the host, if so head out too */
+ ld r5, HSTATE_KVM_VCORE(r13)
+ lwz r0, VCORE_ENTRY_EXIT(r5)
+ cmpwi r0, 0x100
+ bge ext_interrupt_to_host
+
+ /* See if there is a pending interrupt for the guest */
+ mfspr r8, SPRN_LPCR
+ ld r0, VCPU_PENDING_EXC(r9)
+ /* Insert EXTERNAL_LEVEL bit into LPCR at the MER bit position */
+ rldicl. r0, r0, 64 - BOOK3S_IRQPRIO_EXTERNAL_LEVEL, 63
+ rldimi r8, r0, LPCR_MER_SH, 63 - LPCR_MER_SH
+ beq 2f
/* And if the guest EE is set, we can deliver immediately, else
* we return to the guest with MER set
*/
andi. r0, r11, MSR_EE
- bne bounce_ext_interrupt
- mr r4, r9
+ beq 2f
+ mtspr SPRN_SRR0, r10
+ mtspr SPRN_SRR1, r11
+ li r10, BOOK3S_INTERRUPT_EXTERNAL
+ li r11, (MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */
+ rotldi r11, r11, 63
+2: mr r4, r9
+ mtspr SPRN_LPCR, r8
b fast_guest_return
-
+
/* We raced with the host, we need to resend that IPI, bummer */
1: li r0, IPI_PRIORITY
stbcix r0, r5, r6 /* set the IPI */
@@ -1475,15 +1484,6 @@ ignore_hdec:
mr r4,r9
b fast_guest_return
-bounce_ext_interrupt:
- mr r4,r9
- mtspr SPRN_SRR0,r10
- mtspr SPRN_SRR1,r11
- li r10,BOOK3S_INTERRUPT_EXTERNAL
- li r11,(MSR_ME << 1) | 1 /* synthesize MSR_SF | MSR_ME */
- rotldi r11,r11,63
- b fast_guest_return
-
_GLOBAL(kvmppc_h_set_dabr)
std r4,VCPU_DABR(r3)
/* Work around P7 bug where DABR can get corrupted on mtspr */
@@ -1589,6 +1589,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
b .
kvm_end_cede:
+ /* get vcpu pointer */
+ ld r4, HSTATE_KVM_VCPU(r13)
+
/* Woken by external or decrementer interrupt */
ld r1, HSTATE_HOST_R1(r13)
@@ -1628,6 +1631,16 @@ kvm_end_cede:
li r0,0
stb r0,HSTATE_NAPPING(r13)
+ /* Check the wake reason in SRR1 to see why we got here */
+ mfspr r3, SPRN_SRR1
+ rlwinm r3, r3, 44-31, 0x7 /* extract wake reason field */
+ cmpwi r3, 4 /* was it an external interrupt? */
+ li r12, BOOK3S_INTERRUPT_EXTERNAL
+ mr r9, r4
+ ld r10, VCPU_PC(r9)
+ ld r11, VCPU_MSR(r9)
+ beq do_ext_interrupt /* if so */
+
/* see if any other thread is already exiting */
lwz r0,VCORE_ENTRY_EXIT(r5)
cmpwi r0,0x100
@@ -1647,8 +1660,7 @@ kvm_cede_prodded:
/* we've ceded but we want to give control to the host */
kvm_cede_exit:
- li r3,H_TOO_HARD
- blr
+ b hcall_real_fallback
/* Try to handle a machine check in real mode */
machine_check_realmode:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 6/8] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (4 preceding siblings ...)
2013-04-11 5:55 ` [PATCH 5/8] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
@ 2013-04-11 5:56 ` Paul Mackerras
2013-04-11 5:56 ` [PATCH 7/8] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:56 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call. With this, ibm,int-off sets
the specified interrupt's priority in its saved_priority field and
sets the priority to 0xff (the least favoured value). ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_ppc.h | 2 +
arch/powerpc/kvm/book3s_rtas.c | 40 +++++++++++++++++
arch/powerpc/kvm/book3s_xics.c | 86 +++++++++++++++++++++++++++++-------
arch/powerpc/kvm/book3s_xics.h | 2 +-
4 files changed, 114 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1f7f5f6..e5a0614 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -173,6 +173,8 @@ extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
extern int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority);
extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority);
+extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
+extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
/*
* Cuts out inst bits with ordering according to spec.
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 6a6c1fe..fc1a749 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -64,6 +64,44 @@ out:
args->rets[0] = rc;
}
+static void kvm_rtas_int_off(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+ u32 irq;
+ int rc;
+
+ if (args->nargs != 1 || args->nret != 1) {
+ rc = -3;
+ goto out;
+ }
+
+ irq = args->args[0];
+
+ rc = kvmppc_xics_int_off(vcpu->kvm, irq);
+ if (rc)
+ rc = -3;
+out:
+ args->rets[0] = rc;
+}
+
+static void kvm_rtas_int_on(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+ u32 irq;
+ int rc;
+
+ if (args->nargs != 1 || args->nret != 1) {
+ rc = -3;
+ goto out;
+ }
+
+ irq = args->args[0];
+
+ rc = kvmppc_xics_int_on(vcpu->kvm, irq);
+ if (rc)
+ rc = -3;
+out:
+ args->rets[0] = rc;
+}
+
struct rtas_handler {
void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
char *name;
@@ -72,6 +110,8 @@ struct rtas_handler {
static struct rtas_handler rtas_handlers[] = {
{ .name = "ibm,set-xive", .handler = kvm_rtas_set_xive },
{ .name = "ibm,get-xive", .handler = kvm_rtas_get_xive },
+ { .name = "ibm,int-off", .handler = kvm_rtas_int_off },
+ { .name = "ibm,int-on", .handler = kvm_rtas_int_on },
};
struct rtas_token_definition {
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 278eecc..d1ec4b0 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -120,6 +120,28 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
mutex_unlock(&ics->lock);
}
+static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
+ struct ics_irq_state *state,
+ u32 server, u32 priority, u32 saved_priority)
+{
+ bool deliver;
+
+ mutex_lock(&ics->lock);
+
+ state->server = server;
+ state->priority = priority;
+ state->saved_priority = saved_priority;
+ deliver = false;
+ if ((state->masked_pending || state->resend) && priority != MASKED) {
+ state->masked_pending = 0;
+ deliver = true;
+ }
+
+ mutex_unlock(&ics->lock);
+
+ return deliver;
+}
+
int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
{
struct kvmppc_xics *xics = kvm->arch.xics;
@@ -127,7 +149,6 @@ int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
struct kvmppc_ics *ics;
struct ics_irq_state *state;
u16 src;
- bool deliver;
if (!xics)
return -ENODEV;
@@ -141,23 +162,11 @@ int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
if (!icp)
return -EINVAL;
- mutex_lock(&ics->lock);
-
XICS_DBG("set_xive %#x server %#x prio %#x MP:%d RS:%d\n",
irq, server, priority,
state->masked_pending, state->resend);
- state->server = server;
- state->priority = priority;
- deliver = false;
- if ((state->masked_pending || state->resend) && priority != MASKED) {
- state->masked_pending = 0;
- deliver = true;
- }
-
- mutex_unlock(&ics->lock);
-
- if (deliver)
+ if (write_xive(xics, ics, state, server, priority, priority))
icp_deliver_irq(xics, icp, irq);
return 0;
@@ -186,6 +195,53 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
return 0;
}
+int kvmppc_xics_int_on(struct kvm *kvm, u32 irq)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_icp *icp;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u16 src;
+
+ if (!xics)
+ return -ENODEV;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ return -EINVAL;
+ state = &ics->irq_state[src];
+
+ icp = kvmppc_xics_find_server(kvm, state->server);
+ if (!icp)
+ return -EINVAL;
+
+ if (write_xive(xics, ics, state, state->server, state->saved_priority,
+ state->saved_priority))
+ icp_deliver_irq(xics, icp, irq);
+
+ return 0;
+}
+
+int kvmppc_xics_int_off(struct kvm *kvm, u32 irq)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *state;
+ u16 src;
+
+ if (!xics)
+ return -ENODEV;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &src);
+ if (!ics)
+ return -EINVAL;
+ state = &ics->irq_state[src];
+
+ write_xive(xics, ics, state, state->server, MASKED, state->priority);
+
+ return 0;
+}
+
/* -- ICP routines, including hcalls -- */
static inline bool icp_try_update(struct kvmppc_icp *icp,
@@ -580,7 +636,7 @@ static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
/* Handle reject */
if (reject && reject != XICS_IPI)
icp_deliver_irq(xics, icp, reject);
-
+
/* Handle resend */
if (resend)
icp_check_resend(xics, icp);
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 6121f43..c1f2773 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -35,7 +35,7 @@ struct ics_irq_state {
u32 number;
u32 server;
u8 priority;
- u8 saved_priority; /* currently unused */
+ u8 saved_priority;
u8 resend;
u8 masked_pending;
u8 asserted; /* Only for LSI */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 7/8] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (5 preceding siblings ...)
2013-04-11 5:56 ` [PATCH 6/8] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras
@ 2013-04-11 5:56 ` Paul Mackerras
2013-04-11 5:57 ` [PATCH 8/8] KVM: PPC: Book 3S: Add API for in-kernel XICS emulation Paul Mackerras
2013-04-12 3:42 ` [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:56 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface. Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state. The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Documentation/virtual/kvm/api.txt | 1 +
arch/powerpc/include/asm/kvm_ppc.h | 2 +
arch/powerpc/include/uapi/asm/kvm.h | 13 +++++
arch/powerpc/kvm/book3s.c | 19 ++++++++
arch/powerpc/kvm/book3s_xics.c | 90 +++++++++++++++++++++++++++++++++++
5 files changed, 125 insertions(+)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4247d65..54bb6ad 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1792,6 +1792,7 @@ registers, find a list below:
PPC | KVM_REG_PPC_TSR | 32
PPC | KVM_REG_PPC_OR_TSR | 32
PPC | KVM_REG_PPC_CLEAR_TSR | 32
+ PPC | KVM_REG_PPC_ICP_STATE | 64
ARM registers are mapped using the lower 32 bits. The upper 16 of that
is the register group type, or coprocessor number:
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e5a0614..c1b6150 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -312,6 +312,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
+extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
#else
static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 6beb876..9781927 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -429,4 +429,17 @@ struct kvm_get_htab_header {
#define KVM_REG_PPC_TCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x89)
#define KVM_REG_PPC_TSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x8a)
+/* Per-vcpu interrupt controller state */
+#define KVM_REG_PPC_ICP_STATE (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8b)
+
+/* Layout of above for XICS */
+#define KVM_REG_PPC_ICP_CPPR_SHIFT 56 /* current proc priority */
+#define KVM_REG_PPC_ICP_CPPR_MASK 0xff
+#define KVM_REG_PPC_ICP_XISR_SHIFT 32 /* interrupt status field */
+#define KVM_REG_PPC_ICP_XISR_MASK 0xffffff
+#define KVM_REG_PPC_ICP_MFRR_SHIFT 24 /* pending IPI priority */
+#define KVM_REG_PPC_ICP_MFRR_MASK 0xff
+#define KVM_REG_PPC_ICP_PPRI_SHIFT 16 /* pending irq priority */
+#define KVM_REG_PPC_ICP_PPRI_MASK 0xff
+
#endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c5a4478..07d3709 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -529,6 +529,15 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
val = get_reg_val(reg->id, vcpu->arch.vscr.u[3]);
break;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_KVM_BOOK3S_64
+ case KVM_REG_PPC_ICP_STATE:
+ if (!vcpu->arch.icp) {
+ r = -ENXIO;
+ break;
+ }
+ val = get_reg_val(reg->id, kvmppc_xics_get_icp(vcpu));
+ break;
+#endif /* CONFIG_KVM_BOOK3S_64 */
default:
r = -EINVAL;
break;
@@ -591,6 +600,16 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
vcpu->arch.vscr.u[3] = set_reg_val(reg->id, val);
break;
#endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_KVM_BOOK3S_64
+ case KVM_REG_PPC_ICP_STATE:
+ if (!vcpu->arch.icp) {
+ r = -ENXIO;
+ break;
+ }
+ r = kvmppc_xics_set_icp(vcpu,
+ set_reg_val(reg->id, val));
+ break;
+#endif /* CONFIG_KVM_BOOK3S_64 */
default:
r = -EINVAL;
break;
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index d1ec4b0..4eb4f4b 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -953,6 +953,96 @@ int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server_num)
return 0;
}
+u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu)
+{
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ union kvmppc_icp_state state;
+
+ if (!icp)
+ return 0;
+ state = icp->state;
+ return ((u64)state.cppr << KVM_REG_PPC_ICP_CPPR_SHIFT) |
+ ((u64)state.xisr << KVM_REG_PPC_ICP_XISR_SHIFT) |
+ ((u64)state.mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT) |
+ ((u64)state.pending_pri << KVM_REG_PPC_ICP_PPRI_SHIFT);
+}
+
+int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
+{
+ struct kvmppc_icp *icp = vcpu->arch.icp;
+ struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+ union kvmppc_icp_state old_state, new_state;
+ struct kvmppc_ics *ics;
+ u8 cppr, mfrr, pending_pri;
+ u32 xisr;
+ u16 src;
+ bool resend;
+
+ if (!icp || !xics)
+ return -ENOENT;
+
+ cppr = icpval >> KVM_REG_PPC_ICP_CPPR_SHIFT;
+ xisr = (icpval >> KVM_REG_PPC_ICP_XISR_SHIFT) &
+ KVM_REG_PPC_ICP_XISR_MASK;
+ mfrr = icpval >> KVM_REG_PPC_ICP_MFRR_SHIFT;
+ pending_pri = icpval >> KVM_REG_PPC_ICP_PPRI_SHIFT;
+
+ /* Require the new state to be internally consistent */
+ if (xisr == 0) {
+ if (pending_pri != 0xff)
+ return -EINVAL;
+ } else if (xisr == XICS_IPI) {
+ if (pending_pri != mfrr || pending_pri >= cppr)
+ return -EINVAL;
+ } else {
+ if (pending_pri >= mfrr || pending_pri >= cppr)
+ return -EINVAL;
+ ics = kvmppc_xics_find_ics(xics, xisr, &src);
+ if (!ics)
+ return -EINVAL;
+ }
+
+ new_state.raw = 0;
+ new_state.cppr = cppr;
+ new_state.xisr = xisr;
+ new_state.mfrr = mfrr;
+ new_state.pending_pri = pending_pri;
+
+ /*
+ * Deassert the CPU interrupt request.
+ * icp_try_update will reassert it if necessary.
+ */
+ kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+ BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+ /*
+ * Note that if we displace an interrupt from old_state.xisr,
+ * we don't mark it as rejected. We expect userspace to set
+ * the state of the interrupt sources to be consistent with
+ * the ICP states (either before or afterwards, which doesn't
+ * matter). We do handle resends due to CPPR becoming less
+ * favoured because that is necessary to end up with a
+ * consistent state in the situation where userspace restores
+ * the ICS states before the ICP states.
+ */
+ do {
+ old_state = ACCESS_ONCE(icp->state);
+
+ if (new_state.mfrr <= old_state.mfrr) {
+ resend = false;
+ new_state.need_resend = old_state.need_resend;
+ } else {
+ resend = old_state.need_resend;
+ new_state.need_resend = 0;
+ }
+ } while (!icp_try_update(icp, old_state, new_state, false));
+
+ if (resend)
+ icp_check_resend(xics, icp);
+
+ return 0;
+}
+
/* -- ioctls -- */
int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* [PATCH 8/8] KVM: PPC: Book 3S: Add API for in-kernel XICS emulation
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (6 preceding siblings ...)
2013-04-11 5:56 ` [PATCH 7/8] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
@ 2013-04-11 5:57 ` Paul Mackerras
2013-04-12 3:42 ` [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-11 5:57 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it. The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.
The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source. The attribute number is the same as the interrupt
source number.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Documentation/virtual/kvm/api.txt | 8 ++
Documentation/virtual/kvm/devices/xics.txt | 66 +++++++++
arch/powerpc/kvm/book3s_xics.c | 206 +++++++++++++++++++++++++++-
arch/powerpc/kvm/powerpc.c | 31 +++++
include/linux/kvm_host.h | 1 +
include/uapi/linux/kvm.h | 14 ++
virt/kvm/kvm_main.c | 14 ++
7 files changed, 335 insertions(+), 5 deletions(-)
create mode 100644 Documentation/virtual/kvm/devices/xics.txt
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 54bb6ad..db230f8 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2756,3 +2756,11 @@ Parameters: args[0] is the MPIC device fd
args[1] is the MPIC CPU number for this vcpu
This capability connects the vcpu to an in-kernel MPIC device.
+
+6.7 KVM_CAP_IRQ_XICS
+
+Architectures: ppc
+Parameters: args[0] is the XICS device fd
+ args[1] is the XICS CPU number (server ID) for this vcpu
+
+This capability connects the vcpu to an in-kernel XICS device.
diff --git a/Documentation/virtual/kvm/devices/xics.txt b/Documentation/virtual/kvm/devices/xics.txt
new file mode 100644
index 0000000..4286493
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/xics.txt
@@ -0,0 +1,66 @@
+XICS interrupt controller
+
+Device type supported: KVM_DEV_TYPE_XICS
+
+Groups:
+ KVM_DEV_XICS_SOURCES
+ Attributes: One per interrupt source, indexed by the source number.
+
+This device emulates the XICS (eXternal Interrupt Controller
+Specification) defined in PAPR. The XICS has a set of interrupt
+sources, each identified by a 20-bit source number, and a set of
+Interrupt Control Presentation (ICP) entities, also called "servers",
+each associated with a virtual CPU.
+
+The ICP entities are created by enabling the KVM_CAP_IRQ_ARCH
+capability for each vcpu, specifying KVM_CAP_IRQ_XICS in args[0] and
+the interrupt server number (i.e. the vcpu number from the XICS's
+point of view) in args[1] of the kvm_enable_cap struct. Each ICP has
+64 bits of state which can be read and written using the
+KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls on the vcpu. The 64 bit
+state word has the following bitfields, starting at the
+least-significant end of the word:
+
+* Unused, 16 bits
+
+* Pending interrupt priority, 8 bits
+ Zero is the highest priority, 255 means no interrupt is pending.
+
+* Pending IPI (inter-processor interrupt) priority, 8 bits
+ Zero is the highest priority, 255 means no IPI is pending.
+
+* Pending interrupt source number, 24 bits
+ Zero means no interrupt pending, 2 means an IPI is pending
+
+* Current processor priority, 8 bits
+ Zero is the highest priority, meaning no interrupts can be
+ delivered, and 255 is the lowest priority.
+
+Each source has 64 bits of state that can be read and written using
+the KVM_GET_DEVICE_ATTR and KVM_SET_DEVICE_ATTR ioctls, specifying the
+KVM_DEV_XICS_SOURCES attribute group, with the attribute number being
+the interrupt source number. The 64 bit state word has the following
+bitfields, starting from the least-significant end of the word:
+
+* Destination (server number), 32 bits
+ This specifies where the interrupt should be sent, and is the
+ interrupt server number specified for the destination vcpu.
+
+* Priority, 8 bits
+ This is the priority specified for this interrupt source, where 0 is
+ the highest priority and 255 is the lowest. An interrupt with a
+ priority of 255 will never be delivered.
+
+* Level sensitive flag, 1 bit
+ This bit is 1 for a level-sensitive interrupt source, or 0 for
+ edge-sensitive (or MSI).
+
+* Masked flag, 1 bit
+ This bit is set to 1 if the interrupt is masked (cannot be delivered
+ regardless of its priority), for example by the ibm,int-off RTAS
+ call, or 0 if it is not masked.
+
+* Pending flag, 1 bit
+ This bit is 1 if the source has a pending interrupt, otherwise 0.
+
+Only one XICS instance may be created per VM.
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 4eb4f4b..eb58abf 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -11,6 +11,7 @@
#include <linux/kvm_host.h>
#include <linux/err.h>
#include <linux/gfp.h>
+#include <linux/anon_inodes.h>
#include <asm/uaccess.h>
#include <asm/kvm_book3s.h>
@@ -890,8 +891,8 @@ static void xics_debugfs_init(struct kvmppc_xics *xics)
kfree(name);
}
-struct kvmppc_ics *kvmppc_xics_create_ics(struct kvmppc_xics *xics,
- int irq)
+static struct kvmppc_ics *kvmppc_xics_create_ics(struct kvmppc_xics *xics,
+ int irq)
{
struct kvmppc_ics *ics;
int i, icsid;
@@ -1043,6 +1044,94 @@ int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
return 0;
}
+static int xics_get_source(struct kvm *kvm, long irq, u64 addr)
+{
+ int ret;
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *irqp;
+ u64 __user *ubufp = (u64 __user *) addr;
+ u16 idx;
+ u64 val, prio;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &idx);
+ if (!ics)
+ return -ENOENT;
+
+ irqp = &ics->irq_state[idx];
+ mutex_lock(&ics->lock);
+ ret = -ENOENT;
+ if (irqp->exists) {
+ val = irqp->server;
+ prio = irqp->priority;
+ if (prio == MASKED) {
+ val |= KVM_XICS_MASKED;
+ prio = irqp->saved_priority;
+ }
+ val |= prio << KVM_XICS_PRIORITY_SHIFT;
+ if (irqp->asserted)
+ val |= KVM_XICS_LEVEL_SENSITIVE | KVM_XICS_PENDING;
+ else if (irqp->masked_pending || irqp->resend)
+ val |= KVM_XICS_PENDING;
+ ret = 0;
+ }
+ mutex_unlock(&ics->lock);
+
+ if (!ret && put_user(val, ubufp))
+ ret = -EFAULT;
+
+ return ret;
+}
+
+static int xics_set_source(struct kvm *kvm, long irq, u64 addr)
+{
+ struct kvmppc_xics *xics = kvm->arch.xics;
+ struct kvmppc_ics *ics;
+ struct ics_irq_state *irqp;
+ u64 __user *ubufp = (u64 __user *) addr;
+ u16 idx;
+ u64 val;
+ u8 prio;
+ u32 server;
+
+ if (irq < KVMPPC_XICS_FIRST_IRQ || irq >= KVMPPC_XICS_NR_IRQS)
+ return -ENOENT;
+
+ ics = kvmppc_xics_find_ics(xics, irq, &idx);
+ if (!ics) {
+ ics = kvmppc_xics_create_ics(xics, irq);
+ if (!ics)
+ return -ENOMEM;
+ }
+ irqp = &ics->irq_state[idx];
+ if (get_user(val, ubufp))
+ return -EFAULT;
+
+ server = val & KVM_XICS_DESTINATION_MASK;
+ prio = val >> KVM_XICS_PRIORITY_SHIFT;
+ if (prio != MASKED && kvmppc_xics_find_server(kvm, server) == NULL)
+ return -EINVAL;
+
+ mutex_lock(&ics->lock);
+ irqp->server = server;
+ irqp->saved_priority = prio;
+ if (val & KVM_XICS_MASKED)
+ prio = MASKED;
+ irqp->priority = prio;
+ irqp->resend = 0;
+ irqp->masked_pending = 0;
+ irqp->asserted = 0;
+ if ((val & KVM_XICS_PENDING) && (val & KVM_XICS_LEVEL_SENSITIVE))
+ irqp->asserted = 1;
+ irqp->exists = 1;
+ mutex_unlock(&ics->lock);
+
+ if (val & KVM_XICS_PENDING)
+ icp_deliver_irq(xics, NULL, irqp->number);
+
+ return 0;
+}
+
/* -- ioctls -- */
int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args)
@@ -1069,14 +1158,70 @@ int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args)
return r;
}
-void kvmppc_xics_free(struct kvmppc_xics *xics)
+static int xics_set_attr(struct kvmppc_xics *xics, struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_XICS_GRP_SOURCES:
+ return xics_set_source(xics->kvm, attr->attr, attr->addr);
+ }
+ return -ENXIO;
+}
+
+static int xics_get_attr(struct kvmppc_xics *xics, struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_XICS_GRP_SOURCES:
+ return xics_get_source(xics->kvm, attr->attr, attr->addr);
+ }
+ return -ENXIO;
+}
+
+static int xics_has_attr(struct kvmppc_xics *xics, struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_XICS_GRP_SOURCES:
+ if (attr->attr >= KVMPPC_XICS_FIRST_IRQ &&
+ attr->attr < KVMPPC_XICS_NR_IRQS)
+ return 0;
+ break;
+ }
+ return -ENXIO;
+}
+
+static long kvm_xics_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ struct kvmppc_xics *xics = filp->private_data;
+ struct kvm_device_attr attr;
+ int (*accessor)(struct kvmppc_xics *xics, struct kvm_device_attr *attr);
+
+ switch (ioctl) {
+ case KVM_SET_DEVICE_ATTR:
+ accessor = xics_set_attr;
+ break;
+ case KVM_GET_DEVICE_ATTR:
+ accessor = xics_get_attr;
+ break;
+ case KVM_HAS_DEVICE_ATTR:
+ accessor = xics_has_attr;
+ break;
+ default:
+ return -ENOTTY;
+ }
+
+ if (copy_from_user(&attr, (void __user *)arg, sizeof(attr)))
+ return -EFAULT;
+
+ return accessor(xics, &attr);
+}
+
+static void kvmppc_xics_free(struct kvmppc_xics *xics)
{
int i;
debugfs_remove(xics->dentry);
if (xics->kvm) {
- kvm_put_kvm(xics->kvm);
xics->kvm->arch.xics = NULL;
xics->kvm = NULL;
}
@@ -1088,9 +1233,30 @@ void kvmppc_xics_free(struct kvmppc_xics *xics)
kfree(xics);
}
+static void kvmppc_xics_put(struct kvmppc_xics *xics)
+{
+ if (xics && atomic_dec_and_test(&xics->usecount))
+ kvmppc_xics_free(xics);
+}
+
+static int kvm_xics_release(struct inode *inode, struct file *filp)
+{
+ struct kvmppc_xics *xics = filp->private_data;
+
+ kvmppc_xics_put(xics);
+ kvm_put_kvm(xics->kvm);
+ return 0;
+}
+
+static const struct file_operations kvm_xics_fops = {
+ .unlocked_ioctl = kvm_xics_ioctl,
+ .release = kvm_xics_release,
+};
+
int kvm_create_xics(struct kvm *kvm, u32 type)
{
struct kvmppc_xics *xics;
+ int fd;
/* Already there ? */
if (kvm->arch.xics)
@@ -1100,6 +1266,12 @@ int kvm_create_xics(struct kvm *kvm, u32 type)
if (!xics)
return -ENOMEM;
+ fd = anon_inode_getfd("kvm-xics", &kvm_xics_fops, xics, O_RDWR);
+ if (fd < 0) {
+ kfree(xics);
+ return fd;
+ }
+
xics->kvm = kvm;
kvm->arch.xics = xics;
xics_debugfs_init(xics);
@@ -1112,8 +1284,31 @@ int kvm_create_xics(struct kvm *kvm, u32 type)
}
#endif /* CONFIG_KVM_BOOK3S_64_HV */
+ atomic_set(&xics->usecount, 1);
kvm_get_kvm(kvm);
- return 0;
+ return fd;
+}
+
+int kvmppc_xics_connect_vcpu(struct file *xics_filp, struct kvm_vcpu *vcpu,
+ u32 xcpu)
+{
+ struct kvmppc_xics *xics = xics_filp->private_data;
+ int r = -EBUSY;
+
+ if (xics_filp->f_op != &kvm_xics_fops)
+ return -EPERM;
+ if (xics->kvm != vcpu->kvm)
+ return -EPERM;
+ if (vcpu->arch.irq_type)
+ return -EBUSY;
+
+ r = kvmppc_xics_create_icp(vcpu, xcpu);
+ if (!r) {
+ atomic_inc(&xics->usecount);
+ vcpu->arch.irq_type = KVMPPC_IRQ_XICS;
+ }
+
+ return r;
}
void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu)
@@ -1123,4 +1318,5 @@ void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu)
kfree(vcpu->arch.icp);
vcpu->arch.icp = NULL;
vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+ kvmppc_xics_put(vcpu->kvm->arch.xics);
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index bc6d7cf..b6dd4aa 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -341,6 +341,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_PPC_ALLOC_HTAB:
case KVM_CAP_PPC_RTAS:
+#ifdef CONFIG_KVM_XICS
+ case KVM_CAP_IRQ_XICS:
+#endif
r = 1;
break;
#endif /* CONFIG_PPC_BOOK3S_64 */
@@ -830,6 +833,21 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
break;
}
#endif
+#ifdef CONFIG_KVM_XICS
+ case KVM_CAP_IRQ_XICS: {
+ struct file *filp;
+
+ r = -EBADF;
+ filp = fget(cap->args[0]);
+ if (!filp)
+ break;
+
+ r = kvmppc_xics_connect_vcpu(filp, vcpu, cap->args[1]);
+
+ fput(filp);
+ break;
+ }
+#endif /* CONFIG_KVM_XICS */
default:
r = -EINVAL;
break;
@@ -1038,6 +1056,19 @@ long kvm_arch_vm_ioctl(struct file *filp,
break;
}
#endif /* CONFIG_PPC_BOOK3S_64 */
+
+ case KVM_IRQ_LINE: {
+ struct kvm *kvm = filp->private_data;
+ struct kvm_irq_level args;
+
+ r = -EFAULT;
+ if (copy_from_user(&args, argp, sizeof(args)))
+ break;
+
+ /* Call all the interrupt controllers */
+ r = kvm_vm_ioctl_xics_irq(kvm, &args);
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 852a3a1..41963e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1085,6 +1085,7 @@ static inline bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
}
int kvm_create_mpic(struct kvm *kvm, u32 type);
+extern int kvm_create_xics(struct kvm *kvm, u32 type);
#endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
#else
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 373f4aa..28b269e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -671,6 +671,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_DEVICE_CTRL 89
#define KVM_CAP_IRQ_MPIC 90
#define KVM_CAP_PPC_RTAS 91
+#define KVM_CAP_IRQ_XICS 92
#ifdef KVM_CAP_IRQ_ROUTING
@@ -940,6 +941,19 @@ struct kvm_device_attr {
#define KVM_DEV_MPIC_GRP_REGISTER 2 /* 32-bit */
#define KVM_DEV_MPIC_GRP_IRQ_ACTIVE 3 /* 32-bit */
+/* PPC64 eXternal Interrupt Controller Specification */
+#define KVM_DEV_TYPE_XICS 3
+#define KVM_DEV_XICS_GRP_SOURCES 1 /* 64-bit source attributes */
+
+/* Layout of 64-bit source attribute values */
+#define KVM_XICS_DESTINATION_SHIFT 0
+#define KVM_XICS_DESTINATION_MASK 0xffffffffULL
+#define KVM_XICS_PRIORITY_SHIFT 32
+#define KVM_XICS_PRIORITY_MASK 0xff
+#define KVM_XICS_LEVEL_SENSITIVE (1ULL << 40)
+#define KVM_XICS_MASKED (1ULL << 41)
+#define KVM_XICS_PENDING (1ULL << 42)
+
/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ca3adf9..b97d3e9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2171,6 +2171,20 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
return kvm_create_mpic(kvm, cd->type);
}
#endif
+#ifdef CONFIG_KVM_XICS
+ case KVM_DEV_TYPE_XICS: {
+ int fd;
+
+ if (cd->flags & KVM_CREATE_DEVICE_TEST)
+ return 0;
+
+ fd = kvm_create_xics(kvm, cd->type);
+ if (fd < 0)
+ return fd;
+ cd->fd = fd;
+ return 0;
+ }
+#endif
default:
return -ENODEV;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v4 0/8] In-kernel XICS interrupt controller emulation
2013-04-11 5:52 [PATCH v4 0/8] In-kernel XICS interrupt controller emulation Paul Mackerras
` (7 preceding siblings ...)
2013-04-11 5:57 ` [PATCH 8/8] KVM: PPC: Book 3S: Add API for in-kernel XICS emulation Paul Mackerras
@ 2013-04-12 3:42 ` Paul Mackerras
8 siblings, 0 replies; 11+ messages in thread
From: Paul Mackerras @ 2013-04-12 3:42 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Scott Wood
I wrote:
> The series is based on Alex Graf's kvm-ppc-next branch with Scott
> Wood's recent patch series applied on top, together with the patch
> below to allow it to compile with CONFIG_KVM_MPIC=n.
And of course I forgot to include the patch. Here it is.
Paul.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 290a905..5306ca5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -466,9 +466,13 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvmppc_remove_vcpu_debugfs(vcpu);
switch (vcpu->arch.irq_type) {
+#ifdef CONFIG_KVM_MPIC
case KVMPPC_IRQ_MPIC:
kvmppc_mpic_put(vcpu->arch.mpic);
break;
+#endif
+ default:
+ break;
}
kvmppc_core_vcpu_free(vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e325f5d..ca3adf9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2161,13 +2161,11 @@ out:
static int kvm_ioctl_create_device(struct kvm *kvm,
struct kvm_create_device *cd)
{
- bool test = cd->flags & KVM_CREATE_DEVICE_TEST;
-
switch (cd->type) {
#ifdef CONFIG_KVM_MPIC
case KVM_DEV_TYPE_FSL_MPIC_20:
case KVM_DEV_TYPE_FSL_MPIC_42: {
- if (test)
+ if (cd->flags & KVM_CREATE_DEVICE_TEST)
return 0;
return kvm_create_mpic(kvm, cd->type);
^ permalink raw reply related [flat|nested] 11+ messages in thread