kvm.vger.kernel.org archive mirror
* [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
@ 2012-11-05  3:18 Paul Mackerras
  2012-11-05  3:19 ` [RFC PATCH 1/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Paul Mackerras
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:18 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

This series implements in-kernel emulation of the XICS interrupt
controller specified in the PAPR specification and used by pseries
guests.  Since the XICS is organized a bit differently to the
interrupt controllers used on x86 machines, I have defined some new
ioctls for setting up the XICS and for saving and restoring its
state.  It may be possible to use some of the currently x86-specific
ioctls instead.
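
For orientation, the overall userspace call sequence with the new ioctls
looks roughly like this (an illustrative sketch only; kvm_fd stands for the
/dev/kvm file descriptor and error handling is omitted):

/* Both capabilities are added by this series */
ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_XICS);
ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RTAS);

/*
 * Then, on the VM fd, in order:
 *  1. KVM_CREATE_IRQCHIP_ARGS with type ICP, once, before any vcpu
 *  2. create the vcpus
 *  3. KVM_CREATE_IRQCHIP_ARGS with type ICS, once per BUID of sources
 *  4. KVM_PPC_RTAS_DEFINE_TOKEN for "ibm,set-xive" / "ibm,get-xive"
 *  5. KVM_IRQ_LINE to raise and lower individual interrupt sources
 */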

Paul.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
@ 2012-11-05  3:19 ` Paul Mackerras
  2012-11-05  3:20 ` [RFC PATCH 2/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

Currently kvmppc_core_dequeue_external() takes a struct kvm_interrupt *
argument and does nothing with it, in any of its implementations.
This removes it in order to make things easier for forthcoming
in-kernel interrupt controller emulation code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_ppc.h |    3 +--
 arch/powerpc/kvm/book3s.c          |    3 +--
 arch/powerpc/kvm/booke.c           |    3 +--
 arch/powerpc/kvm/powerpc.c         |    2 +-
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 609cca3..9fd2654 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -105,8 +105,7 @@ extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                        struct kvm_interrupt *irq);
-extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
-                                         struct kvm_interrupt *irq);
+extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a4b6452..6548445 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -160,8 +160,7 @@ void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
 	kvmppc_book3s_queue_irqprio(vcpu, vec);
 }
 
-void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
-                                  struct kvm_interrupt *irq)
+void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 {
 	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
 	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3d1f35d..faaf217 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -213,8 +213,7 @@ void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
 	kvmppc_booke_queue_irqprio(vcpu, prio);
 }
 
-void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
-                                  struct kvm_interrupt *irq)
+void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 {
 	clear_bit(BOOKE_IRQPRIO_EXTERNAL, &vcpu->arch.pending_exceptions);
 	clear_bit(BOOKE_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..a42e1e3 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -707,7 +707,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
 {
 	if (irq->irq == KVM_INTERRUPT_UNSET) {
-		kvmppc_core_dequeue_external(vcpu, irq);
+		kvmppc_core_dequeue_external(vcpu);
 		return 0;
 	}
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
  2012-11-05  3:19 ` [RFC PATCH 1/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Paul Mackerras
@ 2012-11-05  3:20 ` Paul Mackerras
  2012-11-05  3:21 ` [RFC PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:20 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

From: Michael Ellerman <michael@ellerman.id.au>

For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.
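
As an illustration, userspace would associate a token with a service
roughly as follows (a sketch only: error handling is omitted, vm_fd stands
for the VM file descriptor, and the token value is whatever userspace has
put into the guest's device tree):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int define_rtas_token(int vm_fd, const char *service, uint64_t token)
{
	struct kvm_rtas_token_args args;

	memset(&args, 0, sizeof(args));
	strncpy(args.name, service, sizeof(args.name) - 1);
	args.token = token;	/* a token of 0 undefines the mapping again */

	return ioctl(vm_fd, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
}

For example, once the "ibm,set-xive" handler is registered (it is added
later in this series), define_rtas_token(vm_fd, "ibm,set-xive", token)
directs the guest's ibm,set-xive calls to the kernel.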

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt   |   19 ++++
 arch/powerpc/include/asm/hvcall.h   |    3 +
 arch/powerpc/include/asm/kvm_host.h |    1 +
 arch/powerpc/include/asm/kvm_ppc.h  |    4 +
 arch/powerpc/include/uapi/asm/kvm.h |    6 ++
 arch/powerpc/kvm/Makefile           |    1 +
 arch/powerpc/kvm/book3s_hv.c        |   18 +++-
 arch/powerpc/kvm/book3s_pr_papr.c   |    7 ++
 arch/powerpc/kvm/book3s_rtas.c      |  182 +++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c          |    9 +-
 include/uapi/linux/kvm.h            |    4 +
 11 files changed, 252 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_rtas.c

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6671fdc..8ca0a7c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2071,6 +2071,25 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
 
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
+4.78 KVM_PPC_RTAS_DEFINE_TOKEN
+
+Capability: KVM_CAP_PPC_RTAS
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_rtas_token_args
+Returns: 0 on success, -1 on error
+
+Defines a token value for a RTAS (Run Time Abstraction Services)
+service in order to allow it to be handled in the kernel.  The
+argument struct gives the name of the service, which must be the name
+of a service that has a kernel-side implementation.  If the token
+value is non-zero, it will be associated with that service, and
+subsequent RTAS calls by the guest specifying that token will be
+handled by the kernel.  If the token value is 0, then any token
+associated with the service will be forgotten, and subsequent RTAS
+calls by the guest for that service will be passed to userspace to be
+handled.
+
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 7a86706..9ea22b2 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -269,6 +269,9 @@
 #define H_GET_MPP_X		0x314
 #define MAX_HCALL_OPCODE	H_GET_MPP_X
 
+/* Platform specific hcalls, used by KVM */
+#define H_RTAS			0xf000
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3093896..8e745a8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ struct kvm_arch {
 #endif /* CONFIG_KVM_BOOK3S_64_HV */
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
+	struct list_head rtas_tokens;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9fd2654..996aeca 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -163,6 +163,10 @@ extern void kvmppc_bookehv_exit(void);
 
 extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
 
+extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
+extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
+extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index b89ae4d..fd0a6f7 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -296,6 +296,12 @@ struct kvm_allocate_rma {
 	__u64 rma_size;
 };
 
+/* for KVM_CAP_PPC_RTAS */
+struct kvm_rtas_token_args {
+	char name[120];
+	__u64 token;	/* Use a token of 0 to undefine a mapping */
+};
+
 struct kvm_book3e_206_tlb_entry {
 	__u32 mas8;
 	__u32 mas1;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index c2a0863..536f65f 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -80,6 +80,7 @@ kvm-book3s_64-module-objs := \
 	emulate.o \
 	book3s.o \
 	book3s_64_vio.o \
+	book3s_rtas.o \
 	$(kvm-book3s_64-objs-y)
 
 kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 843eb75..8284d55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -479,7 +479,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
 	unsigned long target, ret = H_SUCCESS;
 	struct kvm_vcpu *tvcpu;
-	int idx;
+	int idx, rc;
 
 	switch (req) {
 	case H_ENTER:
@@ -515,6 +515,19 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 					kvmppc_get_gpr(vcpu, 5),
 					kvmppc_get_gpr(vcpu, 6));
 		break;
+	case H_RTAS:
+		if (list_empty(&vcpu->kvm->arch.rtas_tokens))
+			return RESUME_HOST;
+
+		rc = kvmppc_rtas_hcall(vcpu);
+
+		if (rc == -ENOENT)
+			return RESUME_HOST;
+		else if (rc == 0)
+			break;
+
+		/* Send the error out to userspace via KVM_RUN */
+		return rc;
 	default:
 		return RESUME_HOST;
 	}
@@ -1815,6 +1828,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 	kvm->arch.lpid = lpid;
 
 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 
 	kvm->arch.rma = NULL;
 
@@ -1860,6 +1874,8 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
 		kvm->arch.rma = NULL;
 	}
 
+	kvmppc_rtas_tokens_free(kvm);
+
 	kvmppc_free_hpt(kvm);
 	WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
 }
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index ee02b30..4efa4a4 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -246,6 +246,13 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 		vcpu->stat.halt_wakeup++;
 		return EMULATE_DONE;
+	case H_RTAS:
+		if (list_empty(&vcpu->kvm->arch.rtas_tokens))
+			return RESUME_HOST;
+		if (kvmppc_rtas_hcall(vcpu))
+			break;
+		kvmppc_set_gpr(vcpu, 3, 0);
+		return EMULATE_DONE;
 	}
 
 	return EMULATE_FAIL;
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
new file mode 100644
index 0000000..8a324e8
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/kvm.h>
+#include <linux/err.h>
+
+#include <asm/uaccess.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/rtas.h>
+
+
+struct rtas_handler {
+	void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
+	char *name;
+};
+
+static struct rtas_handler rtas_handlers[] = { };
+
+struct rtas_token_definition {
+	struct list_head list;
+	struct rtas_handler *handler;
+	u64 token;
+};
+
+static int rtas_name_matches(char *s1, char *s2)
+{
+	struct kvm_rtas_token_args args;
+	return !strncmp(s1, s2, sizeof(args.name));
+}
+
+static int rtas_token_undefine(struct kvm *kvm, char *name)
+{
+	struct rtas_token_definition *d, *tmp;
+
+	lockdep_assert_held(&kvm->lock);
+
+	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
+		if (rtas_name_matches(d->handler->name, name)) {
+			list_del(&d->list);
+			kfree(d);
+			return 0;
+		}
+	}
+
+	/* It's not an error to undefine an undefined token */
+	return 0;
+}
+
+static int rtas_token_define(struct kvm *kvm, char *name, u64 token)
+{
+	struct rtas_token_definition *d;
+	struct rtas_handler *h;
+	bool found;
+	int i;
+
+	lockdep_assert_held(&kvm->lock);
+
+	list_for_each_entry(d, &kvm->arch.rtas_tokens, list) {
+		if (d->token == token)
+			return -EEXIST;
+	}
+
+	found = false;
+	for (i = 0; i < ARRAY_SIZE(rtas_handlers); i++) {
+		h = &rtas_handlers[i];
+		if (rtas_name_matches(h->name, name)) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found)
+		return -ENOENT;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return -ENOMEM;
+
+	d->handler = h;
+	d->token = token;
+
+	list_add_tail(&d->list, &kvm->arch.rtas_tokens);
+
+	return 0;
+}
+
+int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_rtas_token_args args;
+	int rc;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	mutex_lock(&kvm->lock);
+
+	if (args.token)
+		rc = rtas_token_define(kvm, args.name, args.token);
+	else
+		rc = rtas_token_undefine(kvm, args.name);
+
+	mutex_unlock(&kvm->lock);
+
+	return rc;
+}
+
+int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
+{
+	struct rtas_token_definition *d;
+	struct rtas_args args;
+	rtas_arg_t *orig_rets;
+	gpa_t args_phys;
+	int rc;
+
+	/* r4 contains the guest physical address of the RTAS args */
+	args_phys = kvmppc_get_gpr(vcpu, 4);
+
+	rc = kvm_read_guest(vcpu->kvm, args_phys, &args, sizeof(args));
+	if (rc)
+		goto fail;
+
+	/*
+	 * args->rets is a pointer into args->args. Now that we've
+	 * copied args we need to fix it up to point into our copy,
+	 * not the guest args. We also need to save the original
+	 * value so we can restore it on the way out.
+	 */
+	orig_rets = args.rets;
+	args.rets = &args.args[args.nargs];
+
+	mutex_lock(&vcpu->kvm->lock);
+
+	rc = -ENOENT;
+	list_for_each_entry(d, &vcpu->kvm->arch.rtas_tokens, list) {
+		if (d->token == args.token) {
+			d->handler->handler(vcpu, &args);
+			rc = 0;
+			break;
+		}
+	}
+
+	mutex_unlock(&vcpu->kvm->lock);
+
+	if (rc == 0) {
+		args.rets = orig_rets;
+		rc = kvm_write_guest(vcpu->kvm, args_phys, &args, sizeof(args));
+		if (rc)
+			goto fail;
+	}
+
+	return rc;
+
+fail:
+	/*
+	 * We only get here if the guest has called RTAS with a bogus
+	 * args pointer. That means we can't get to the args, and so we
+	 * can't fail the RTAS call. So fail right out to userspace,
+	 * which should kill the guest.
+	 */
+	return rc;
+}
+
+void kvmppc_rtas_tokens_free(struct kvm *kvm)
+{
+	struct rtas_token_definition *d, *tmp;
+
+	lockdep_assert_held(&kvm->lock);
+
+	list_for_each_entry_safe(d, tmp, &kvm->arch.rtas_tokens, list) {
+		list_del(&d->list);
+		kfree(d);
+	}
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a42e1e3..ba552f8 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -332,6 +332,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_SPAPR_TCE:
 	case KVM_CAP_PPC_ALLOC_HTAB:
+	case KVM_CAP_PPC_RTAS:
 		r = 1;
 		break;
 #endif /* CONFIG_PPC_BOOK3S_64 */
@@ -909,8 +910,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	case KVM_ALLOCATE_RMA: {
-		struct kvm *kvm = filp->private_data;
 		struct kvm_allocate_rma rma;
+		struct kvm *kvm = filp->private_data;
 
 		r = kvm_vm_ioctl_allocate_rma(kvm, &rma);
 		if (r >= 0 && copy_to_user(argp, &rma, sizeof(rma)))
@@ -947,6 +948,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			r = -EFAULT;
 		break;
 	}
+	case KVM_PPC_RTAS_DEFINE_TOKEN: {
+		struct kvm *kvm = filp->private_data;
+
+		r = kvm_vm_ioctl_rtas_define_token(kvm, argp);
+		break;
+	}
 #endif /* CONFIG_PPC_BOOK3S_64 */
 	default:
 		r = -ENOTTY;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 494a84c..3a92001 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -634,6 +634,7 @@ struct kvm_ppc_smmu_info {
 #endif
 #define KVM_CAP_IRQFD_RESAMPLE 82
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
+#define KVM_CAP_PPC_RTAS 84
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -859,6 +860,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_CREATE_SPAPR_TCE	  _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA	  _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* Available with KVM_CAP_PPC_RTAS */
+#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO,  0xaa, struct kvm_rtas_token_args)
+
 
 /*
  * ioctls for vcpu fds
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
  2012-11-05  3:19 ` [RFC PATCH 1/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Paul Mackerras
  2012-11-05  3:20 ` [RFC PATCH 2/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
@ 2012-11-05  3:21 ` Paul Mackerras
  2012-11-05  3:21 ` [RFC PATCH 4/9] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

It supports up to 4095 "BUIDs" (blocks of interrupts) of up to 4096
interrupts each.  Each vcpu gets one ICP (interrupt controller
presentation) entity, and each BUID gets one ICS (interrupt controller
source) entity.  These entities are created with the new
KVM_CREATE_IRQCHIP_ARGS ioctl, which takes a structure indicating
which type of controller to create and the parameters pertaining to it
(a flags field for both; the BUID and number of interrupt sources for an ICS).
We don't use the existing KVM_CREATE_IRQCHIP ioctl because it doesn't
allow us to pass any parameters defining the new interrupt controller.
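
Roughly, userspace drives the new ioctl as follows (an illustrative sketch
only; error handling is omitted, and vm_fd and the helper names are just
placeholders):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int xics_create(int vm_fd, uint16_t buid, uint16_t nr_irqs)
{
	struct kvm_irqchip_args args;

	/* ICP: once per VM, before any vcpu is created */
	memset(&args, 0, sizeof(args));
	args.type = KVM_IRQCHIP_TYPE_ICP;
	if (ioctl(vm_fd, KVM_CREATE_IRQCHIP_ARGS, &args) < 0)
		return -1;

	/* ICS: one per BUID (1..4095), with up to 4096 sources each */
	memset(&args, 0, sizeof(args));
	args.type = KVM_IRQCHIP_TYPE_ICS;
	args.ics.buid = buid;
	args.ics.nr_irqs = nr_irqs;
	return ioctl(vm_fd, KVM_CREATE_IRQCHIP_ARGS, &args);
}

/*
 * Sources are then raised and lowered with KVM_IRQ_LINE; the global
 * interrupt number is BUID || source, i.e. (buid << 12) | src.
 */
static int xics_set_irq(int vm_fd, uint32_t buid, uint32_t src, uint32_t level)
{
	struct kvm_irq_level arg;

	arg.irq = (buid << 12) | src;
	arg.level = level;	/* KVM_INTERRUPT_SET{,_LEVEL} or _UNSET */

	return ioctl(vm_fd, KVM_IRQ_LINE, &arg);
}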

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au>, which I have reworked.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt     |   17 +
 arch/powerpc/include/asm/kvm_book3s.h |    1 +
 arch/powerpc/include/asm/kvm_host.h   |    8 +
 arch/powerpc/include/asm/kvm_ppc.h    |   18 +
 arch/powerpc/include/uapi/asm/kvm.h   |   32 +
 arch/powerpc/kvm/Makefile             |    1 +
 arch/powerpc/kvm/book3s.c             |    2 +-
 arch/powerpc/kvm/book3s_hv.c          |   20 +
 arch/powerpc/kvm/book3s_pr.c          |   13 +
 arch/powerpc/kvm/book3s_pr_papr.c     |   16 +
 arch/powerpc/kvm/book3s_rtas.c        |   51 +-
 arch/powerpc/kvm/book3s_xics.c        | 1112 +++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c            |   21 +
 include/uapi/linux/kvm.h              |    6 +-
 14 files changed, 1315 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_xics.c

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 8ca0a7c..fbe018e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2090,6 +2090,23 @@ associated with the service will be forgotten, and subsequent RTAS
 calls by the guest for that service will be passed to userspace to be
 handled.
 
+4.79 KVM_CREATE_IRQCHIP_ARGS
+
+Capability: KVM_CAP_SPAPR_XICS
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_irqchip_args
+Returns: 0 on success, -1 on error
+
+Creates an interrupt controller model in the kernel.  This is
+currently only implemented for the XICS (eXternal Interrupt Controller
+Specification) model defined in PAPR.  This ioctl creates either an
+interrupt controller presentation (ICP) or an interrupt controller
+source (ICS) entity, depending on the type field of the argument
+struct.  When creating an ICS, the argument struct further indicates
+the BUID (Bus Unit ID) and number of interrupt sources for the new
+ICS.
+
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 36fcf41..fbda9b1 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -140,6 +140,7 @@ extern int kvmppc_mmu_hv_init(void);
 extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
 extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
 extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+extern void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
 extern void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags);
 extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 			   bool upper, u32 val);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8e745a8..0136e1e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -190,6 +190,10 @@ struct kvmppc_linear_info {
 	int		 type;
 };
 
+/* XICS components, defined in book3s_xics.c */
+struct kvmppc_xics;
+struct kvmppc_icp;
+
 /*
  * The reverse mapping array has one entry for each HPTE,
  * which stores the guest's view of the second word of the HPTE
@@ -256,6 +260,7 @@ struct kvm_arch {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 	struct list_head rtas_tokens;
+	struct kvmppc_xics *xics;
 #endif
 };
 
@@ -565,6 +570,9 @@ struct kvm_vcpu_arch {
 	u64 busy_stolen;
 	u64 busy_preempt;
 #endif
+#ifdef CONFIG_PPC_BOOK3S_64
+	struct kvmppc_icp *icp; /* XICS presentation controller */
+#endif
 };
 
 /* Values for vcpu->arch.state */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 996aeca..c74fd20 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -131,6 +131,12 @@ extern long kvmppc_prepare_vrma(struct kvm *kvm,
 extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
 			struct kvm_memory_slot *memslot, unsigned long porder);
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xics_ioctl(struct kvm *kvm, unsigned ioctl, unsigned long arg);
+extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu);
+extern void kvmppc_xics_free(struct kvm *kvm);
+
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce *args);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
@@ -166,6 +172,8 @@ extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
 extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
 extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
 extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
+extern int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority);
+extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority);
 
 /*
  * Cuts out inst bits with ordering according to spec.
@@ -262,6 +270,16 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 
 static inline void kvm_linear_init(void)
 {}
+
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline int kvmppc_xics_enabled(struct kvm *kvm)
+{
+	return kvm->arch.xics != NULL;
+}
+#else
+static inline int kvmppc_xics_enabled(struct kvm *kvm) { return 0; }
 #endif
 
 int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index fd0a6f7..145c645 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -302,6 +302,38 @@ struct kvm_rtas_token_args {
 	__u64 token;	/* Use a token of 0 to undefine a mapping */
 };
 
+/* for KVM_CAP_SPAPR_XICS */
+#define __KVM_HAVE_IRQCHIP_ARGS
+struct kvm_irqchip_args {
+#define KVM_IRQCHIP_TYPE_ICP	0	/* XICS: ICP (presentation controller) */
+#define KVM_IRQCHIP_TYPE_ICS	1	/* XICS: ICS (source controller) */
+	__u32 type;
+	union {
+		/* XICS ICP arguments. This needs to be called once before
+		 * creating any VCPU to initialize the main kernel XICS data
+		 * structures.
+		 */
+		struct {
+			__u32 flags;
+		} icp;
+
+		/* XICS ICS arguments. You can call this for every BUID you
+		 * want to make available.
+		 *
+		 * The BUID is 12 bits, the interrupt number within a BUID
+		 * is up to 12 bits as well. The resulting interrupt numbers
+		 * exposed to the guest are BUID || IRQ which is 24 bit
+		 *
+		 * BUID cannot be 0.
+		 */
+		struct {
+			__u32 flags;
+			__u16 buid;
+			__u16 nr_irqs;
+		} ics;
+	};
+};
+
 struct kvm_book3e_206_tlb_entry {
 	__u32 mas8;
 	__u32 mas1;
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 536f65f..ec2f8da 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -81,6 +81,7 @@ kvm-book3s_64-module-objs := \
 	book3s.o \
 	book3s_64_vio.o \
 	book3s_rtas.o \
+	book3s_xics.o \
 	$(kvm-book3s_64-objs-y)
 
 kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6548445..c5a4478 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -104,7 +104,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
 	return prio;
 }
 
-static void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
+void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu,
 					  unsigned int vec)
 {
 	unsigned long old_pending = vcpu->arch.pending_exceptions;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8284d55..0d6616e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -528,6 +528,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 
 		/* Send the error out to userspace via KVM_RUN */
 		return rc;
+	case H_XIRR:
+	case H_CPPR:
+	case H_EOI:
+	case H_IPI:
+		if (kvmppc_xics_enabled(vcpu->kvm)) {
+			ret = kvmppc_xics_hcall(vcpu, req);
+			break;
+		} /* fallthrough */
 	default:
 		return RESUME_HOST;
 	}
@@ -876,6 +884,13 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	spin_lock_init(&vcpu->arch.tbacct_lock);
 	vcpu->arch.busy_preempt = TB_NIL;
 
+	/* Create the XICS */
+	if (kvmppc_xics_enabled(kvm)) {
+		err = kvmppc_xics_create_icp(vcpu);
+		if (err < 0)
+			goto free_vcpu;
+	}
+
 	kvmppc_mmu_book3s_hv_init(vcpu);
 
 	vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
@@ -926,6 +941,8 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 		kvmppc_unpin_guest_page(vcpu->kvm, vcpu->arch.vpa.pinned_addr);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 	kvm_vcpu_uninit(vcpu);
+	if (kvmppc_xics_enabled(vcpu->kvm))
+		kvmppc_xics_free_icp(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
@@ -1876,6 +1893,9 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
 
 	kvmppc_rtas_tokens_free(kvm);
 
+	if (kvmppc_xics_enabled(kvm))
+		kvmppc_xics_free(kvm);
+
 	kvmppc_free_hpt(kvm);
 	WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
 }
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index b853696..26acf5c 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1052,6 +1052,13 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (err < 0)
 		goto uninit_vcpu;
 
+	/* Create the XICS */
+	if (kvmppc_xics_enabled(kvm)) {
+		err = kvmppc_xics_create_icp(vcpu);
+		if (err < 0)
+			goto free_vcpu;
+	}
+
 	return vcpu;
 
 uninit_vcpu:
@@ -1068,6 +1075,8 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
 
+	if (kvmppc_xics_enabled(vcpu->kvm))
+		kvmppc_xics_free_icp(vcpu);
 	free_page((unsigned long)vcpu->arch.shared & PAGE_MASK);
 	kvm_vcpu_uninit(vcpu);
 	kfree(vcpu_book3s->shadow_vcpu);
@@ -1278,6 +1287,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 {
 #ifdef CONFIG_PPC64
 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
 
 	return 0;
@@ -1288,6 +1298,9 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
 #ifdef CONFIG_PPC64
 	WARN_ON(!list_empty(&kvm->arch.spapr_tce_tables));
 #endif
+	if (kvmppc_xics_enabled(kvm))
+		kvmppc_xics_free(kvm);
+
 }
 
 static int kvmppc_book3s_init(void)
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 4efa4a4..94cec5b 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -227,6 +227,15 @@ static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
+static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
+{
+	long rc = kvmppc_xics_hcall(vcpu, cmd);
+	if (rc == H_TOO_HARD)
+		return EMULATE_FAIL;
+	kvmppc_set_gpr(vcpu, 3, rc);
+	return EMULATE_DONE;
+}
+
 int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 {
 	switch (cmd) {
@@ -246,6 +255,13 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 		vcpu->stat.halt_wakeup++;
 		return EMULATE_DONE;
+	case H_XIRR:
+	case H_CPPR:
+	case H_EOI:
+	case H_IPI:
+		if (kvmppc_xics_enabled(vcpu->kvm))
+			return kvmppc_h_pr_xics_hcall(vcpu, cmd);
+		break;
 	case H_RTAS:
 		if (list_empty(&vcpu->kvm->arch.rtas_tokens))
 			return RESUME_HOST;
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 8a324e8..6a6c1fe 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -18,12 +18,61 @@
 #include <asm/rtas.h>
 
 
+static void kvm_rtas_set_xive(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+	u32 irq, server, priority;
+	int rc;
+
+	if (args->nargs != 3 || args->nret != 1) {
+		rc = -3;
+		goto out;
+	}
+
+	irq = args->args[0];
+	server = args->args[1];
+	priority = args->args[2];
+
+	rc = kvmppc_xics_set_xive(vcpu->kvm, irq, server, priority);
+	if (rc)
+		rc = -3;
+out:
+	args->rets[0] = rc;
+}
+
+static void kvm_rtas_get_xive(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+	u32 irq, server, priority;
+	int rc;
+
+	if (args->nargs != 1 || args->nret != 3) {
+		rc = -3;
+		goto out;
+	}
+
+	irq = args->args[0];
+
+	server = priority = 0;
+	rc = kvmppc_xics_get_xive(vcpu->kvm, irq, &server, &priority);
+	if (rc) {
+		rc = -3;
+		goto out;
+	}
+
+	args->rets[1] = server;
+	args->rets[2] = priority;
+out:
+	args->rets[0] = rc;
+}
+
 struct rtas_handler {
 	void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
 	char *name;
 };
 
-static struct rtas_handler rtas_handlers[] = { };
+static struct rtas_handler rtas_handlers[] = {
+	{ .name = "ibm,set-xive", .handler = kvm_rtas_set_xive },
+	{ .name = "ibm,get-xive", .handler = kvm_rtas_get_xive },
+};
 
 struct rtas_token_definition {
 	struct list_head list;
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
new file mode 100644
index 0000000..1538be2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -0,0 +1,1112 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+#include <linux/gfp.h>
+
+#include <asm/uaccess.h>
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/debug.h>
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#define MASKED	0xff
+
+#define XICS_DBG(fmt...) do { } while (0)
+//#define XICS_DBG(fmt...) do { trace_printk(fmt); } while (0)
+
+#undef DEBUG_REALMODE
+
+/*
+ * LOCKING
+ * =======
+ *
+ * Each ICS has a mutex protecting the information about the IRQ
+ * sources and avoiding simultaneous deliveries of the same interrupt.
+ *
+ * ICP operations are done via a single compare & swap transaction
+ * (most ICP state fits in the union kvmppc_icp_state)
+ */
+
+/*
+ * Interrupt numbering
+ * ===================
+ *
+ * The 24-bit global interrupt numbers are divided into two components,
+ * the BUID and the interrupt source.  We have arbitrarily chosen a
+ * 12-bit BUID and a 12-bit source number within each BUID (see below).
+ */
+
+/*
+ * TODO
+ * ====
+ *
+ * - To speed up resends, keep a bitmap of "resend" set bits in the
+ *   ICS
+ *
+ * - Speed up server# -> ICP lookup (array ? hash table ?)
+ *
+ * - Make ICS lockless as well, or at least a per-interrupt lock or hashed
+ *   locks array to improve scalability
+ *
+ * - ioctl's to save/restore the entire state for snapshot & migration
+ */
+
+#define KVMPPC_XICS_MAX_BUID	0xfff
+#define KVMPPC_XICS_IRQ_COUNT	0x1000
+#define KVMPPC_XICS_BUID_SHIFT	12
+#define KVMPPC_XICS_SRC_MASK	0xfff
+
+/* State for one irq in an ics */
+struct ics_irq_state {
+	u32 number;
+	u32 server;
+	u8  priority;
+	u8  saved_priority; /* currently unused */
+	u8  resend;
+	u8  masked_pending;
+	u8  asserted; /* Only for LSI */
+};
+
+#define ICP_RESEND_MAP_SIZE	\
+	((KVMPPC_XICS_MAX_BUID + BITS_PER_LONG - 1) / BITS_PER_LONG)
+
+/* Atomic ICP state, updated with a single compare & swap */
+union kvmppc_icp_state {
+	unsigned long raw;
+	struct {
+		u8 out_ee : 1;
+		u8 need_resend : 1;
+		u8 cppr;
+		u8 mfrr;
+		u8 pending_pri;
+		u32 xisr;
+	};
+};
+
+struct kvmppc_icp {
+	struct kvm_vcpu *vcpu;
+	union kvmppc_icp_state state;
+	unsigned long resend_map[ICP_RESEND_MAP_SIZE];
+};
+
+struct kvmppc_ics {
+	struct mutex lock;
+	u16 buid;
+	u16 nr_irqs;
+	struct ics_irq_state irq_state[];
+};
+
+struct kvmppc_xics {
+	struct kvm *kvm;
+	struct dentry *dentry;
+	u32 max_buid;
+	struct kvmppc_ics *ics[KVMPPC_XICS_MAX_BUID]; /* [1...MAX_BUID] */
+};
+
+static struct kvmppc_icp *kvmppc_xics_find_server(struct kvm *kvm, u32 nr)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (nr == vcpu->vcpu_id)
+			return vcpu->arch.icp;
+	}
+	return NULL;
+}
+
+static struct kvmppc_ics *kvmppc_xics_find_ics(struct kvmppc_xics *xics,
+					       u32 irq, u16 *source)
+{
+	u16 buid = irq >> KVMPPC_XICS_BUID_SHIFT;
+	u16 src = irq & KVMPPC_XICS_SRC_MASK;
+	struct kvmppc_ics *ics;
+
+	if (WARN_ON_ONCE(!buid || buid > KVMPPC_XICS_MAX_BUID)) {
+		XICS_DBG("kvmppc_xics_find_ics: irq %#x BUID out of range !\n",
+			 irq);
+		return NULL;
+	}
+	ics = xics->ics[buid - 1];
+	if (!ics)
+		return NULL;
+	if (src >= ics->nr_irqs)
+		return NULL;
+	if (source)
+		*source = src;
+	return ics;
+}
+
+
+/* -- ICS routines -- */
+
+static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq);
+
+static void ics_deliver_irq(struct kvmppc_xics *xics, u32 irq, u32 level)
+{
+	struct ics_irq_state *state;
+	struct kvmppc_ics *ics;	
+	u16 src;
+
+	XICS_DBG("ics deliver %#x (level: %d)\n", irq, level);
+
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics) {
+		XICS_DBG("ics_deliver_irq: IRQ 0x%06x not found !\n", irq);
+		return;
+	}
+	state = &ics->irq_state[src];
+
+	/*
+	 * We set state->asserted locklessly. This should be fine as
+	 * we are the only setter, thus concurrent access is undefined
+	 * to begin with.
+	 */
+	if (level == KVM_INTERRUPT_SET_LEVEL)
+		state->asserted = 1;
+	else if (level == KVM_INTERRUPT_UNSET) {
+		state->asserted = 0;
+		return;
+	}
+
+	/* Attempt delivery */
+	icp_deliver_irq(xics, NULL, irq);
+}
+
+static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
+			     struct kvmppc_icp *icp)
+{
+	int i;
+
+	mutex_lock(&ics->lock);
+
+	for (i = 0; i < ics->nr_irqs; i++) {
+		struct ics_irq_state *state = &ics->irq_state[i];
+
+		if (!state->resend)
+			continue;
+
+		XICS_DBG("resend %#x prio %#x\n", state->number,
+			      state->priority);
+
+		mutex_unlock(&ics->lock);
+		icp_deliver_irq(xics, icp, state->number);
+		mutex_lock(&ics->lock);
+	}
+
+	mutex_unlock(&ics->lock);
+}
+
+int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_icp *icp;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u16 src;
+	bool deliver;
+
+	if (!xics)
+		return -ENODEV;
+
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics)
+		return -EINVAL;
+	state = &ics->irq_state[src];
+
+	icp = kvmppc_xics_find_server(kvm, server);
+	if (!icp)
+		return -EINVAL;
+
+	mutex_lock(&ics->lock);
+
+	XICS_DBG("set_xive %#x server %#x prio %#x MP:%d RS:%d\n",
+		 irq, server, priority,
+		 state->masked_pending, state->resend);
+
+	state->server = server;
+	state->priority = priority;
+	deliver = false;
+	if ((state->masked_pending || state->resend) && priority != MASKED) {
+		state->masked_pending = 0;
+		deliver = true;
+	}
+
+	mutex_unlock(&ics->lock);
+
+	if (deliver)
+		icp_deliver_irq(xics, icp, irq);
+
+	return 0;
+}
+
+int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u16 src;
+
+	if (!xics)
+		return -ENODEV;
+
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics)
+		return -EINVAL;
+	state = &ics->irq_state[src];
+
+	mutex_lock(&ics->lock);
+	*server = state->server;
+	*priority = state->priority;
+	mutex_unlock(&ics->lock);
+
+	return 0;
+}
+
+/* -- ICP routines, including hcalls -- */
+
+static inline bool icp_try_update(struct kvmppc_icp *icp,
+				  union kvmppc_icp_state old,
+				  union kvmppc_icp_state new,
+				  bool change_self)
+{
+	bool success;
+
+	/* Calculate new output value */
+	new.out_ee = (new.xisr && (new.pending_pri < new.cppr));
+
+	/* Attempt atomic update */
+	success = cmpxchg64(&icp->state.raw, old.raw, new.raw) == old.raw;
+	if (!success)
+		goto bail;
+
+	XICS_DBG("UPD [%04x] - C:%02x M:%02x PP: %02x PI:%06x R:%d O:%d\n",
+		 icp->vcpu->vcpu_id,
+		 old.cppr, old.mfrr, old.pending_pri, old.xisr,
+		 old.need_resend, old.out_ee);
+	XICS_DBG("UPD        - C:%02x M:%02x PP: %02x PI:%06x R:%d O:%d\n",
+		 new.cppr, new.mfrr, new.pending_pri, new.xisr,
+		 new.need_resend, new.out_ee);
+	/*
+	 * Check for output state update
+	 *
+	 * Note that this is racy since another processor could be updating
+	 * the state already. This is why we never clear the interrupt output
+	 * here, we only ever set it. The clear only happens prior to doing
+	 * an update and only by the processor itself. Currently we do it
+	 * in Accept (H_XIRR) and Up_Cppr (H_CPPR).
+	 *
+	 * We also do not try to figure out whether the EE state has changed,
+	 * we unconditionally set it if the new state calls for it for the
+	 * same reason.
+	 */
+	if (new.out_ee) {
+		kvmppc_book3s_queue_irqprio(icp->vcpu,
+					    BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+		if (!change_self)
+			kvm_vcpu_kick(icp->vcpu);
+	}
+ bail:
+	return success;
+}
+
+static void icp_check_resend(struct kvmppc_xics *xics,
+			     struct kvmppc_icp *icp)
+{
+	u32 buid;
+	
+	/* Order this load with the test for need_resend in the caller */
+	smp_rmb();
+	for_each_set_bit(buid, icp->resend_map, xics->max_buid + 1) {
+		struct kvmppc_ics *ics = xics->ics[buid - 1];
+
+		if (!test_and_clear_bit(buid, icp->resend_map))
+			continue;
+		if (!ics)
+			continue;
+		ics_check_resend(xics, ics, icp);
+	}
+}
+
+static bool icp_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+			       u32 *reject)
+{
+	union kvmppc_icp_state old_state, new_state;
+	bool success;
+
+	XICS_DBG("try deliver %#x(P:%#x) to server %#x\n", irq, priority,
+		 icp->vcpu->vcpu_id);
+
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		*reject = 0;
+
+		/* See if we can deliver */
+		success = new_state.cppr > priority &&
+			new_state.mfrr > priority &&
+			new_state.pending_pri > priority;
+
+		/*
+		 * If we can, check for a rejection and perform the
+		 * delivery
+		 */
+		if (success) {
+			*reject = new_state.xisr;
+			new_state.xisr = irq;
+			new_state.pending_pri = priority;
+		} else {
+			/*
+			 * If we failed to deliver we set need_resend
+			 * so a subsequent CPPR state change causes us
+			 * to try a new delivery.
+			 */
+			new_state.need_resend = true;
+		}
+
+	} while (!icp_try_update(icp, old_state, new_state, false));
+
+	return success;
+}
+
+static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq)
+{
+	struct ics_irq_state *state;
+	struct kvmppc_ics *ics;
+	u32 reject;
+	u16 src;	
+
+	/*
+	 * This is used both for initial delivery of an interrupt and
+	 * for subsequent rejection.
+	 *
+	 * Rejection can be racy vs. resends. We have evaluated the
+	 * rejection in an atomic ICP transaction which is now complete,
+	 * so potentially the ICP can already accept the interrupt again.
+	 *
+	 * So we need to retry the delivery. Essentially the reject path
+	 * boils down to a failed delivery. Always.
+	 *
+	 * Now the interrupt could also have moved to a different target,
+	 * thus we may need to re-do the ICP lookup as well
+	 */
+	 
+ again:
+	/* Get the ICS state and lock it */
+	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
+	if (!ics) {
+		XICS_DBG("icp_deliver_irq: IRQ 0x%06x not found !\n", new_irq);
+		return;
+	}
+	state = &ics->irq_state[src];
+
+	/* Get a lock on the ICS */
+	mutex_lock(&ics->lock);
+
+	/* Get our server */
+	if (!icp || state->server != icp->vcpu->vcpu_id) {
+		icp = kvmppc_xics_find_server(xics->kvm, state->server);
+		if (!icp) {
+			pr_warning("icp_deliver_irq: IRQ 0x%06x server 0x%x"
+				   " not found !\n", new_irq, state->server);
+			goto out;
+		}
+	}
+
+	/* Clear the resend bit of that interrupt */
+	state->resend = 0;
+
+	/*
+	 * If masked, bail out
+	 *
+	 * Note: PAPR doesn't mention anything about masked pending
+	 * when doing a resend, only when doing a delivery.
+	 *
+	 * However that would have the effect of losing a masked
+	 * interrupt that was rejected and isn't consistent with
+	 * the whole masked_pending business which is about not
+	 * losing interrupts that occur while masked.
+	 *
+	 * I don't differentiate between normal deliveries and resends; this
+	 * implementation will differ from PAPR and not lose such
+	 * interrupts.
+	 */
+	if (state->priority == MASKED) {
+		XICS_DBG("irq %#x masked pending\n", new_irq);
+		state->masked_pending = 1;
+		goto out;
+	}
+
+	/*
+	 * Try the delivery, this will set the need_resend flag
+	 * in the ICP as part of the atomic transaction if the
+	 * delivery is not possible.
+	 *
+	 * Note that if successful, the new delivery might have itself
+	 * rejected an interrupt that was "delivered" before we took the
+	 * icp mutex.
+	 *
+	 * In this case we do the whole sequence all over again for the
+	 * new guy. We cannot assume that the rejected interrupt is less
+	 * favored than the new one, and thus doesn't need to be delivered,
+	 * because by the time we exit icp_try_to_deliver() the target
+	 * processor may well have already consumed & completed it, and thus
+	 * the rejected interrupt might actually be already acceptable.
+	 */
+	if (icp_try_to_deliver(icp, new_irq, state->priority, &reject)) {
+		/*
+		 * Delivery was successful, did we reject somebody else ?
+		 */
+		if (reject && reject != XICS_IPI) {
+			mutex_unlock(&ics->lock);
+			new_irq = reject;
+			goto again;
+		}
+	} else {
+		/*
+		 * We failed to deliver the interrupt we need to set the
+		 * resend map bit and mark the ICS state as needing a resend
+		 */
+		set_bit(ics->buid, icp->resend_map);
+		state->resend = 1;
+
+		/*
+		 * If the need_resend flag got cleared in the ICP some time
+		 * between icp_try_to_deliver() atomic update and now, then
+		 * we know it might have missed the resend_map bit. So we
+		 * retry
+		 */
+		smp_mb();
+		if (!icp->state.need_resend) {
+			mutex_unlock(&ics->lock);
+			goto again;
+		}
+	}
+ out:
+	mutex_unlock(&ics->lock);
+}
+
+static void icp_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			  u8 new_cppr)
+{
+	union kvmppc_icp_state old_state, new_state;
+	bool resend;
+
+	/*
+	 * This handles several related states in one operation:
+	 *
+	 * ICP State: Down_CPPR
+	 *
+	 * Load CPPR with new value and if the XISR is 0
+	 * then check for resends:
+	 *
+	 * ICP State: Resend
+	 *
+	 * If MFRR is more favored than CPPR, check for IPIs
+	 * and notify ICS of a potential resend. This is done
+	 * asynchronously (when used in real mode, we will have
+	 * to exit here).
+	 *
+	 * We do not handle the complete Check_IPI as documented
+	 * here. In the PAPR, this state will be used for both
+	 * Set_MFRR and Down_CPPR. However, we know that we aren't
+	 * changing the MFRR state here so we don't need to handle
+	 * the case of an MFRR causing a reject of a pending irq,
+	 * this will have been handled when the MFRR was set in the
+	 * first place.
+	 *
+	 * Thus we don't have to handle rejects, only resends.
+	 *
+	 * When implementing real mode for HV KVM, resend will lead to
+	 * a H_TOO_HARD return and the whole transaction will be handled
+	 * in virtual mode.
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		/* Down_CPPR */
+		new_state.cppr = new_cppr;
+
+		/*
+		 * Cut down Resend / Check_IPI / IPI
+		 *
+		 * The logic is that we cannot have a pending interrupt
+		 * trumped by an IPI at this point (see above), so we
+		 * know that the pending interrupt is either already an
+		 * IPI (in which case we don't care to override it), or
+		 * more favored than us, or non-existent.
+		 */
+		if (new_state.mfrr < new_cppr &&
+		    new_state.mfrr <= new_state.pending_pri) {
+			WARN_ON(new_state.xisr != XICS_IPI &&
+				new_state.xisr != 0);
+			new_state.pending_pri = new_state.mfrr;
+			new_state.xisr = XICS_IPI;
+		}
+
+		/* Latch/clear resend bit */
+		resend = new_state.need_resend;
+		new_state.need_resend = 0;
+
+	} while (!icp_try_update(icp, old_state, new_state, true));
+
+	/*
+	 * Now handle resend checks. Those are asynchronous to the ICP
+	 * state update in HW (ie bus transactions) so we can handle them
+	 * separately here too
+	 */
+	if (resend)
+		icp_check_resend(xics, icp);
+}
+
+static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
+{
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	u32 xirr;
+
+	/* First, remove EE from the processor */
+	kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+				      BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+	/*
+	 * ICP State: Accept_Interrupt
+	 *
+	 * Return the pending interrupt (if any) along with the
+	 * current CPPR, then clear the XISR & set CPPR to the
+	 * pending priority
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		xirr = old_state.xisr | (((u32)old_state.cppr) << 24);
+		if (!old_state.xisr)
+			break;
+		new_state.cppr = new_state.pending_pri;
+		new_state.pending_pri = 0xff;
+		new_state.xisr = 0;
+
+	} while (!icp_try_update(icp, old_state, new_state, true));
+
+	XICS_DBG("h_xirr vcpu %d xirr %#x\n", vcpu->vcpu_id, xirr);
+
+	return xirr;
+}
+
+static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
+			  unsigned long mfrr)
+{
+        union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp;
+	u32 reject;
+	bool resend;
+	bool local;
+
+	XICS_DBG("h_ipi vcpu %d to server %lu mfrr %#lx\n",
+			vcpu->vcpu_id, server, mfrr);
+
+	local = vcpu->vcpu_id == server;
+	if (local)
+		icp = vcpu->arch.icp;
+	else
+		icp = kvmppc_xics_find_server(vcpu->kvm, server);
+	if (!icp)
+		return H_PARAMETER;
+
+	/*
+	 * ICP state: Set_MFRR
+	 *
+	 * If the CPPR is more favored than the new MFRR, then
+	 * nothing needs to be rejected as there can be no XISR to
+	 * reject.  If the MFRR is being made less favored then
+	 * there might be a previously-rejected interrupt needing
+	 * to be resent.
+	 *
+	 * If the CPPR is less favored, then we might be replacing
+	 * an interrupt, and thus need to possibly reject it as in
+	 *
+	 * ICP state: Check_IPI
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		/* Set_MFRR */
+		new_state.mfrr = mfrr;
+
+		/* Check_IPI */
+		reject = 0;
+		resend = false;
+		if (mfrr < new_state.cppr) {
+			/* Reject a pending interrupt if not an IPI */
+			if (mfrr <= new_state.pending_pri)
+				reject = new_state.xisr;
+			new_state.pending_pri = mfrr;
+			new_state.xisr = XICS_IPI;
+		}
+
+		if (mfrr > old_state.mfrr && mfrr > new_state.cppr) {
+			resend = new_state.need_resend;
+			new_state.need_resend = 0;
+		}
+	} while (!icp_try_update(icp, old_state, new_state, local));
+
+	/* Handle reject */
+	if (reject && reject != XICS_IPI)
+		icp_deliver_irq(xics, icp, reject);
+		
+	/* Handle resend */
+	if (resend)
+		icp_check_resend(xics, icp);
+
+	return H_SUCCESS;
+}
+
+static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+{
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	u32 reject;
+
+	XICS_DBG("h_cppr vcpu %d cppr %#lx\n", vcpu->vcpu_id, cppr);
+
+	/*
+	 * ICP State: Set_CPPR
+	 *
+	 * We can safely compare the new value with the current
+	 * value outside of the transaction as the CPPR is only
+	 * ever changed by the processor on itself
+	 */
+	if (cppr > icp->state.cppr)
+		icp_down_cppr(xics, icp, cppr);
+	else if (cppr == icp->state.cppr)
+		return;
+
+	/*
+	 * ICP State: Up_CPPR
+	 *
+	 * The processor is raising its priority, this can result
+	 * in a rejection of a pending interrupt:
+	 *
+	 * ICP State: Reject_Current
+	 *
+	 * We can remove EE from the current processor, the update
+	 * transaction will set it again if needed
+	 */
+	kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+				      BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		reject = 0;
+		new_state.cppr = cppr;
+
+		if (cppr <= new_state.pending_pri) {
+			reject = new_state.xisr;
+			new_state.xisr = 0;
+			new_state.pending_pri = 0xff;
+		}
+
+	} while (!icp_try_update(icp, old_state, new_state, true));
+
+	/*
+	 * Check for rejects. They are handled by doing a new delivery
+	 * attempt (see comments in icp_deliver_irq).
+	 */
+	if (reject && reject != XICS_IPI)
+		icp_deliver_irq(xics, icp, reject);
+}
+
+static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u32 irq = xirr & 0x00ffffff;
+	u16 src;
+
+	XICS_DBG("h_eoi vcpu %d eoi %#lx\n", vcpu->vcpu_id, xirr);
+
+	/*
+	 * ICP State: EOI
+	 *
+	 * Note: If EOI is incorrectly used by SW to lower the CPPR
+	 * value (ie more favored), we do not check for rejection of
+	 * a pending interrupt, this is a SW error and PAPR specifies
+	 * that we don't have to deal with it.
+	 *
+	 * The sending of an EOI to the ICS is handled after the
+	 * CPPR update
+	 *
+	 * ICP State: Down_CPPR which we handle
+	 * in a separate function as it's shared with H_CPPR.
+	 */
+	icp_down_cppr(xics, icp, xirr >> 24);
+
+	/* IPIs have no EOI */
+	if (irq == XICS_IPI)
+		return H_SUCCESS;
+	/*
+	 * EOI handling: If the interrupt is still asserted, we need to
+	 * resend it. We can take a lockless "peek" at the ICS state here.
+	 *
+	 * "Message" interrupts will never have "asserted" set
+	 */
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics) {
+		XICS_DBG("h_eoi: IRQ 0x%06x not found !\n", irq);
+		return H_PARAMETER;
+	}
+	state = &ics->irq_state[src];
+
+	/* Still asserted, resend it */
+	if (state->asserted)
+		icp_deliver_irq(xics, icp, irq);
+
+	return H_SUCCESS;
+}
+
+int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
+{
+	unsigned long res;
+	int rc = H_SUCCESS;
+
+	/* Check if we have an ICP */
+	if (!vcpu->arch.icp || !vcpu->kvm->arch.xics)
+		return H_HARDWARE;
+
+	switch (req) {
+	case H_XIRR:
+		res = h_xirr(vcpu);
+		kvmppc_set_gpr(vcpu, 4, res);
+		break;
+	case H_CPPR:
+		h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
+		break;
+	case H_EOI:
+		rc = h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
+		break;
+	case H_IPI:
+		rc = h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
+			   kvmppc_get_gpr(vcpu, 5));
+		break;
+	}
+
+	return rc;
+}
+
+
+/* -- Initialisation code etc. -- */
+
+static int xics_debug_show(struct seq_file *m, void *private)
+{
+	struct kvmppc_xics *xics = m->private;
+	struct kvm *kvm = xics->kvm;
+	struct kvm_vcpu *vcpu;
+	int buid, i;
+
+	if (!kvm)
+		return 0;
+
+	seq_printf(m, "=========\nICP state\n=========\n");
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct kvmppc_icp *icp = vcpu->arch.icp;
+		union kvmppc_icp_state state;
+
+		if (!icp)
+			continue;
+
+		state.raw = ACCESS_ONCE(icp->state.raw);
+		seq_printf(m, "cpu server %#x XIRR:%#x PPRI:%#x CPPR:%#x "
+			   "MFRR:%#x OUT:%d NR:%d\n", vcpu->vcpu_id, state.xisr,
+			   state.pending_pri, state.cppr, state.mfrr,
+			   state.out_ee, state.need_resend);
+	}
+
+	for (buid = 1; buid <= KVMPPC_XICS_MAX_BUID; buid++) {
+		struct kvmppc_ics *ics = xics->ics[buid - 1];
+
+		if (!ics)
+			continue;
+
+		seq_printf(m, "=========\nICS state for BUID 0x%x\n=========\n",
+			   buid);
+
+		mutex_lock(&ics->lock);
+
+		for (i = 0; i < ics->nr_irqs; i++) {
+			struct ics_irq_state *irq = &ics->irq_state[i];
+
+			seq_printf(m, "irq 0x%06x: server %#x prio %#x save"
+				   " prio %#x asserted %d resend %d masked"
+				   " pending %d\n",
+				   irq->number, irq->server, irq->priority,
+				   irq->saved_priority, irq->asserted,
+				   irq->resend, irq->masked_pending);
+
+		}
+		mutex_unlock(&ics->lock);
+	}
+	return 0;
+}
+
+static int xics_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, xics_debug_show, inode->i_private);
+}
+
+static const struct file_operations xics_debug_fops = {
+	.open = xics_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static void xics_debugfs_init(struct kvmppc_xics *xics)
+{
+	char *name;
+
+	name = kasprintf(GFP_KERNEL, "kvm-xics-%p", xics);
+	if (!name) {
+		pr_err("%s: no memory for name\n", __func__);
+		return;
+	}
+
+	xics->dentry = debugfs_create_file(name, S_IRUGO, powerpc_debugfs_root,
+					   xics, &xics_debug_fops);
+
+	pr_debug("%s: created %s\n", __func__, name);
+	kfree(name);
+}
+
+static int kvmppc_xics_create_ics(struct kvmppc_xics *xics, u16 buid,
+				  u16 nr_irqs)
+{
+	struct kvmppc_ics *ics;
+	int i, size;
+
+
+	/* Create the ICS */
+	size = sizeof(struct kvmppc_ics) +
+		sizeof(struct ics_irq_state) * nr_irqs;
+	ics = kzalloc(size, GFP_KERNEL);
+	if (!ics)
+		return -ENOMEM;
+
+	mutex_init(&ics->lock);
+	ics->buid = buid;
+	ics->nr_irqs = nr_irqs;
+
+	for (i = 0; i < nr_irqs; i++) {
+		ics->irq_state[i].number = (buid << KVMPPC_XICS_BUID_SHIFT) | i;
+		ics->irq_state[i].priority = MASKED;
+		ics->irq_state[i].saved_priority = MASKED;
+	}
+	smp_wmb();
+	xics->ics[buid - 1] = ics;
+
+	if (buid > xics->max_buid)
+		xics->max_buid = buid;
+
+	return 0;
+}
+
+int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_icp *icp;
+
+	icp = kzalloc(sizeof(struct kvmppc_icp), GFP_KERNEL);
+	if (!icp)
+		return -ENOMEM;
+
+	icp->vcpu = vcpu;
+	icp->state.mfrr = MASKED;
+	icp->state.pending_pri = MASKED;
+	vcpu->arch.icp = icp;
+
+	XICS_DBG("created server for vcpu %d\n", vcpu->vcpu_id);
+
+	return 0;
+}
+
+void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->arch.icp)
+		return;
+	kfree(vcpu->arch.icp);
+	vcpu->arch.icp = NULL;
+}
+
+void kvmppc_xics_free(struct kvm *kvm)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	int i;
+
+	if (!xics)
+		return;
+
+	lockdep_assert_held(&kvm->lock);
+
+	debugfs_remove(xics->dentry);
+
+	if (xics->kvm) {
+		xics->kvm->arch.xics = NULL;
+		xics->kvm = NULL;
+	}
+
+	for (i = 0; i < xics->max_buid; i++) {
+		if (xics->ics[i])
+			kfree(xics->ics[i]);
+	}
+	kfree(xics);
+}
+
+/* -- ioctls -- */
+
+static int kvm_vm_ioctl_create_icp(struct kvm *kvm,
+				   struct kvm_irqchip_args *args)
+{
+	struct kvmppc_xics *xics;
+	int rc = 0;
+
+	mutex_lock(&kvm->lock);
+
+	/* Already there ? */
+	if (kvm->arch.xics) {
+		rc = -EEXIST;
+		goto out;
+	}
+
+	xics = kzalloc(sizeof(*xics), GFP_KERNEL);
+	if (!xics) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	xics->kvm = kvm;
+	kvm->arch.xics = xics;
+	xics_debugfs_init(xics);
+
+out:
+	mutex_unlock(&kvm->lock);
+	return rc;
+}
+
+static int kvm_vm_ioctl_create_ics(struct kvm *kvm,
+				   struct kvm_irqchip_args *args)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	u16 nr_irqs, buid;
+	int rc;
+
+	if (!xics)
+		return -ENODEV;
+
+	nr_irqs = args->ics.nr_irqs;
+	buid = args->ics.buid;
+
+	/* BUID 0 is bogus */
+	if (buid == 0)
+		return -EINVAL;
+
+	/* Sanity checks */
+	if (nr_irqs == 0 || nr_irqs > KVMPPC_XICS_IRQ_COUNT ||
+	    buid > KVMPPC_XICS_MAX_BUID)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	/* BUID already exists */
+	if (xics->ics[buid - 1]) {
+		rc = -EEXIST;
+		goto out;
+	}
+
+	/* Create the ICS */
+	rc = kvmppc_xics_create_ics(xics, buid, nr_irqs);
+out:
+	mutex_unlock(&kvm->lock);
+	return rc;
+}
+
+static int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args)
+{
+	struct kvmppc_xics *xics;
+
+	/* locking against multiple callers? */
+
+	xics = kvm->arch.xics;
+	if (!xics)
+		return -ENODEV;
+
+	switch (args->level) {
+	case KVM_INTERRUPT_SET:
+	case KVM_INTERRUPT_SET_LEVEL:
+	case KVM_INTERRUPT_UNSET:
+		ics_deliver_irq(xics, args->irq, args->level);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int kvmppc_xics_ioctl(struct kvm *kvm, unsigned ioctl, unsigned long arg)
+{
+	void __user *argp = (void __user *)arg;
+	int rc;
+
+	BUILD_BUG_ON(sizeof(union kvmppc_icp_state) != sizeof(unsigned long));
+
+	switch (ioctl) {
+	case KVM_CREATE_IRQCHIP_ARGS: {
+		struct kvm_irqchip_args args;
+
+		rc = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		rc = -EINVAL;
+		if (args.type == KVM_IRQCHIP_TYPE_ICP)
+			rc = kvm_vm_ioctl_create_icp(kvm, &args);
+		else if (args.type == KVM_IRQCHIP_TYPE_ICS)
+			rc = kvm_vm_ioctl_create_ics(kvm, &args);
+		break;
+	}
+
+	case KVM_IRQ_LINE: {
+		struct kvm_irq_level args;
+
+		rc = -EFAULT;
+		if (copy_from_user(&args, argp, sizeof(args)))
+			break;
+		rc = kvm_vm_ioctl_xics_irq(kvm, &args);
+		break;
+	}
+
+	default:
+		rc = -ENOTTY;
+		break;
+	}
+
+	return rc;
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ba552f8..90b5b5c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -374,6 +374,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 		break;
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_PPC_GET_SMMU_INFO:
+	case KVM_CAP_SPAPR_XICS:
 		r = 1;
 		break;
 #endif
@@ -954,6 +955,26 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_rtas_define_token(kvm, argp);
 		break;
 	}
+	case KVM_IRQ_LINE: {
+		struct kvm *kvm = filp->private_data;
+
+		r = -ENOTTY;
+		if (kvmppc_xics_enabled(kvm))
+			r = kvmppc_xics_ioctl(kvm, ioctl, arg);
+		break;
+	}
+	case KVM_CREATE_IRQCHIP_ARGS: {
+		struct kvm *kvm = filp->private_data;
+		u32 type;
+
+		r = -EFAULT;
+		if (get_user(type, (u32 __user *)argp))
+			break;
+		r = -EINVAL;
+		if (type == KVM_IRQCHIP_TYPE_ICP || type == KVM_IRQCHIP_TYPE_ICS)
+			r = kvmppc_xics_ioctl(kvm, ioctl, arg);
+		break;
+	}
 #endif /* CONFIG_PPC_BOOK3S_64 */
 	default:
 		r = -ENOTTY;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3a92001..8674d32 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -115,6 +115,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
+	 * On powerpc SPAPR, irq is the ICS source number; level is ignored.
 	 */
 	union {
 		__u32 irq;
@@ -635,6 +636,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_IRQFD_RESAMPLE 82
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
 #define KVM_CAP_PPC_RTAS 84
+#define KVM_CAP_SPAPR_XICS 85
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -862,7 +864,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_ALLOCATE_RMA	  _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
 /* Available with KVM_CAP_PPC_RTAS */
 #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO,  0xaa, struct kvm_rtas_token_args)
-
+#ifdef __KVM_HAVE_IRQCHIP_ARGS
+#define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xab, struct kvm_irqchip_args)
+#endif
 
 /*
  * ioctls for vcpu fds
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 4/9] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (2 preceding siblings ...)
  2012-11-05  3:21 ` [RFC PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
@ 2012-11-05  3:21 ` Paul Mackerras
  2012-11-05  3:22 ` [RFC PATCH 5/9] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.
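
To illustrate the split, here is a sketch in plain C (not the kernel
code; the names mirror the new host_ipi flag and kvmppc_fast_vcpu_kick()
in the diff below, and IPI_PRIORITY stands in for the kernel constant):

	#include <stdbool.h>
	#include <stdint.h>

	#define IPI_PRIORITY	4	/* stand-in for the kernel's value */

	/* Per hardware thread: the new flag plus the thread's MFRR */
	struct hthread {
		bool host_ipi;		/* set only for host-originated IPIs */
		uint8_t mfrr;		/* 0xff means no IPI pending */
	};

	/* Host path (cf. icp_native_cause_ipi): mark, then poke */
	static void host_send_ipi(struct hthread *t)
	{
		t->host_ipi = true;
		t->mfrr = IPI_PRIORITY;
	}

	/* KVM path (cf. kvmppc_fast_vcpu_kick/xics_wake_cpu): poke only */
	static void kvm_poke_guest_thread(struct hthread *t)
	{
		t->mfrr = IPI_PRIORITY;
	}

	/* On an IPI taken while in the guest, only a host-marked IPI
	 * needs to take the core all the way out to host state.
	 */
	static bool must_exit_to_host(const struct hthread *t)
	{
		return t->host_ipi;
	}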

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |    8 ++-
 arch/powerpc/include/asm/kvm_ppc.h        |   29 ++++++++
 arch/powerpc/kernel/asm-offsets.c         |    2 +
 arch/powerpc/kvm/book3s_hv.c              |   26 ++++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  112 +++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_xics.c            |    2 +-
 arch/powerpc/sysdev/xics/icp-native.c     |    8 +++
 7 files changed, 163 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 88609b2..923522d 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -20,6 +20,11 @@
 #ifndef __ASM_KVM_BOOK3S_ASM_H__
 #define __ASM_KVM_BOOK3S_ASM_H__
 
+/* XICS ICP register offsets */
+#define XICS_XIRR		4
+#define XICS_MFRR		0xc
+#define XICS_IPI		2	/* interrupt source # for IPIs */
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -81,10 +86,11 @@ struct kvmppc_host_state {
 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	u8 hwthread_req;
 	u8 hwthread_state;
-
+	u8 host_ipi;
 	struct kvm_vcpu *kvm_vcpu;
 	struct kvmppc_vcore *kvm_vcore;
 	unsigned long xics_phys;
+	u32 saved_xirr;
 	u64 dabr;
 	u64 host_mmcr[3];
 	u32 host_pmc[8];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c74fd20..1c833ed 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -262,6 +262,21 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 	paca[cpu].kvm_hstate.xics_phys = addr;
 }
 
+static inline u32 kvmppc_get_xics_latch(void)
+{
+	u32 xirr = get_paca()->kvm_hstate.saved_xirr;
+
+	get_paca()->kvm_hstate.saved_xirr = 0;
+
+	return xirr;
+}
+
+static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+{
+	paca[cpu].kvm_hstate.host_ipi = host_ipi;
+}
+
+extern void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu);
 extern void kvm_linear_init(void);
 
 #else
@@ -271,6 +286,18 @@ static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 static inline void kvm_linear_init(void)
 {}
 
+static inline u32 kvmppc_get_xics_latch(void)
+{
+	return 0;
+}
+
+static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
+{}
+
+static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+	kvm_vcpu_kick(vcpu);
+}
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -314,4 +341,6 @@ static inline void kvmppc_lazy_ee_enable(void)
 #endif
 }
 
+extern void xics_wake_cpu(int cpu);
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 7523539..d5376c7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -545,6 +545,8 @@ int main(void)
 	HSTATE_FIELD(HSTATE_KVM_VCPU, kvm_vcpu);
 	HSTATE_FIELD(HSTATE_KVM_VCORE, kvm_vcore);
 	HSTATE_FIELD(HSTATE_XICS_PHYS, xics_phys);
+	HSTATE_FIELD(HSTATE_SAVED_XIRR, saved_xirr);
+	HSTATE_FIELD(HSTATE_HOST_IPI, host_ipi);
 	HSTATE_FIELD(HSTATE_MMCR, host_mmcr);
 	HSTATE_FIELD(HSTATE_PMC, host_pmc);
 	HSTATE_FIELD(HSTATE_PURR, host_purr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0d6616e..ccd7663 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -66,6 +66,31 @@
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
+{
+	int me;
+	int cpu = vcpu->cpu;
+	wait_queue_head_t *wqp;
+
+	wqp = kvm_arch_vcpu_wq(vcpu);
+	if (waitqueue_active(wqp)) {
+		wake_up_interruptible(wqp);
+		++vcpu->stat.halt_wakeup;
+	}
+
+	me = get_cpu();
+
+	/* CPU points to the first thread of the core */
+	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
+		int real_cpu = cpu + vcpu->arch.ptid;
+		if (paca[real_cpu].kvm_hstate.xics_phys)
+			xics_wake_cpu(real_cpu);
+		else if (cpu_online(cpu))
+			smp_send_reschedule(cpu);			
+	}
+	put_cpu();
+}
+
 /*
  * We use the vcpu_load/put functions to measure stolen time.
  * Stolen time is counted as time when either the vcpu is able to
@@ -974,7 +999,6 @@ static void kvmppc_end_cede(struct kvm_vcpu *vcpu)
 }
 
 extern int __kvmppc_vcore_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
-extern void xics_wake_cpu(int cpu);
 
 static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 				   struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 690d112..4fa7dd4 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -78,10 +78,6 @@ _GLOBAL(kvmppc_hv_entry_trampoline)
  *                                                                            *
  *****************************************************************************/
 
-#define XICS_XIRR		4
-#define XICS_QIRR		0xc
-#define XICS_IPI		2	/* interrupt source # for IPIs */
-
 /*
  * We come in here when wakened from nap mode on a secondary hw thread.
  * Relocation is off and most register values are lost.
@@ -121,7 +117,7 @@ kvm_start_guest:
 	beq	27f
 25:	ld	r5,HSTATE_XICS_PHYS(r13)
 	li	r0,0xff
-	li	r6,XICS_QIRR
+	li	r6,XICS_MFRR
 	li	r7,XICS_XIRR
 	lwzcix	r8,r5,r7		/* get and ack the interrupt */
 	sync
@@ -670,17 +666,91 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	cmpwi	r12,BOOK3S_INTERRUPT_SYSCALL
 	beq	hcall_try_real_mode
 
-	/* Check for mediated interrupts (could be done earlier really ...) */
+	/* Only handle external interrupts here on arch 206 and later */
 BEGIN_FTR_SECTION
-	cmpwi	r12,BOOK3S_INTERRUPT_EXTERNAL
-	bne+	1f
-	andi.	r0,r11,MSR_EE
-	beq	1f
-	mfspr	r5,SPRN_LPCR
-	andi.	r0,r5,LPCR_MER
+	b	ext_interrupt_to_host
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
+
+	/* External interrupt ? */
+	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
+	bne+	ext_interrupt_to_host
+
+	/* External interrupt, first check for host_ipi. If this is
+	 * set, we know the host wants us out so let's do it now
+	 */
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
+	bne	ext_interrupt_to_host
+
+	/* Now read the interrupt from the ICP */
+	ld	r5, HSTATE_XICS_PHYS(r13)
+	li	r7, XICS_XIRR
+	cmpdi	r5, 0
+	beq-	ext_interrupt_to_host
+	lwzcix	r3, r5, r7
+	rlwinm.	r0, r3, 0, 0xffffff
+	sync
+	bne	1f
+
+	/* Nothing pending in the ICP, check for mediated interrupts
+	 * and bounce it to the guest
+	 */
+	andi.	r0, r11, MSR_EE
+	beq	ext_interrupt_to_host /* shouldn't happen ?? */
+	mfspr	r5, SPRN_LPCR
+	andi.	r0, r5, LPCR_MER
 	bne	bounce_ext_interrupt
-1:
-END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
+	b	ext_interrupt_to_host /* shouldn't happen ?? */
+
+1:	/* We found something in the ICP...
+	 *
+	 * If it's not an IPI, stash it in the PACA and return to
+	 * the host, we don't (yet) handle directing real external
+	 * interrupts directly to the guest
+	 */
+	cmpwi	r0, XICS_IPI
+	bne	ext_stash_for_host
+
+	/* It's an IPI, clear the MFRR and EOI it */
+	li	r0, 0xff
+	li	r6, XICS_MFRR
+	stbcix	r0, r5, r6		/* clear the IPI */
+	stwcix	r3, r5, r7		/* EOI it */
+	sync
+
+	/* We need to re-check host IPI now in case it got set in the
+	 * meantime. If it's clear, we bounce the interrupt to the
+	 * guest
+	 */
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
+	bne-	1f
+
+	/* All right, looks like an IPI for the guest, we need to set MER */
+	mfspr	r8,SPRN_LPCR
+	ori	r8,r8,LPCR_MER
+	mtspr	SPRN_LPCR,r8
+
+	/* And if the guest EE is set, we can deliver immediately, else
+	 * we return to the guest with MER set
+	 */
+	andi.	r0, r11, MSR_EE
+	bne	bounce_ext_interrupt
+	mr	r4, r9
+	b	fast_guest_return
+	
+	/* We raced with the host, we need to resend that IPI, bummer */
+1:	li	r0, IPI_PRIORITY
+	stbcix	r0, r5, r6		/* set the IPI */
+	sync
+	b	ext_interrupt_to_host
+
+ext_stash_for_host:
+	/* It's not an IPI and it's for the host, stash it in the PACA
+	 * before exit, it will be picked up by the host ICP driver
+	 */
+	stw	r3, HSTATE_SAVED_XIRR(r13)
+ext_interrupt_to_host:
 
 nohpte_cont:
 hcall_real_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
@@ -819,7 +889,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 	beq	44f
 	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
 	li	r0,IPI_PRIORITY
-	li	r7,XICS_QIRR
+	li	r7,XICS_MFRR
 	stbcix	r0,r7,r8		/* trigger the IPI */
 44:	srdi.	r3,r3,1
 	addi	r6,r6,PACA_SIZE
@@ -1341,11 +1411,11 @@ hcall_real_table:
 	.long	0		/* 0x58 */
 	.long	0		/* 0x5c */
 	.long	0		/* 0x60 */
-	.long	0		/* 0x64 */
-	.long	0		/* 0x68 */
-	.long	0		/* 0x6c */
-	.long	0		/* 0x70 */
-	.long	0		/* 0x74 */
+	.long	.kvmppc_rm_h_eoi - hcall_real_table
+	.long	.kvmppc_rm_h_cppr - hcall_real_table
+	.long	.kvmppc_rm_h_ipi - hcall_real_table
+	.long	0		/* 0x70 - H_IPOLL */
+	.long	.kvmppc_rm_h_xirr - hcall_real_table
 	.long	0		/* 0x78 */
 	.long	0		/* 0x7c */
 	.long	0		/* 0x80 */
@@ -1602,7 +1672,7 @@ secondary_nap:
 	beq	37f
 	sync
 	li	r0, 0xff
-	li	r6, XICS_QIRR
+	li	r6, XICS_MFRR
 	stbcix	r0, r5, r6		/* clear the IPI */
 	stwcix	r3, r5, r7		/* EOI it */
 37:	sync
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 1538be2..ffcdb7e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -318,7 +318,7 @@ static inline bool icp_try_update(struct kvmppc_icp *icp,
 		kvmppc_book3s_queue_irqprio(icp->vcpu,
 					    BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
 		if (!change_self)
-			kvm_vcpu_kick(icp->vcpu);
+			kvmppc_fast_vcpu_kick(icp->vcpu);
 	}
  bail:
 	return success;
diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c
index 48861d3..20b328b 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -51,6 +51,12 @@ static struct icp_ipl __iomem *icp_native_regs[NR_CPUS];
 static inline unsigned int icp_native_get_xirr(void)
 {
 	int cpu = smp_processor_id();
+	unsigned int xirr;
+
+	/* Return an interrupt latched by KVM, if there is one */
+	xirr = kvmppc_get_xics_latch();
+	if (xirr)
+		return xirr;
 
 	return in_be32(&icp_native_regs[cpu]->xirr.word);
 }
@@ -138,6 +144,7 @@ static unsigned int icp_native_get_irq(void)
 
 static void icp_native_cause_ipi(int cpu, unsigned long data)
 {
+	kvmppc_set_host_ipi(cpu, 1);
 	icp_native_set_qirr(cpu, IPI_PRIORITY);
 }
 
@@ -151,6 +158,7 @@ static irqreturn_t icp_native_ipi_action(int irq, void *dev_id)
 {
 	int cpu = smp_processor_id();
 
+	kvmppc_set_host_ipi(cpu, 0);
 	icp_native_set_qirr(cpu, 0xff);
 
 	return smp_ipi_demux();
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 5/9] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (3 preceding siblings ...)
  2012-11-05  3:21 ` [RFC PATCH 4/9] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
@ 2012-11-05  3:22 ` Paul Mackerras
  2012-11-05  3:23 ` [RFC PATCH 6/9] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.

For debugging purposes, the use of the real mode implementation can be
disabled by setting the KVM_ICP_FLAG_NOREALMODE bit in the icp.flags
field of struct kvm_irqchip_args.
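
As a usage sketch (not part of this patch), userspace could ask for a
virtual-mode-only ICP roughly as follows, assuming the uapi headers from
this series and a VM file descriptor obtained via KVM_CREATE_VM:

	#include <err.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static void create_icp_no_realmode(int vm_fd)
	{
		struct kvm_irqchip_args args;

		memset(&args, 0, sizeof(args));
		args.type = KVM_IRQCHIP_TYPE_ICP;
		args.icp.flags = KVM_ICP_FLAG_NOREALMODE;	/* debug only */

		if (ioctl(vm_fd, KVM_CREATE_IRQCHIP_ARGS, &args) < 0)
			err(1, "KVM_CREATE_IRQCHIP_ARGS(ICP)");
	}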

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/uapi/asm/kvm.h  |    1 +
 arch/powerpc/kvm/Makefile            |    1 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  402 ++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_xics.c       |  153 +++++--------
 arch/powerpc/kvm/book3s_xics.h       |  116 ++++++++++
 5 files changed, 572 insertions(+), 101 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_rm_xics.c
 create mode 100644 arch/powerpc/kvm/book3s_xics.h

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 145c645..55c1907 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -314,6 +314,7 @@ struct kvm_irqchip_args {
 		 * structures.
 		 */
 		struct {
+#define KVM_ICP_FLAG_NOREALMODE		0x00000001 /* Disable real mode ICP */
 			__u32 flags;
 		} icp;
 
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index ec2f8da..c3d958d 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -72,6 +72,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
 	book3s_hv_rmhandlers.o \
 	book3s_hv_rm_mmu.o \
 	book3s_64_vio_hv.o \
+	book3s_hv_rm_xics.o \
 	book3s_hv_builtin.o
 
 kvm-book3s_64-module-objs := \
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
new file mode 100644
index 0000000..49bb25b
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -0,0 +1,402 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/kvm_book3s.h>
+#include <asm/kvm_ppc.h>
+#include <asm/hvcall.h>
+#include <asm/xics.h>
+#include <asm/debug.h>
+#include <asm/synch.h>
+#include <asm/ppc-opcode.h>
+
+#include "book3s_xics.h"
+
+#define DEBUG_PASSUP
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+	__asm__ __volatile__("sync; stbcix %0,0,%1"
+		: : "r" (val), "r" (paddr) : "memory");
+}
+
+static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, struct kvm_vcpu *this_vcpu)
+{
+	struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
+	unsigned long xics_phys;
+	int cpu;
+
+	/* Mark the target VCPU as having an interrupt pending */
+	vcpu->stat.queue_intr++;
+	set_bit(BOOK3S_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
+
+	/* Kick self ? Just set MER and return */
+	if (vcpu == this_vcpu) {
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_MER);
+		return;
+	}
+
+	/* Check if the core is loaded, if not, too hard */
+	cpu = vcpu->cpu;
+	if (cpu < 0 || cpu >= nr_cpu_ids) {
+		this_icp->rm_action |= XICS_RM_KICK_VCPU;
+		this_icp->rm_kick_target = vcpu;
+		return;
+	}
+	/* In SMT cpu will always point to thread 0, we adjust it */
+	cpu += vcpu->arch.ptid;
+
+	/* Not too hard, then poke the target */
+	xics_phys = paca[cpu].kvm_hstate.xics_phys;
+	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+	/* Note: Only called on self ! */
+	clear_bit(BOOK3S_IRQPRIO_EXTERNAL_LEVEL, &vcpu->arch.pending_exceptions);
+	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_MER);
+}
+
+static inline bool icp_rm_try_update(struct kvmppc_icp *icp,
+				     union kvmppc_icp_state old,
+				     union kvmppc_icp_state new)
+{
+	struct kvm_vcpu *this_vcpu = local_paca->kvm_hstate.kvm_vcpu;
+	bool success;
+
+	/* Calculate new output value */
+	new.out_ee = (new.xisr && (new.pending_pri < new.cppr));
+
+	/* Attempt atomic update */
+	success = cmpxchg64(&icp->state.raw, old.raw, new.raw) == old.raw;
+	if (!success)
+		goto bail;
+
+	/*
+	 * Check for output state update
+	 *
+	 * Note that this is racy since another processor could be updating
+	 * the state already. This is why we never clear the interrupt output
+	 * here, we only ever set it. The clear only happens prior to doing
+	 * an update and only by the processor itself. Currently we do it
+	 * in Accept (H_XIRR) and Up_Cppr (H_CPPR).
+	 *
+	 * We also do not try to figure out whether the EE state has changed,
+	 * we unconditionally set it if the new state calls for it. The reason
+	 * for that is that we opportunistically remove the pending interrupt
+	 * flag when raising CPPR, so we need to set it back here if an
+	 * interrupt is still pending.
+	 */
+	if (new.out_ee)
+		icp_rm_set_vcpu_irq(icp->vcpu, this_vcpu);
+
+	/* Expose the state change for debug purposes */
+	this_vcpu->arch.icp->rm_dbgstate = new;
+	this_vcpu->arch.icp->rm_dbgtgt = icp->vcpu;
+
+ bail:
+	return success;
+}
+
+static inline int check_too_hard(struct kvmppc_xics *xics, struct kvmppc_icp *icp)
+{
+	return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
+}
+
+static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			     u8 new_cppr)
+{
+	union kvmppc_icp_state old_state, new_state;
+	bool resend;
+
+	/*
+	 * This handles several related states in one operation:
+	 *
+	 * ICP State: Down_CPPR
+	 *
+	 * Load CPPR with new value and if the XISR is 0
+	 * then check for resends:
+	 *
+	 * ICP State: Resend
+	 *
+	 * If MFRR is more favored than CPPR, check for IPIs
+	 * and notify ICS of a potential resend. This is done
+	 * asynchronously (when used in real mode, we will have
+	 * to exit here).
+	 *
+	 * We do not handle the complete Check_IPI as documented
+	 * here. In the PAPR, this state will be used for both
+	 * Set_MFRR and Down_CPPR. However, we know that we aren't
+	 * changing the MFRR state here so we don't need to handle
+	 * the case of an MFRR causing a reject of a pending irq,
+	 * this will have been handled when the MFRR was set in the
+	 * first place.
+	 *
+	 * Thus we don't have to handle rejects, only resends.
+	 *
+	 * When implementing real mode for HV KVM, resend will lead to
+	 * a H_TOO_HARD return and the whole transaction will be handled
+	 * in virtual mode.
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		/* Down_CPPR */
+		new_state.cppr = new_cppr;
+
+		/*
+		 * Cut down Resend / Check_IPI / IPI
+		 *
+		 * The logic is that we cannot have a pending interrupt
+		 * trumped by an IPI at this point (see above), so we
+		 * know that either the pending interrupt is already an
+		 * IPI (in which case we don't care to override it) or
+		 * it's either more favored than us or non existent
+		 */
+		if (new_state.mfrr < new_cppr &&
+		    new_state.mfrr <= new_state.pending_pri) {
+			new_state.pending_pri = new_state.mfrr;
+			new_state.xisr = XICS_IPI;
+		}
+
+		/* Latch/clear resend bit */
+		resend = new_state.need_resend;
+		new_state.need_resend = 0;
+
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	/*
+	 * Now handle resend checks. Those are asynchronous to the ICP
+	 * state update in HW (ie bus transactions) so we can handle them
+	 * separately here as well.
+	 */
+	if (resend)
+		icp->rm_action |= XICS_RM_CHECK_RESEND;
+}
+
+
+unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
+{
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	u32 xirr;
+
+	if (!xics->real_mode)
+		return H_TOO_HARD;
+
+	/* First clear the interrupt */
+	icp_rm_clr_vcpu_irq(icp->vcpu);
+
+	/*
+	 * ICP State: Accept_Interrupt
+	 *
+	 * Return the pending interrupt (if any) along with the
+	 * current CPPR, then clear the XISR & set CPPR to the
+	 * pending priority
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		xirr = old_state.xisr | (((u32)old_state.cppr) << 24);
+		if (!old_state.xisr)
+			break;
+		new_state.cppr = new_state.pending_pri;
+		new_state.pending_pri = 0xff;
+		new_state.xisr = 0;
+
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	/* Return the result in GPR4 */
+	vcpu->arch.gpr[4] = xirr;
+
+	return check_too_hard(xics, icp);
+}
+
+int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server, unsigned long mfrr)
+{
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp, *this_icp = vcpu->arch.icp;
+	u32 reject;
+	bool resend;
+	bool local;
+
+	if (!xics->real_mode)
+		return H_TOO_HARD;
+
+	local = vcpu->vcpu_id == server;
+	if (local)
+		icp = this_icp;
+	else
+		icp = kvmppc_xics_find_server(vcpu->kvm, server);
+	if (!icp)
+		return H_PARAMETER;
+
+	/*
+	 * ICP state: Set_MFRR
+	 *
+	 * If the CPPR is more favored than the new MFRR, then
+	 * nothing needs to be done as there can be no XISR to
+	 * reject.
+	 *
+	 * If the CPPR is less favored, then we might be replacing
+	 * an interrupt, and thus need to possibly reject it as in
+	 *
+	 * ICP state: Check_IPI
+	 */
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		/* Set_MFRR */
+		new_state.mfrr = mfrr;
+
+		/* Check_IPI */
+		reject = 0;
+		resend = false;
+		if (mfrr < new_state.cppr) {
+			/* Reject a pending interrupt if not an IPI */
+			if (mfrr <= new_state.pending_pri)
+				reject = new_state.xisr;
+			new_state.pending_pri = mfrr;
+			new_state.xisr = XICS_IPI;
+		}
+
+		if (mfrr > old_state.mfrr && mfrr > new_state.cppr) {
+			resend = new_state.need_resend;
+			new_state.need_resend = 0;
+		}
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	/* Pass rejects to virtual mode */
+	if (reject && reject != XICS_IPI) {
+		this_icp->rm_action |= XICS_RM_REJECT;
+		this_icp->rm_reject = reject;
+	}
+
+	/* Pass resends to virtual mode */
+	if (resend)
+		this_icp->rm_action |= XICS_RM_CHECK_RESEND;
+
+	return check_too_hard(xics, this_icp);
+}
+
+int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+{
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	u32 reject;
+
+	if (!xics->real_mode)
+		return H_TOO_HARD;
+
+	/*
+	 * ICP State: Set_CPPR
+	 *
+	 * We can safely compare the new value with the current
+	 * value outside of the transaction as the CPPR is only
+	 * ever changed by the processor on itself
+	 */
+	if (cppr > icp->state.cppr) {
+		icp_rm_down_cppr(xics, icp, cppr);
+		goto bail;
+	} else if (cppr == icp->state.cppr)
+		return H_SUCCESS;
+
+	/*
+	 * ICP State: Up_CPPR
+	 *
+	 * The processor is raising its priority, this can result
+	 * in a rejection of a pending interrupt:
+	 *
+	 * ICP State: Reject_Current
+	 *
+	 * We can remove EE from the current processor, the update
+	 * transaction will set it again if needed
+	 */
+	icp_rm_clr_vcpu_irq(icp->vcpu);
+
+	do {
+		old_state = new_state = ACCESS_ONCE(icp->state);
+
+		reject = 0;
+		new_state.cppr = cppr;
+
+		if (cppr <= new_state.pending_pri) {
+			reject = new_state.xisr;
+			new_state.xisr = 0;
+			new_state.pending_pri = 0xff;
+		}
+
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	/* Pass rejects to virtual mode */
+	if (reject && reject != XICS_IPI) {
+		icp->rm_action |= XICS_RM_REJECT;
+		icp->rm_reject = reject;
+	}
+ bail:
+	return check_too_hard(xics, icp);
+}
+
+int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+{
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u32 irq = xirr & 0x00ffffff;
+	u16 src;
+
+	if (!xics->real_mode)
+		return H_TOO_HARD;
+
+	/*
+	 * ICP State: EOI
+	 *
+	 * Note: If EOI is incorrectly used by SW to lower the CPPR
+	 * value (ie more favored), we do not check for rejection of
+	 * a pending interrupt; this is a SW error and PAPR specifies
+	 * that we don't have to deal with it.
+	 *
+	 * The sending of an EOI to the ICS is handled after the
+	 * CPPR update
+	 *
+	 * ICP State: Down_CPPR which we handle
+	 * in a separate function as it's shared with H_CPPR.
+	 */
+	icp_rm_down_cppr(xics, icp, xirr >> 24);
+
+	/* IPIs have no EOI */
+	if (irq == XICS_IPI)
+		goto bail;
+	/*
+	 * EOI handling: If the interrupt is still asserted, we need to
+	 * resend it. We can take a lockless "peek" at the ICS state here.
+	 *
+	 * "Message" interrupts will never have "asserted" set
+	 */
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics)
+		goto bail;
+	state = &ics->irq_state[src];
+
+	/* Still asserted, resend it, we make it look like a reject */
+	if (state->asserted) {
+		icp->rm_action |= XICS_RM_REJECT;
+		icp->rm_reject = irq;
+	}
+ bail:
+	return check_too_hard(xics, icp);
+}
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index ffcdb7e..3858c14 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -22,7 +22,7 @@
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 
-#define MASKED	0xff
+#include "book3s_xics.h"
 
 #define XICS_DBG(fmt...) do { } while (0)
 //#define XICS_DBG(fmt...) do { trace_printk(fmt); } while (0)
@@ -64,93 +64,6 @@
  * - ioctl's to save/restore the entire state for snapshot & migration
  */
 
-#define KVMPPC_XICS_MAX_BUID	0xfff
-#define KVMPPC_XICS_IRQ_COUNT	0x1000
-#define KVMPPC_XICS_BUID_SHIFT	12
-#define KVMPPC_XICS_SRC_MASK	0xfff
-
-/* State for one irq in an ics */
-struct ics_irq_state {
-	u32 number;
-	u32 server;
-	u8  priority;
-	u8  saved_priority; /* currently unused */
-	u8  resend;
-	u8  masked_pending;
-	u8  asserted; /* Only for LSI */
-};
-
-#define ICP_RESEND_MAP_SIZE	\
-	((KVMPPC_XICS_MAX_BUID + BITS_PER_LONG - 1) / BITS_PER_LONG)
-
-/* Atomic ICP state, updated with a single compare & swap */
-union kvmppc_icp_state {
-	unsigned long raw;
-	struct {
-		u8 out_ee : 1;
-		u8 need_resend : 1;
-		u8 cppr;
-		u8 mfrr;
-		u8 pending_pri;
-		u32 xisr;
-	};
-};
-
-struct kvmppc_icp {
-	struct kvm_vcpu *vcpu;
-	union kvmppc_icp_state state;
-	unsigned long resend_map[ICP_RESEND_MAP_SIZE];
-};
-
-struct kvmppc_ics {
-	struct mutex lock;
-	u16 buid;
-	u16 nr_irqs;
-	struct ics_irq_state irq_state[];
-};
-
-struct kvmppc_xics {
-	struct kvm *kvm;
-	struct dentry *dentry;
-	u32 max_buid;
-	struct kvmppc_ics *ics[KVMPPC_XICS_MAX_BUID]; /* [1...MAX_BUID] */
-};
-
-static struct kvmppc_icp *kvmppc_xics_find_server(struct kvm *kvm, u32 nr)
-{
-	struct kvm_vcpu *vcpu = NULL;
-	int i;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (nr == vcpu->vcpu_id)
-			return vcpu->arch.icp;
-	}
-	return NULL;
-}
-
-static struct kvmppc_ics *kvmppc_xics_find_ics(struct kvmppc_xics *xics,
-					       u32 irq, u16 *source)
-{
-	u16 buid = irq >> KVMPPC_XICS_BUID_SHIFT;
-	u16 src = irq & KVMPPC_XICS_SRC_MASK;
-	struct kvmppc_ics *ics;
-
-	if (WARN_ON_ONCE(!buid || buid > KVMPPC_XICS_MAX_BUID)) {
-		XICS_DBG("kvmppc_xics_find_ics: irq %#x BUID out of range !\n",
-			 irq);
-		return NULL;
-	}
-	ics = xics->ics[buid - 1];
-	if (!ics)
-		return NULL;
-	if (src >= ics->nr_irqs)
-		return NULL;
-	if (source)
-		*source = src;
-	return ics;
-}
-
-
 /* -- ICS routines -- */
 
 static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
@@ -311,8 +224,10 @@ static inline bool icp_try_update(struct kvmppc_icp *icp,
 	 * in Accept (H_XIRR) and Up_Cppr (H_XPPR).
 	 *
 	 * We also do not try to figure out whether the EE state has changed,
-	 * we unconditionally set it if the new state calls for it for the
-	 * same reason.
+	 * we unconditionally set it if the new state calls for it. The reason
+	 * for that is that we opportunistically remove the pending interrupt
+	 * flag when raising CPPR, so we need to set it back here if an
+	 * interrupt is still pending.
 	 */
 	if (new.out_ee) {
 		kvmppc_book3s_queue_irqprio(icp->vcpu,
@@ -574,7 +489,7 @@ static void icp_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		icp_check_resend(xics, icp);
 }
 
-static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
+static noinline unsigned long kvmppc_h_xirr(struct kvm_vcpu *vcpu)
 {
 	union kvmppc_icp_state old_state, new_state;
 	struct kvmppc_icp *icp = vcpu->arch.icp;
@@ -608,8 +523,8 @@ static noinline unsigned long h_xirr(struct kvm_vcpu *vcpu)
 	return xirr;
 }
 
-static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-			  unsigned long mfrr)
+static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
+				 unsigned long mfrr)
 {
         union kvmppc_icp_state old_state, new_state;
 	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
@@ -677,7 +592,7 @@ static noinline int h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 	return H_SUCCESS;
 }
 
-static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
+static noinline void kvmppc_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 {
 	union kvmppc_icp_state old_state, new_state;
 	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
@@ -734,7 +649,7 @@ static noinline void h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 		icp_deliver_irq(xics, icp, reject);
 }
 
-static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
+static noinline int kvmppc_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 {
 	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
 	struct kvmppc_icp *icp = vcpu->arch.icp;
@@ -784,29 +699,54 @@ static noinline int h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 	return H_SUCCESS;
 }
 
+static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
+{
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+
+	XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
+		 hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
+
+	if (icp->rm_action & XICS_RM_KICK_VCPU)
+		kvmppc_fast_vcpu_kick(icp->rm_kick_target);
+	if (icp->rm_action & XICS_RM_CHECK_RESEND)
+		icp_check_resend(xics, icp);
+	if (icp->rm_action & XICS_RM_REJECT)
+		icp_deliver_irq(xics, icp, icp->rm_reject);
+
+	icp->rm_action = 0;
+
+	return H_SUCCESS;
+}
+
 int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 {
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
 	unsigned long res;
 	int rc = H_SUCCESS;
 
 	/* Check if we have an ICP */
-	if (!vcpu->arch.icp || !vcpu->kvm->arch.xics)
+	if (!xics || !vcpu->arch.icp)
 		return H_HARDWARE;
 
+	/* Check for real mode returning too hard */
+	if (xics->real_mode)
+		return kvmppc_xics_rm_complete(vcpu, req);
+
 	switch (req) {
 	case H_XIRR:
-		res = h_xirr(vcpu);
+		res = kvmppc_h_xirr(vcpu);
 		kvmppc_set_gpr(vcpu, 4, res);
 		break;
 	case H_CPPR:
-		h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
+		kvmppc_h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
 		break;
 	case H_EOI:
-		rc = h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
+		rc = kvmppc_h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
 		break;
 	case H_IPI:
-		rc = h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
-			   kvmppc_get_gpr(vcpu, 5));
+		rc = kvmppc_h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
+				  kvmppc_get_gpr(vcpu, 5));
 		break;
 	}
 
@@ -1004,6 +944,17 @@ static int kvm_vm_ioctl_create_icp(struct kvm *kvm,
 	kvm->arch.xics = xics;
 	xics_debugfs_init(xics);
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+	if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+		/* Enable real mode support unless userspace disabled it */
+		if (!(args->icp.flags & KVM_ICP_FLAG_NOREALMODE))
+			xics->real_mode = true;
+#ifdef DEBUG_REALMODE
+		xics->real_mode_dbg = true;
+#endif
+	}
+#endif /* CONFIG_KVM_BOOK3S_64_HV */
+
 out:
 	mutex_unlock(&kvm->lock);
 	return rc;
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
new file mode 100644
index 0000000..951eacb
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -0,0 +1,116 @@
+/*
+ * Copyright 2012 Michael Ellerman, IBM Corporation.
+ * Copyright 2012 Benjamin Herrenschmidt, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _KVM_PPC_BOOK3S_XICS_H
+#define _KVM_PPC_BOOK3S_XICS_H
+
+#define KVMPPC_XICS_MAX_BUID	0xfff
+#define KVMPPC_XICS_IRQ_COUNT	0x1000
+#define KVMPPC_XICS_BUID_SHIFT	12
+#define KVMPPC_XICS_SRC_MASK	0xfff
+
+#define MASKED	0xff
+
+/* State for one irq in an ics */
+struct ics_irq_state {
+	u32 number;
+	u32 server;
+	u8  priority;
+	u8  saved_priority; /* currently unused */
+	u8  resend;
+	u8  masked_pending;
+	u8  asserted; /* Only for LSI */
+};
+
+#define ICP_RESEND_MAP_SIZE	\
+	((KVMPPC_XICS_MAX_BUID + BITS_PER_LONG - 1) / BITS_PER_LONG)
+
+/* Atomic ICP state, updated with a single compare & swap */
+union kvmppc_icp_state {
+	unsigned long raw;
+	struct {
+		u8 out_ee : 1;
+		u8 need_resend : 1;
+		u8 cppr;
+		u8 mfrr;
+		u8 pending_pri;
+		u32 xisr;
+	};
+};
+
+struct kvmppc_icp {
+	struct kvm_vcpu *vcpu;
+	union kvmppc_icp_state state;
+	unsigned long resend_map[ICP_RESEND_MAP_SIZE];
+
+	/* Real mode might find something too hard, here's the action
+	 * it might request from virtual mode
+	 */
+#define XICS_RM_KICK_VCPU	0x1
+#define XICS_RM_CHECK_RESEND	0x2
+#define XICS_RM_REJECT		0x4
+	u32 rm_action;
+	struct kvm_vcpu *rm_kick_target;
+	u32  rm_reject;
+
+	/* Debug stuff for real mode */
+	union kvmppc_icp_state rm_dbgstate;
+	struct kvm_vcpu *rm_dbgtgt;
+};
+
+struct kvmppc_ics {
+	struct mutex lock;
+	u16 buid;
+	u16 nr_irqs;
+	struct ics_irq_state irq_state[];
+};
+
+struct kvmppc_xics {
+	struct kvm *kvm;
+	struct dentry *dentry;
+	u32 max_buid;
+	bool real_mode;
+	bool real_mode_dbg;
+	struct kvmppc_ics *ics[KVMPPC_XICS_MAX_BUID]; /* [1...MAX_BUID] */
+};
+
+static inline struct kvmppc_icp *kvmppc_xics_find_server(struct kvm *kvm,
+							 u32 nr)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (nr == vcpu->vcpu_id)
+			return vcpu->arch.icp;
+	}
+	return NULL;
+}
+
+static inline struct kvmppc_ics *kvmppc_xics_find_ics(struct kvmppc_xics *xics,
+						      u32 irq, u16 *source)
+{
+	u16 buid = irq >> KVMPPC_XICS_BUID_SHIFT;
+	u16 src = irq & KVMPPC_XICS_SRC_MASK;
+	struct kvmppc_ics *ics;
+
+	if (WARN_ON_ONCE(!buid || buid > KVMPPC_XICS_MAX_BUID))
+		return NULL;
+	ics = xics->ics[buid - 1];
+	if (!ics)
+		return NULL;
+	if (src >= ics->nr_irqs)
+		return NULL;
+	if (source)
+		*source = src;
+	return ics;
+}
+
+
+#endif /* _KVM_PPC_BOOK3S_XICS_H */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 6/9] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (4 preceding siblings ...)
  2012-11-05  3:22 ` [RFC PATCH 5/9] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
@ 2012-11-05  3:23 ` Paul Mackerras
  2012-11-05  3:23 ` [RFC PATCH 7/9] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:23 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Second, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.
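
The decision made by the rewritten external-interrupt path can be
summarised by this rough C model (a compile-only illustration; the ICP
read and EOI details are omitted, the real logic is the assembly below):

	#include <stdbool.h>

	struct extint_ctx {
		bool host_ipi;		/* HSTATE_HOST_IPI set by the host */
		bool core_exiting;	/* another thread already heading out */
		bool irq_pending;	/* external irq pending for the guest */
		bool guest_ee;		/* MSR[EE] in the guest MSR image */
	};

	enum extint_action {
		EXIT_TO_HOST,		/* take the whole core out */
		RESUME,			/* nothing for the guest, MER clear */
		RESUME_WITH_MER,	/* guest EE off: set MER and resume */
		DELIVER_TO_GUEST,	/* guest EE on: synthesise a 0x500 */
	};

	static enum extint_action handle_external(const struct extint_ctx *c)
	{
		if (c->host_ipi || c->core_exiting)
			return EXIT_TO_HOST;
		if (!c->irq_pending)
			return RESUME;
		return c->guest_ee ? DELIVER_TO_GUEST : RESUME_WITH_MER;
	}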

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/reg.h          |    1 +
 arch/powerpc/kvm/book3s_hv.c            |    5 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  140 +++++++++++++++++--------------
 3 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index d24c141..faf77d5 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -254,6 +254,7 @@
 #define     LPCR_PECE1	0x00002000	/* decrementer can cause exit */
 #define     LPCR_PECE2	0x00001000	/* machine check etc can cause exit */
 #define   LPCR_MER	0x00000800	/* Mediated External Exception */
+#define   LPCR_MER_SH	11
 #define   LPCR_LPES    0x0000000c
 #define   LPCR_LPES0   0x00000008      /* LPAR Env selector 0 */
 #define   LPCR_LPES1   0x00000004      /* LPAR Env selector 1 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ccd7663..e6a4982 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1373,9 +1373,12 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			break;
 		vc->runner = vcpu;
 		n_ceded = 0;
-		list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
+		list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
 			if (!v->arch.pending_exceptions)
 				n_ceded += v->arch.ceded;
+			else
+				v->arch.ceded = 0;
+		}
 		if (n_ceded == vc->n_runnable)
 			kvmppc_vcore_blocked(vc);
 		else
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4fa7dd4..2d8fe73 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -96,50 +96,51 @@ kvm_start_guest:
 	li	r0,1
 	stb	r0,PACA_NAPSTATELOST(r13)
 
-	/* get vcpu pointer, NULL if we have no vcpu to run */
-	ld	r4,HSTATE_KVM_VCPU(r13)
-	cmpdi	cr1,r4,0
+	/* were we napping due to cede? */
+	lbz	r0,HSTATE_NAPPING(r13)
+	cmpwi	r0,0
+	bne	kvm_end_cede
+
+	/*
+	 * We weren't napping due to cede, so this must be a secondary
+	 * thread being woken up to run a guest, or being woken up due
+	 * to a stray IPI.  (Or due to some machine check or hypervisor
+	 * maintenance interrupt while the core is in KVM.)
+	 */
 
 	/* Check the wake reason in SRR1 to see why we got here */
 	mfspr	r3,SPRN_SRR1
 	rlwinm	r3,r3,44-31,0x7		/* extract wake reason field */
 	cmpwi	r3,4			/* was it an external interrupt? */
-	bne	27f
-
-	/*
-	 * External interrupt - for now assume it is an IPI, since we
-	 * should never get any other interrupts sent to offline threads.
-	 * Only do this for secondary threads.
-	 */
-	beq	cr1,25f
-	lwz	r3,VCPU_PTID(r4)
-	cmpwi	r3,0
-	beq	27f
-25:	ld	r5,HSTATE_XICS_PHYS(r13)
-	li	r0,0xff
-	li	r6,XICS_MFRR
-	li	r7,XICS_XIRR
+	bne	27f			/* if not */
+	ld	r5,HSTATE_XICS_PHYS(r13)
+	li	r7,XICS_XIRR		/* if it was an external interrupt, */
 	lwzcix	r8,r5,r7		/* get and ack the interrupt */
 	sync
 	clrldi.	r9,r8,40		/* get interrupt source ID. */
-	beq	27f			/* none there? */
-	cmpwi	r9,XICS_IPI
-	bne	26f
+	beq	28f			/* none there? */
+	cmpwi	r9,XICS_IPI		/* was it an IPI? */
+	bne	29f
+	li	r0,0xff
+	li	r6,XICS_MFRR
 	stbcix	r0,r5,r6		/* clear IPI */
-26:	stwcix	r8,r5,r7		/* EOI the interrupt */
-
-27:	/* XXX should handle hypervisor maintenance interrupts etc. here */
+	stwcix	r8,r5,r7		/* EOI the interrupt */
+	sync				/* order loading of vcpu after that */
 
-	/* reload vcpu pointer after clearing the IPI */
+	/* get vcpu pointer, NULL if we have no vcpu to run */
 	ld	r4,HSTATE_KVM_VCPU(r13)
 	cmpdi	r4,0
 	/* if we have no vcpu to run, go back to sleep */
 	beq	kvm_no_guest
+	b	kvmppc_hv_entry
 
-	/* were we napping due to cede? */
-	lbz	r0,HSTATE_NAPPING(r13)
-	cmpwi	r0,0
-	bne	kvm_end_cede
+27:	/* XXX should handle hypervisor maintenance interrupts etc. here */
+	b	kvm_no_guest
+28:	/* SRR1 said external but ICP said nope?? */
+	b	kvm_no_guest
+29:	/* External non-IPI interrupt to offline secondary thread? help?? */
+	stw	r8,HSTATE_SAVED_XIRR(r13)
+	b	kvm_no_guest
 
 .global kvmppc_hv_entry
 kvmppc_hv_entry:
@@ -484,20 +485,20 @@ toc_tlbie_lock:
 	mtctr	r6
 	mtxer	r7
 
+	ld	r10, VCPU_PC(r4)
+	ld	r11, VCPU_MSR(r4)
 kvmppc_cede_reentry:		/* r4 = vcpu, r13 = paca */
 	ld	r6, VCPU_SRR0(r4)
 	ld	r7, VCPU_SRR1(r4)
-	ld	r10, VCPU_PC(r4)
-	ld	r11, VCPU_MSR(r4)	/* r11 = vcpu->arch.msr & ~MSR_HV */
 
+	/* r11 = vcpu->arch.msr & ~MSR_HV */
 	rldicl	r11, r11, 63 - MSR_HV_LG, 1
 	rotldi	r11, r11, 1 + MSR_HV_LG
 	ori	r11, r11, MSR_ME
 
 	/* Check if we can deliver an external or decrementer interrupt now */
 	ld	r0,VCPU_PENDING_EXC(r4)
-	li	r8,(1 << BOOK3S_IRQPRIO_EXTERNAL)
-	oris	r8,r8,(1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
+	lis	r8,(1 << BOOK3S_IRQPRIO_EXTERNAL_LEVEL)@h
 	and	r0,r0,r8
 	cmpdi	cr1,r0,0
 	andi.	r0,r11,MSR_EE
@@ -525,10 +526,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	/* Move SRR0 and SRR1 into the respective regs */
 5:	mtspr	SPRN_SRR0, r6
 	mtspr	SPRN_SRR1, r7
-	li	r0,0
-	stb	r0,VCPU_CEDED(r4)	/* cancel cede */
 
 fast_guest_return:
+	li	r0,0
+	stb	r0,VCPU_CEDED(r4)	/* cancel cede */
 	mtspr	SPRN_HSRR0,r10
 	mtspr	SPRN_HSRR1,r11
 
@@ -678,6 +679,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 	/* External interrupt, first check for host_ipi. If this is
 	 * set, we know the host wants us out so let's do it now
 	 */
+do_ext_interrupt:
 	lbz	r0, HSTATE_HOST_IPI(r13)
 	cmpwi	r0, 0
 	bne	ext_interrupt_to_host
@@ -690,19 +692,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 	lwzcix	r3, r5, r7
 	rlwinm.	r0, r3, 0, 0xffffff
 	sync
-	bne	1f
-
-	/* Nothing pending in the ICP, check for mediated interrupts
-	 * and bounce it to the guest
-	 */
-	andi.	r0, r11, MSR_EE
-	beq	ext_interrupt_to_host /* shouldn't happen ?? */
-	mfspr	r5, SPRN_LPCR
-	andi.	r0, r5, LPCR_MER
-	bne	bounce_ext_interrupt
-	b	ext_interrupt_to_host /* shouldn't happen ?? */
+	beq	3f		/* if nothing pending in the ICP */
 
-1:	/* We found something in the ICP...
+	/* We found something in the ICP...
 	 *
 	 * If it's not an IPI, stash it in the PACA and return to
 	 * the host, we don't (yet) handle directing real external
@@ -727,18 +719,35 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 	bne-	1f
 
 	/* All right, looks like an IPI for the guest, we need to set MER */
-	mfspr	r8,SPRN_LPCR
-	ori	r8,r8,LPCR_MER
-	mtspr	SPRN_LPCR,r8
+3:
+	/* Check if any CPU is heading out to the host, if so head out too */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	lwz	r0, VCORE_ENTRY_EXIT(r5)
+	cmpwi	r0, 0x100
+	bge	ext_interrupt_to_host
+
+	/* See if there is a pending interrupt for the guest */
+	mfspr	r8, SPRN_LPCR
+	ld	r0, VCPU_PENDING_EXC(r9)
+	/* Insert EXTERNAL_LEVEL bit into LPCR at the MER bit position */
+	rldicl.	r0, r0, 64 - BOOK3S_IRQPRIO_EXTERNAL_LEVEL, 63
+	rldimi	r8, r0, LPCR_MER_SH, 63 - LPCR_MER_SH
+	beq	2f
 
 	/* And if the guest EE is set, we can deliver immediately, else
 	 * we return to the guest with MER set
 	 */
 	andi.	r0, r11, MSR_EE
-	bne	bounce_ext_interrupt
-	mr	r4, r9
+	beq	2f
+	mtspr	SPRN_SRR0, r10
+	mtspr	SPRN_SRR1, r11
+	li	r10, BOOK3S_INTERRUPT_EXTERNAL
+	li	r11, (MSR_ME << 1) | 1	/* synthesize MSR_SF | MSR_ME */
+	rotldi	r11, r11, 63
+2:	mr	r4, r9
+	mtspr	SPRN_LPCR, r8
 	b	fast_guest_return
-	
+
 	/* We raced with the host, we need to resend that IPI, bummer */
 1:	li	r0, IPI_PRIORITY
 	stbcix	r0, r5, r6		/* set the IPI */
@@ -1466,15 +1475,6 @@ ignore_hdec:
 	mr	r4,r9
 	b	fast_guest_return
 
-bounce_ext_interrupt:
-	mr	r4,r9
-	mtspr	SPRN_SRR0,r10
-	mtspr	SPRN_SRR1,r11
-	li	r10,BOOK3S_INTERRUPT_EXTERNAL
-	li	r11,(MSR_ME << 1) | 1	/* synthesize MSR_SF | MSR_ME */
-	rotldi	r11,r11,63
-	b	fast_guest_return
-
 _GLOBAL(kvmppc_h_set_dabr)
 	std	r4,VCPU_DABR(r3)
 	/* Work around P7 bug where DABR can get corrupted on mtspr */
@@ -1580,6 +1580,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 	b	.
 
 kvm_end_cede:
+	/* get vcpu pointer */
+	ld	r4, HSTATE_KVM_VCPU(r13)
+
 	/* Woken by external or decrementer interrupt */
 	ld	r1, HSTATE_HOST_R1(r13)
 
@@ -1619,6 +1622,16 @@ kvm_end_cede:
 	li	r0,0
 	stb	r0,HSTATE_NAPPING(r13)
 
+	/* Check the wake reason in SRR1 to see why we got here */
+	mfspr	r3, SPRN_SRR1
+	rlwinm	r3, r3, 44-31, 0x7	/* extract wake reason field */
+	cmpwi	r3, 4			/* was it an external interrupt? */
+	li	r12, BOOK3S_INTERRUPT_EXTERNAL
+	mr	r9, r4
+	ld	r10, VCPU_PC(r9)
+	ld	r11, VCPU_MSR(r9)
+	beq	do_ext_interrupt	/* if so */
+	
 	/* see if any other thread is already exiting */
 	lwz	r0,VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0,0x100
@@ -1638,8 +1651,7 @@ kvm_cede_prodded:
 
 	/* we've ceded but we want to give control to the host */
 kvm_cede_exit:
-	li	r3,H_TOO_HARD
-	blr
+	b	hcall_real_fallback
 
 secondary_too_late:
 	ld	r5,HSTATE_KVM_VCORE(r13)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 7/9] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (5 preceding siblings ...)
  2012-11-05  3:23 ` [RFC PATCH 6/9] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
@ 2012-11-05  3:23 ` Paul Mackerras
  2012-11-05  3:24 ` [RFC PATCH 8/9] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:23 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call.  With this, ibm,int-off saves
the specified interrupt's current priority in its saved_priority field
and sets the priority to 0xff (the least favoured value).  ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.
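
The resulting priority bookkeeping, as a small standalone model (an
illustration only, not the kernel code; MASKED is 0xff in this series):

	#define MASKED	0xff

	struct xive_state {
		unsigned char priority;
		unsigned char saved_priority;
	};

	/* ibm,set-xive: both fields take the new priority */
	static void model_set_xive(struct xive_state *s, unsigned char prio)
	{
		s->priority = s->saved_priority = prio;
	}

	/* ibm,int-off: remember the current priority, then mask */
	static void model_int_off(struct xive_state *s)
	{
		s->saved_priority = s->priority;
		s->priority = MASKED;
	}

	/* ibm,int-on: restore the remembered priority */
	static void model_int_on(struct xive_state *s)
	{
		s->priority = s->saved_priority;
	}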

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_ppc.h |    2 +
 arch/powerpc/kvm/book3s_rtas.c     |   40 +++++++++++++++++
 arch/powerpc/kvm/book3s_xics.c     |   86 +++++++++++++++++++++++++++++-------
 arch/powerpc/kvm/book3s_xics.h     |    2 +-
 4 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 1c833ed..a9ab748 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -174,6 +174,8 @@ extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
 extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
 extern int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority);
 extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority);
+extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
+extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
 
 /*
  * Cuts out inst bits with ordering according to spec.
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 6a6c1fe..fc1a749 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -64,6 +64,44 @@ out:
 	args->rets[0] = rc;
 }
 
+static void kvm_rtas_int_off(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+	u32 irq;
+	int rc;
+
+	if (args->nargs != 1 || args->nret != 1) {
+		rc = -3;
+		goto out;
+	}
+
+	irq = args->args[0];
+
+	rc = kvmppc_xics_int_off(vcpu->kvm, irq);
+	if (rc)
+		rc = -3;
+out:
+	args->rets[0] = rc;
+}
+
+static void kvm_rtas_int_on(struct kvm_vcpu *vcpu, struct rtas_args *args)
+{
+	u32 irq;
+	int rc;
+
+	if (args->nargs != 1 || args->nret != 1) {
+		rc = -3;
+		goto out;
+	}
+
+	irq = args->args[0];
+
+	rc = kvmppc_xics_int_on(vcpu->kvm, irq);
+	if (rc)
+		rc = -3;
+out:
+	args->rets[0] = rc;
+}
+
 struct rtas_handler {
 	void (*handler)(struct kvm_vcpu *vcpu, struct rtas_args *args);
 	char *name;
@@ -72,6 +110,8 @@ struct rtas_handler {
 static struct rtas_handler rtas_handlers[] = {
 	{ .name = "ibm,set-xive", .handler = kvm_rtas_set_xive },
 	{ .name = "ibm,get-xive", .handler = kvm_rtas_get_xive },
+	{ .name = "ibm,int-off",  .handler = kvm_rtas_int_off },
+	{ .name = "ibm,int-on",   .handler = kvm_rtas_int_on },
 };
 
 struct rtas_token_definition {
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 3858c14..7b29cb8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -124,6 +124,28 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 	mutex_unlock(&ics->lock);
 }
 
+static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
+		       struct ics_irq_state *state,
+		       u32 server, u32 priority, u32 saved_priority)
+{
+	bool deliver;
+
+	mutex_lock(&ics->lock);
+
+	state->server = server;
+	state->priority = priority;
+	state->saved_priority = saved_priority;
+	deliver = false;
+	if ((state->masked_pending || state->resend) && priority != MASKED) {
+		state->masked_pending = 0;
+		deliver = true;
+	}
+
+	mutex_unlock(&ics->lock);
+
+	return deliver;
+}
+
 int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
 {
 	struct kvmppc_xics *xics = kvm->arch.xics;
@@ -131,7 +153,6 @@ int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
 	struct kvmppc_ics *ics;
 	struct ics_irq_state *state;
 	u16 src;
-	bool deliver;
 
 	if (!xics)
 		return -ENODEV;
@@ -145,23 +166,11 @@ int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priority)
 	if (!icp)
 		return -EINVAL;
 
-	mutex_lock(&ics->lock);
-
 	XICS_DBG("set_xive %#x server %#x prio %#x MP:%d RS:%d\n",
 		 irq, server, priority,
 		 state->masked_pending, state->resend);
 
-	state->server = server;
-	state->priority = priority;
-	deliver = false;
-	if ((state->masked_pending || state->resend) && priority != MASKED) {
-		state->masked_pending = 0;
-		deliver = true;
-	}
-
-	mutex_unlock(&ics->lock);
-
-	if (deliver)
+	if (write_xive(xics, ics, state, server, priority, priority))
 		icp_deliver_irq(xics, icp, irq);
 
 	return 0;
@@ -190,6 +199,53 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
 	return 0;
 }
 
+int kvmppc_xics_int_on(struct kvm *kvm, u32 irq)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_icp *icp;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u16 src;
+
+	if (!xics)
+		return -ENODEV;
+
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics)
+		return -EINVAL;
+	state = &ics->irq_state[src];
+
+	icp = kvmppc_xics_find_server(kvm, state->server);
+	if (!icp)
+		return -EINVAL;
+
+	if (write_xive(xics, ics, state, state->server, state->saved_priority,
+		       state->saved_priority))
+		icp_deliver_irq(xics, icp, irq);
+
+	return 0;
+}
+
+int kvmppc_xics_int_off(struct kvm *kvm, u32 irq)
+{
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *state;
+	u16 src;
+
+	if (!xics)
+		return -ENODEV;
+
+	ics = kvmppc_xics_find_ics(xics, irq, &src);
+	if (!ics)
+		return -EINVAL;
+	state = &ics->irq_state[src];
+
+	write_xive(xics, ics, state, state->server, MASKED, state->priority);
+
+	return 0;
+}
+
 /* -- ICP routines, including hcalls -- */
 
 static inline bool icp_try_update(struct kvmppc_icp *icp,
@@ -584,7 +640,7 @@ static noinline int kvmppc_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 	/* Handle reject */
 	if (reject && reject != XICS_IPI)
 		icp_deliver_irq(xics, icp, reject);
-		
+
 	/* Handle resend */
 	if (resend)
 		icp_check_resend(xics, icp);
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 951eacb..78f6731 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -22,7 +22,7 @@ struct ics_irq_state {
 	u32 number;
 	u32 server;
 	u8  priority;
-	u8  saved_priority; /* currently unused */
+	u8  saved_priority;
 	u8  resend;
 	u8  masked_pending;
 	u8  asserted; /* Only for LSI */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 8/9] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (6 preceding siblings ...)
  2012-11-05  3:23 ` [RFC PATCH 7/9] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras
@ 2012-11-05  3:24 ` Paul Mackerras
  2012-11-05  3:25 ` [RFC PATCH 9/9] KVM: PPC: Book 3S: Facilities to save/restore XICS source controller state Paul Mackerras
  2012-12-15  0:46 ` [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Alexander Graf
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:24 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface.  Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state.  The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.
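
For reference, a minimal userspace sketch of reading and decoding the
new register could look like the following.  This is not part of the
patch; it assumes <linux/kvm.h> with this series applied, plus
<stdio.h>, <sys/ioctl.h> and <err.h>, and "vcpu_fd" is a placeholder
for an already-open vcpu file descriptor.

	static void dump_icp_state(int vcpu_fd)
	{
		struct kvm_one_reg reg;
		__u64 icpval;

		/* Fetch the packed ICP state word for this vcpu */
		reg.id = KVM_REG_PPC_ICP_STATE;
		reg.addr = (__u64)(unsigned long)&icpval;
		if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
			err(1, "KVM_GET_ONE_REG(ICP_STATE)");

		/* Unpack using the shifts/masks defined below */
		printf("cppr %#x xisr %#x mfrr %#x pending_pri %#x\n",
		       (unsigned int)((icpval >> KVM_REG_PPC_ICP_CPPR_SHIFT) &
				      KVM_REG_PPC_ICP_CPPR_MASK),
		       (unsigned int)((icpval >> KVM_REG_PPC_ICP_XISR_SHIFT) &
				      KVM_REG_PPC_ICP_XISR_MASK),
		       (unsigned int)((icpval >> KVM_REG_PPC_ICP_MFRR_SHIFT) &
				      KVM_REG_PPC_ICP_MFRR_MASK),
		       (unsigned int)((icpval >> KVM_REG_PPC_ICP_PPRI_SHIFT) &
				      KVM_REG_PPC_ICP_PPRI_MASK));
	}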

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_ppc.h  |    2 +
 arch/powerpc/include/uapi/asm/kvm.h |   11 +++++
 arch/powerpc/kvm/book3s.c           |   18 +++++++
 arch/powerpc/kvm/book3s_xics.c      |   90 +++++++++++++++++++++++++++++++++++
 4 files changed, 121 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a9ab748..d00d7c3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -176,6 +176,8 @@ extern int kvmppc_xics_set_xive(struct kvm *kvm, u32 irq, u32 server, u32 priori
 extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority);
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
+extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
+extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 
 /*
  * Cuts out inst bits with ordering according to spec.
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 55c1907..9fc1d11 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -425,4 +425,15 @@ struct kvm_book3e_206_tlb_params {
 #define KVM_REG_PPC_VPA_SLB	(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x83)
 #define KVM_REG_PPC_VPA_DTL	(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x84)
 
+/* Per-vcpu interrupt controller state */
+#define KVM_REG_PPC_ICP_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x85)
+#define  KVM_REG_PPC_ICP_CPPR_SHIFT	56	/* current proc priority */
+#define  KVM_REG_PPC_ICP_CPPR_MASK	0xff
+#define  KVM_REG_PPC_ICP_XISR_SHIFT	32	/* interrupt status field */
+#define  KVM_REG_PPC_ICP_XISR_MASK	0xffffff
+#define  KVM_REG_PPC_ICP_MFRR_SHIFT	24	/* pending IPI priority */
+#define  KVM_REG_PPC_ICP_MFRR_MASK	0xff
+#define  KVM_REG_PPC_ICP_PPRI_SHIFT	16	/* pending irq priority */
+#define  KVM_REG_PPC_ICP_PPRI_MASK	0xff
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c5a4478..e19232f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -529,6 +529,15 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 			val = get_reg_val(reg->id, vcpu->arch.vscr.u[3]);
 			break;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_PPC_BOOK3S_64
+		case KVM_REG_PPC_ICP_STATE:
+			if (!vcpu->arch.icp) {
+				r = -ENXIO;
+				break;
+			}
+			val = get_reg_val(reg->id, kvmppc_xics_get_icp(vcpu));
+			break;
+#endif
 		default:
 			r = -EINVAL;
 			break;
@@ -591,6 +600,15 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 			vcpu->arch.vscr.u[3] = set_reg_val(reg->id, val);
 			break;
 #endif /* CONFIG_ALTIVEC */
+#ifdef CONFIG_PPC_BOOK3S_64
+		case KVM_REG_PPC_ICP_STATE:
+			if (!vcpu->arch.icp) {
+				r = -ENXIO;
+				break;
+			}
+			r = kvmppc_xics_set_icp(vcpu, set_reg_val(reg->id, val));
+			break;
+#endif
 		default:
 			r = -EINVAL;
 			break;
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 7b29cb8..8314cd9 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -976,6 +976,96 @@ void kvmppc_xics_free(struct kvm *kvm)
 	kfree(xics);
 }
 
+u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	union kvmppc_icp_state state;
+
+	if (!icp)
+		return 0;
+	state = icp->state;
+	return ((u64)state.cppr << KVM_REG_PPC_ICP_CPPR_SHIFT) |
+		((u64)state.xisr << KVM_REG_PPC_ICP_XISR_SHIFT) |
+		((u64)state.mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT) |
+		((u64)state.pending_pri << KVM_REG_PPC_ICP_PPRI_SHIFT);
+}
+
+int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
+{
+	struct kvmppc_icp *icp = vcpu->arch.icp;
+	struct kvmppc_xics *xics = vcpu->kvm->arch.xics;
+	union kvmppc_icp_state old_state, new_state;
+	struct kvmppc_ics *ics;
+	u8 cppr, mfrr, pending_pri;
+	u32 xisr;
+	u16 src;
+	bool resend;
+
+	if (!icp || !xics)
+		return -ENOENT;
+
+	cppr = icpval >> KVM_REG_PPC_ICP_CPPR_SHIFT;
+	xisr = (icpval >> KVM_REG_PPC_ICP_XISR_SHIFT) &
+		KVM_REG_PPC_ICP_XISR_MASK;
+	mfrr = icpval >> KVM_REG_PPC_ICP_MFRR_SHIFT;
+	pending_pri = icpval >> KVM_REG_PPC_ICP_PPRI_SHIFT;
+
+	/* Require the new state to be internally consistent */
+	if (xisr == 0) {
+		if (pending_pri != 0xff)
+			return -EINVAL;
+	} else if (xisr == XICS_IPI) {
+		if (pending_pri != mfrr || pending_pri >= cppr)
+			return -EINVAL;
+	} else {
+		if (pending_pri >= mfrr || pending_pri >= cppr)
+			return -EINVAL;
+		ics = kvmppc_xics_find_ics(xics, xisr, &src);
+		if (!ics)
+			return -EINVAL;
+	}
+
+	new_state.raw = 0;
+	new_state.cppr = cppr;
+	new_state.xisr = xisr;
+	new_state.mfrr = mfrr;
+	new_state.pending_pri = pending_pri;
+
+	/*
+	 * Deassert the CPU interrupt request.
+	 * icp_try_update will reassert it if necessary.
+	 */
+	kvmppc_book3s_dequeue_irqprio(icp->vcpu,
+				      BOOK3S_INTERRUPT_EXTERNAL_LEVEL);
+
+	/*
+	 * Note that if we displace an interrupt from old_state.xisr,
+	 * we don't mark it as rejected.  We expect userspace to set
+	 * the state of the interrupt sources to be consistent with
+	 * the ICP states (either before or afterwards, which doesn't
+	 * matter).  We do handle resends due to CPPR becoming less
+	 * favoured because that is necessary to end up with a
+	 * consistent state in the situation where userspace restores
+	 * the ICS states before the ICP states.
+	 */
+	do {
+		old_state = ACCESS_ONCE(icp->state);
+
+		if (new_state.mfrr <= old_state.mfrr) {
+			resend = false;
+			new_state.need_resend = old_state.need_resend;
+		} else {
+			resend = old_state.need_resend;
+			new_state.need_resend = 0;
+		}
+	} while (!icp_try_update(icp, old_state, new_state, false));
+
+	if (resend)
+		icp_check_resend(xics, icp);
+
+	return 0;
+}
+
 /* -- ioctls -- */
 
 static int kvm_vm_ioctl_create_icp(struct kvm *kvm,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 9/9] KVM: PPC: Book 3S: Facilities to save/restore XICS source controller state
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (7 preceding siblings ...)
  2012-11-05  3:24 ` [RFC PATCH 8/9] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
@ 2012-11-05  3:25 ` Paul Mackerras
  2012-12-15  0:46 ` [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Alexander Graf
  9 siblings, 0 replies; 17+ messages in thread
From: Paul Mackerras @ 2012-11-05  3:25 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt

This adds the ability for userspace to save and restore the state of
the XICS interrupt source controllers (ICS) via two new VM ioctls,
KVM_IRQCHIP_GET_SOURCES and KVM_IRQCHIP_SET_SOURCES.  Both take an
argument struct that gives the starting interrupt source number, the
number of interrupt sources to be processed, and a pointer to an array
with space for 64 bits per source where the state of the interrupt
sources are to be stored or loaded.  The interrupt sources are required
to all be within one ICS.
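
As a rough usage illustration (not part of this patch), userspace could
fetch and print the state of a block of sources as below.  It assumes
the struct kvm_irq_sources layout and KVM_IRQ_* bit definitions added
here; "vm_fd", "first_irq" and "nr" are placeholders, and the usual
<linux/kvm.h>, <stdio.h>, <sys/ioctl.h> and <err.h> headers are assumed.

	static void dump_sources(int vm_fd, __u32 first_irq, __u32 nr)
	{
		__u64 buf[nr];		/* one 64-bit word per source */
		struct kvm_irq_sources srcs = {
			.irq = first_irq,
			.nr_irqs = nr,
			.irqbuf = buf,
		};
		__u32 i;

		if (ioctl(vm_fd, KVM_IRQCHIP_GET_SOURCES, &srcs) < 0)
			err(1, "KVM_IRQCHIP_GET_SOURCES");

		for (i = 0; i < nr; i++)
			printf("irq %u: server %llu prio %llu%s%s%s\n",
			       first_irq + i,
			       (unsigned long long)(buf[i] & KVM_IRQ_SERVER_MASK),
			       (unsigned long long)((buf[i] >> KVM_IRQ_PRIORITY_SHIFT) &
						    KVM_IRQ_PRIORITY_MASK),
			       (buf[i] & KVM_IRQ_LEVEL_SENSITIVE) ? " LSI" : "",
			       (buf[i] & KVM_IRQ_MASKED) ? " masked" : "",
			       (buf[i] & KVM_IRQ_PENDING) ? " pending" : "");
	}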

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt |   33 ++++++++++++
 arch/powerpc/kvm/book3s_xics.c    |  108 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c        |    4 +-
 include/uapi/linux/kvm.h          |   17 ++++++
 4 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fbe018e..2ba53a1 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2107,6 +2107,39 @@ struct.  When creating an ICS, the argument struct further indicates
 the BUID (Bus Unit ID) and number of interrupt sources for the new
 ICS.
 
+4.80 KVM_IRQCHIP_GET_SOURCES
+
+Capability: KVM_CAP_SPAPR_XICS
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_irq_sources
+Returns: 0 on success, -1 on error
+
+Copies configuration and status information about a range of interrupt
+sources into a user-supplied buffer.  The argument struct gives the
+starting interrupt source number and the number of interrupt sources.
+All of the interrupt sources must be within the range of a single
+interrupt source controller (ICS).  The user buffer is an array of
+64-bit quantities, one per interrupt source, with (from the least-
+significant bit) 32 bits of interrupt server number, 8 bits of
+priority, and 1 bit each for a level-sensitive indicator, a masked
+indicator, and a pending indicator.
+
+4.81 KVM_IRQCHIP_SET_SOURCES
+
+Capability: KVM_CAP_SPAPR_XICS
+Architectures: ppc
+Type: vm ioctl
+Parameters: struct kvm_irq_sources
+Returns: 0 on success, -1 on error
+
+Sets the configuration and status for a range of interrupt sources
+from information supplied in a user-supplied buffer.  The argument
+struct gives the starting interrupt source number and the number of
+interrupt sources.  All of the interrupt sources must be within the
+range of a single interrupt source controller (ICS).  The user buffer
+is formatted as for KVM_IRQCHIP_GET_SOURCES.
+
 
 5. The kvm_run structure
 ------------------------
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 8314cd9..8071c7e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1066,6 +1066,88 @@ int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval)
 	return 0;
 }
 
+static int kvm_xics_get_sources(struct kvm *kvm, struct kvm_irq_sources *srcs)
+{
+	int ret = 0;
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *irqp;
+	u64 __user *ubufp;
+	u16 idx;
+	u64 val;
+	long int i;
+
+	ics = kvmppc_xics_find_ics(xics, srcs->irq, &idx);
+	if (!ics || idx + srcs->nr_irqs > ics->nr_irqs)
+		return -ENOENT;
+
+	irqp = &ics->irq_state[idx];
+	ubufp = srcs->irqbuf;
+	mutex_lock(&ics->lock);
+	for (i = 0; i < srcs->nr_irqs; ++i, ++irqp, ++ubufp) {
+		val = irqp->server;
+		if (irqp->priority == MASKED && irqp->saved_priority != MASKED)
+			val |= KVM_IRQ_MASKED | ((u64)irqp->saved_priority <<
+						 KVM_IRQ_PRIORITY_SHIFT);
+		else
+			val |= ((u64)irqp->priority << KVM_IRQ_PRIORITY_SHIFT);
+		if (irqp->asserted)
+			val |= KVM_IRQ_LEVEL_SENSITIVE | KVM_IRQ_PENDING;
+		else if (irqp->masked_pending || irqp->resend)
+			val |= KVM_IRQ_PENDING;
+		ret = -EFAULT;
+		if (__put_user(val, ubufp))
+			break;
+		ret = 0;
+	}
+	mutex_unlock(&ics->lock);
+	return ret;
+}
+
+static int kvm_xics_set_sources(struct kvm *kvm, struct kvm_irq_sources *srcs)
+{
+	int ret = 0;
+	struct kvmppc_xics *xics = kvm->arch.xics;
+	struct kvmppc_ics *ics;
+	struct ics_irq_state *irqp;
+	u64 __user *ubufp;
+	u16 idx;
+	u64 val;
+	long int i;
+
+	ics = kvmppc_xics_find_ics(xics, srcs->irq, &idx);
+	if (!ics || idx + srcs->nr_irqs > ics->nr_irqs)
+		return -ENOENT;
+
+	irqp = &ics->irq_state[idx];
+	ubufp = srcs->irqbuf;
+	for (i = 0; i < srcs->nr_irqs; ++i, ++irqp, ++ubufp) {
+		ret = -EFAULT;
+		if (__get_user(val, ubufp))
+			break;
+		ret = 0;
+
+		mutex_lock(&ics->lock);
+		irqp->server = val & KVM_IRQ_SERVER_MASK;
+		irqp->saved_priority = val >> KVM_IRQ_PRIORITY_SHIFT;
+		if (val & KVM_IRQ_MASKED)
+			irqp->priority = MASKED;
+		else
+			irqp->priority = irqp->saved_priority;
+		irqp->resend = 0;
+		irqp->masked_pending = 0;
+		irqp->asserted = 0;
+		if ((val & KVM_IRQ_PENDING) && (val & KVM_IRQ_LEVEL_SENSITIVE))
+			irqp->asserted = 1;
+		mutex_unlock(&ics->lock);
+
+		if (val & KVM_IRQ_PENDING)
+			icp_deliver_irq(xics, NULL, irqp->number);
+	}
+
+	return ret;
+}
+
 /* -- ioctls -- */
 
 static int kvm_vm_ioctl_create_icp(struct kvm *kvm,
@@ -1200,6 +1282,32 @@ int kvmppc_xics_ioctl(struct kvm *kvm, unsigned ioctl, unsigned long arg)
 		break;
 	}
 
+	case KVM_IRQCHIP_GET_SOURCES: {
+		struct kvm_irq_sources sources;
+
+		rc = -EFAULT;
+		if (copy_from_user(&sources, argp, sizeof(sources)))
+			break;
+		if (!access_ok(VERIFY_WRITE, sources.irqbuf,
+			       sources.nr_irqs * sizeof(u64)))
+			break;
+		rc = kvm_xics_get_sources(kvm, &sources);
+		break;
+	}
+
+	case KVM_IRQCHIP_SET_SOURCES: {
+		struct kvm_irq_sources sources;
+
+		rc = -EFAULT;
+		if (copy_from_user(&sources, argp, sizeof(sources)))
+			break;
+		if (!access_ok(VERIFY_READ, sources.irqbuf,
+			       sources.nr_irqs * sizeof(u64)))
+			break;
+		rc = kvm_xics_set_sources(kvm, &sources);
+		break;
+	}
+
 	default:
 		rc = -ENOTTY;
 		break;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 90b5b5c..0d443d4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -955,7 +955,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_rtas_define_token(kvm, argp);
 		break;
 	}
-	case KVM_IRQ_LINE: {
+	case KVM_IRQ_LINE:
+	case KVM_IRQCHIP_GET_SOURCES:
+	case KVM_IRQCHIP_SET_SOURCES: {
 		struct kvm *kvm = filp->private_data;
 
 		r = -ENOTTY;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8674d32..edcf78d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -779,6 +779,21 @@ struct kvm_msi {
 	__u8  pad[16];
 };
 
+struct kvm_irq_sources {
+	__u32 irq;
+	__u32 nr_irqs;
+	__u64 __user *irqbuf;
+};
+
+/* irqbuf entries are laid out like this: */
+#define KVM_IRQ_SERVER_SHIFT	0
+#define KVM_IRQ_SERVER_MASK	0xffffffffULL
+#define KVM_IRQ_PRIORITY_SHIFT	32
+#define KVM_IRQ_PRIORITY_MASK	0xff
+#define KVM_IRQ_LEVEL_SENSITIVE	(1ULL << 40)
+#define KVM_IRQ_MASKED		(1ULL << 41)
+#define KVM_IRQ_PENDING		(1ULL << 42)
+
 /*
  * ioctls for VM fds
  */
@@ -867,6 +882,8 @@ struct kvm_s390_ucas_mapping {
 #ifdef __KVM_HAVE_IRQCHIP_ARGS
 #define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xab, struct kvm_irqchip_args)
 #endif
+#define KVM_IRQCHIP_GET_SOURCES	  _IOW(KVMIO,  0xac, struct kvm_irq_sources)
+#define KVM_IRQCHIP_SET_SOURCES	  _IOW(KVMIO,  0xad, struct kvm_irq_sources)
 
 /*
  * ioctls for vcpu fds
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
                   ` (8 preceding siblings ...)
  2012-11-05  3:25 ` [RFC PATCH 9/9] KVM: PPC: Book 3S: Facilities to save/restore XICS source controller state Paul Mackerras
@ 2012-12-15  0:46 ` Alexander Graf
  2012-12-15  2:06   ` Benjamin Herrenschmidt
  9 siblings, 1 reply; 17+ messages in thread
From: Alexander Graf @ 2012-12-15  0:46 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt


On 05.11.2012, at 04:18, Paul Mackerras wrote:

> This series implements in-kernel emulation of the XICS interrupt
> controller specified in the PAPR specification and used by pseries
> guests.  Since the XICS is organized a bit differently to the
> interrupt controllers used on x86 machines, I have defined some new
> ioctls for setting up the XICS and for saving and restoring its
> state.  It may be possible to use some of the currently x86-specific
> ioctls instead.

Is this one already following the new world order that we discussed during KVM Forum? :)


Alex

> 
> Paul.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2012-12-15  0:46 ` [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Alexander Graf
@ 2012-12-15  2:06   ` Benjamin Herrenschmidt
  2013-01-10  5:09     ` Paul Mackerras
  0 siblings, 1 reply; 17+ messages in thread
From: Benjamin Herrenschmidt @ 2012-12-15  2:06 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, kvm, kvm-ppc

On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote:
> On 05.11.2012, at 04:18, Paul Mackerras wrote:
> 
> > This series implements in-kernel emulation of the XICS interrupt
> > controller specified in the PAPR specification and used by pseries
> > guests.  Since the XICS is organized a bit differently to the
> > interrupt controllers used on x86 machines, I have defined some new
> > ioctls for setting up the XICS and for saving and restoring its
> > state.  It may be possible to use some of the currently x86-specific
> > ioctls instead.
> 
> Is this one already following the new world order that we discussed during KVM Forum? :)

The "new world order" .... (sorry, looks like nobody took notes and
people expect me to do a write up from memory now ... :-)

Well, mostly we agreed that the x86 stuff wasn't going to work for us.

So basically what we discussed boils down to:

 - Move the existing "generic" KVM ioctl to create the kernel APIC to
x86, since it's not as-is useful for other archs who, among other things,
might need to pass arguments based on the machine type (type of interrupt
subsystem for example, etc...)

 - Once that's done, well, instantiating interrupt controller objects
becomes pretty much an arch specific matter. We could try to keep some
ioctls somewhat common, though it's not *that* useful because the various
types & arguments are going to be fairly arch specific; same goes for
configuration. Examples of what could be kept common are:

   * Create irq chip, takes at least a "type" argument, possibly a few
more type-specific args (or a union, but then let's keep space in there
since we can't change the size of the struct later as it would impact
the ioctl number afaik).

  * Assigning the address of an irq chip when it can change (see ARM
patches)

 - The existing business with irqfd etc... is mostly ok, except the
interfaces messing around with MSIs (see virtio-pci use of kvm functions
directly). The assignment of an irq number for an MSI must be a hook,
probably a PCI controller hook, so x86 can get it done via its existing
kernel interfaces and sane architectures can keep the assignment in qemu
where it belongs.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2012-12-15  2:06   ` Benjamin Herrenschmidt
@ 2013-01-10  5:09     ` Paul Mackerras
  2013-01-10 11:42       ` Alexander Graf
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Mackerras @ 2013-01-10  5:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Alexander Graf, kvm, kvm-ppc

On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote:
> > On 05.11.2012, at 04:18, Paul Mackerras wrote:
> > 
> > > This series implements in-kernel emulation of the XICS interrupt
> > > controller specified in the PAPR specification and used by pseries
> > > guests.  Since the XICS is organized a bit differently to the
> > > interrupt controllers used on x86 machines, I have defined some new
> > > ioctls for setting up the XICS and for saving and restoring its
> > > state.  It may be possible to use some of the currently x86-specific
> > > ioctls instead.
> > 
> > Is this one already following the new world order that we discussed during KVM Forum? :)
> 
> The "new world order" .... (sorry, looks like nobody took notes and
> people expect me to do a write up from memory now ... :-)
> 
> Well, mostly we agreed that the x86 stuff wasn't going to work for us.
> 
> So basically what we discussed boils down to:
> 
>  - Move the existing "generic" KVM ioctl to create the kernel APIC to
> x86 since it's no as-is useful for other archs who, among other things,
> might need to pass argument based on the machine type (type of interrupt
> subsystem for example, etc...)

Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
handled entirely in arch code (arch/x86/kvm/x86.c and
arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
KVM_SET_IRQCHIP.

>  - Once that's done, well, instanciating interrupt controller objects
> becomes pretty much an arch specific matter. We could try to keep some
> ioctls somewhat common though it's not *that* useful because the various
> types & arguments are going to be fairly arch specific, so goes for
> configuration. Examples of what could be kept common are:
> 
>    * Create irq chip, takes at least a "type" argument, possibly a few
> more type-specific args (or a union, but then let's keep space in there
> since we can't change the size of the struct later as it would impact
> the ioctl number afaik).

The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
that it doesn't read or write memory, but there is still the 3rd
argument to the ioctl, which would give us 64 bits to indicate the
type of the top-level IRQ controller (XICS, MPIC, ...), but not much
more.
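
(Purely as an illustration of that option, not code from any tree: the
type could be passed directly in that third argument, something like

	/* hypothetical: KVM_IRQCHIP_TYPE_XICS would need defining somewhere */
	ioctl(vm_fd, KVM_CREATE_IRQCHIP, (unsigned long)KVM_IRQCHIP_TYPE_XICS);

but that leaves no room for any per-type arguments beyond the type
itself.)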

It sounds like the agreement at the meeting was more along the lines
of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
which can be called multiple times to instantiate pieces of the
interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
called once early on to say that there we are using in-kernel
interrupt controller emulation, followed by other calls to configure
the various parts of the framework.  Is that accurate?

>   * Assigning the address of an irq chip when it can change (see ARM
> patches)
> 
>  - The existing business with irqfd etc... is mostly ok, except the
> interfaces messing around with MSIs (see virtio-pci use of kvm functions
> directly). The assignment of an irq number for an MSI must be a hook,
> probably a PCI controller hook, so x86 can get it done via its existing
> kernel interfaces and sane architectures can keep the assignment in qemu
> where it belongs.

So, if I've understood you correctly about what was agreed, the set of
ioctls that is implemented in the patches I posted is in line with
what was agreed, isn't it?  If not, where does it differ?  (To recap,
I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and
KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the
vcpu-specific state.)

Paul.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2013-01-10  5:09     ` Paul Mackerras
@ 2013-01-10 11:42       ` Alexander Graf
  2013-01-10 19:06         ` Scott Wood
  2013-01-21  0:08         ` Paul Mackerras
  0 siblings, 2 replies; 17+ messages in thread
From: Alexander Graf @ 2013-01-10 11:42 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Benjamin Herrenschmidt, kvm@vger.kernel.org list, kvm-ppc,
	<kvmarm@lists.cs.columbia.edu>


On 10.01.2013, at 06:09, Paul Mackerras wrote:

> On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
>> On Sat, 2012-12-15 at 01:46 +0100, Alexander Graf wrote:
>>> On 05.11.2012, at 04:18, Paul Mackerras wrote:
>>> 
>>>> This series implements in-kernel emulation of the XICS interrupt
>>>> controller specified in the PAPR specification and used by pseries
>>>> guests.  Since the XICS is organized a bit differently to the
>>>> interrupt controllers used on x86 machines, I have defined some new
>>>> ioctls for setting up the XICS and for saving and restoring its
>>>> state.  It may be possible to use some of the currently x86-specific
>>>> ioctls instead.
>>> 
>>> Is this one already following the new world order that we discussed during KVM Forum? :)
>> 
>> The "new world order" .... (sorry, looks like nobody took notes and
>> people expect me to do a write up from memory now ... :-)
>> 
>> Well, mostly we agreed that the x86 stuff wasn't going to work for us.
>> 
>> So basically what we discussed boils down to:
>> 
>> - Move the existing "generic" KVM ioctl to create the kernel APIC to
>> x86 since it's no as-is useful for other archs who, among other things,
>> might need to pass argument based on the machine type (type of interrupt
>> subsystem for example, etc...)
> 
> Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
> handled entirely in arch code (arch/x86/kvm/x86.c and
> arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
> KVM_SET_IRQCHIP.

This part is about QEMU.

> 
>> - Once that's done, well, instanciating interrupt controller objects
>> becomes pretty much an arch specific matter. We could try to keep some
>> ioctls somewhat common though it's not *that* useful because the various
>> types & arguments are going to be fairly arch specific, so goes for
>> configuration. Examples of what could be kept common are:
>> 
>>   * Create irq chip, takes at least a "type" argument, possibly a few
>> more type-specific args (or a union, but then let's keep space in there
>> since we can't change the size of the struct later as it would impact
>> the ioctl number afaik).
> 
> The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
> that it doesn't read or write memory, but there is still the 3rd
> argument to the ioctl, which would give us 64 bits to indicate the
> type of the top-level IRQ controller (XICS, MPIC, ...), but not much
> more.
> 
> It sounds like the agreement at the meeting was more along the lines
> of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
> which can be called multiple times to instantiate pieces of the
> interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
> called once early on to say that there we are using in-kernel
> interrupt controller emulation, followed by other calls to configure
> the various parts of the framework.  Is that accurate?

KVM_CREATE_IRQCHIP should get a parameter indicating the type of IRQCHIP (XICS / MPIC / APIC / VGIC / ...). The parameter-less version defaults to an arch specific default (APIC for x86, VGIC for arm, error on PPC).

I'm not 100% sure yet whether we want to support models with multiple IRQCHIPs. If so, we need to return a device token from the KVM_CREATE_IRQCHIP ioctl. Maybe it makes more sense to model specific cases like this as a separate type, though, with specific, explicit subdevice ids. Not sure.

Other instantiation attributes we don't know that early on should be settable between the time frame of vm creation and first execution. An example for this are device addresses. Check out the threads "KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS" and "KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl" for more information on this particular bit.


Alex

> 
>>  * Assigning the address of an irq chip when it can change (see ARM
>> patches)
>> 
>> - The existing business with irqfd etc... is mostly ok, except the
>> interfaces messing around with MSIs (see virtio-pci use of kvm functions
>> directly). The assignment of an irq number for an MSI must be a hook,
>> probably a PCI controller hook, so x86 can get it done via its existing
>> kernel interfaces and sane architectures can keep the assignment in qemu
>> where it belongs.
> 
> So, if I've understood you correctly about what was agreed, the set of
> ioctls that is implemented in the patches I posted is in line with
> what was agreed, isn't it?  If not, where does it differ?  (To recap,
> I had KVM_CREATE_IRQCHIP_ARGS, KVM_IRQCHIP_GET_SOURCES and
> KVM_IRQCHIP_SET_SOURCES, plus a one-reg interface to get/set the
> vcpu-specific state.)
> 
> Paul.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2013-01-10 11:42       ` Alexander Graf
@ 2013-01-10 19:06         ` Scott Wood
  2013-01-21  0:08         ` Paul Mackerras
  1 sibling, 0 replies; 17+ messages in thread
From: Scott Wood @ 2013-01-10 19:06 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Paul Mackerras, Benjamin Herrenschmidt, kvm@vger.kernel.org list,
	kvm-ppc, <kvmarm@lists.cs.columbia.edu>

On 01/10/2013 05:42:16 AM, Alexander Graf wrote:
> Other instantiation attributes we don't know that early on should be  
> settable between the time frame of vm creation and first execution.  
> An example for this are device addresses. Check out the threads "KVM:  
> ARM: Rename KVM_SET_DEVICE_ADDRESS" and "KVM: ARM: Introduce  
> KVM_SET_DEVICE_ADDRESS ioctl" for more information on this particular  
> bit.

FWIW, on Freescale chips the MPIC base address can move at runtime if  
the guest writes to CCSRBAR (though QEMU doesn't currently implement  
this).

-Scott

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2013-01-10 11:42       ` Alexander Graf
  2013-01-10 19:06         ` Scott Wood
@ 2013-01-21  0:08         ` Paul Mackerras
  2013-01-21  0:22           ` Alexander Graf
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Mackerras @ 2013-01-21  0:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Benjamin Herrenschmidt, kvm@vger.kernel.org list, kvm-ppc,
	<kvmarm@lists.cs.columbia.edu>

On Thu, Jan 10, 2013 at 12:42:16PM +0100, Alexander Graf wrote:
> 
> On 10.01.2013, at 06:09, Paul Mackerras wrote:
> 
> > On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
[snip]
> >> - Move the existing "generic" KVM ioctl to create the kernel APIC to
> >> x86 since it's no as-is useful for other archs who, among other things,
> >> might need to pass argument based on the machine type (type of interrupt
> >> subsystem for example, etc...)
> > 
> > Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
> > handled entirely in arch code (arch/x86/kvm/x86.c and
> > arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
> > KVM_SET_IRQCHIP.
> 
> This part is about QEMU.

OK, that makes sense, but it presumably can't be done until the new
kernel interface exists, right?

> > 
> >> - Once that's done, well, instanciating interrupt controller objects
> >> becomes pretty much an arch specific matter. We could try to keep some
> >> ioctls somewhat common though it's not *that* useful because the various
> >> types & arguments are going to be fairly arch specific, so goes for
> >> configuration. Examples of what could be kept common are:
> >> 
> >>   * Create irq chip, takes at least a "type" argument, possibly a few
> >> more type-specific args (or a union, but then let's keep space in there
> >> since we can't change the size of the struct later as it would impact
> >> the ioctl number afaik).
> > 
> > The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
> > that it doesn't read or write memory, but there is still the 3rd
> > argument to the ioctl, which would give us 64 bits to indicate the
> > type of the top-level IRQ controller (XICS, MPIC, ...), but not much
> > more.
> > 
> > It sounds like the agreement at the meeting was more along the lines
> > of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
> > which can be called multiple times to instantiate pieces of the
> > interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
> > called once early on to say that there we are using in-kernel
> > interrupt controller emulation, followed by other calls to configure
> > the various parts of the framework.  Is that accurate?
> 
> KVM_CREATE_IRQCHIP should get a parameter indicating the type of IRQCHIP (XICS / MPIC / APIC / VGIC / ...). The parameter-less version defaults to an arch specific default (APIC for x86, VGIC for arm, error on PPC).

So, I propose KVM_CREATE_IRQCHIP_ARGS as the version of the ioctl that
indicates the type of irqchip.  The parameterless version already
gives an error on PPC.

> I'm not 100% sure yet whether we want to support models with multiple IRQCHIPs. If so, we need to return a device token from the KVM_CREATE_IRQCHIP ioctl. Maybe it makes more sense to model specific cases like this as separate type though with specific, explicit subdevice ids. Not sure.

I think it depends on what you mean by IRQCHIP.  If you mean the
overall interrupt controller architecture, then no it doesn't make
sense to have more than one.  The overall architecture might well
contain multiple identifiable pieces which could be considered
IRQCHIPs.  This is the case on x86 already, where they have two
i8259-style PICs and one or more IOAPICs (and presumably emulated
LAPICs as well).  It's the case for the PAPR-style XICS controller,
where we have a presentation controller (ICP) per CPU (like the x86
LAPIC), plus one or more source controllers (ICS) (like the x86
IOAPIC).

Which meaning of IRQCHIP do you prefer?

My series of patches uses KVM_CREATE_IRQCHIP_ARGS to indicate that the
overall architecture is XICS (by creating the ICPs) and to create the
source controllers (ICS), so it gets called multiple times.  It would
be possible to have KVM_CREATE_IRQCHIP_ARGS called just once with a
parameter saying "XICS", and then create the source controllers
implicitly as a side effect of the first KVM_IRQ_SET_SOURCES ioctl
that refers to it.

Would that be preferable?

> Other instantiation attributes we don't know that early on should be settable between the time frame of vm creation and first execution. An example for this are device addresses. Check out the threads "KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS" and "KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl" for more information on this particular bit.

That ioctl looks like it should be generic, and would be useful for
MPIC (OpenPIC) emulation.  I don't need it for PAPR since the
interrupt controllers are accessed through hcalls, not MMIO.

Paul.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation
  2013-01-21  0:08         ` Paul Mackerras
@ 2013-01-21  0:22           ` Alexander Graf
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Graf @ 2013-01-21  0:22 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Benjamin Herrenschmidt, kvm@vger.kernel.org list, kvm-ppc,
	<kvmarm@lists.cs.columbia.edu>, Scott Wood


On 21.01.2013, at 01:08, Paul Mackerras wrote:

> On Thu, Jan 10, 2013 at 12:42:16PM +0100, Alexander Graf wrote:
>> 
>> On 10.01.2013, at 06:09, Paul Mackerras wrote:
>> 
>>> On Sat, Dec 15, 2012 at 01:06:13PM +1100, Benjamin Herrenschmidt wrote:
> [snip]
>>>> - Move the existing "generic" KVM ioctl to create the kernel APIC to
>>>> x86 since it's no as-is useful for other archs who, among other things,
>>>> might need to pass argument based on the machine type (type of interrupt
>>>> subsystem for example, etc...)
>>> 
>>> Assuming you're talking about KVM_CREATE_IRQCHIP, it is already
>>> handled entirely in arch code (arch/x86/kvm/x86.c and
>>> arch/ia64/kvm/kvm-ia64.c), along with KVM_GET_IRQCHIP and
>>> KVM_SET_IRQCHIP.
>> 
>> This part is about QEMU.
> 
> OK, that makes sense, but it presumably can't be done until the new
> kernel interface exists, right?

Well, moving an x86 specific call into x86 specific code should be possible without kernel changes, no? :)

I'm not sure how useful it is without having a new interface though.

> 
>>> 
>>>> - Once that's done, well, instanciating interrupt controller objects
>>>> becomes pretty much an arch specific matter. We could try to keep some
>>>> ioctls somewhat common though it's not *that* useful because the various
>>>> types & arguments are going to be fairly arch specific, so goes for
>>>> configuration. Examples of what could be kept common are:
>>>> 
>>>>  * Create irq chip, takes at least a "type" argument, possibly a few
>>>> more type-specific args (or a union, but then let's keep space in there
>>>> since we can't change the size of the struct later as it would impact
>>>> the ioctl number afaik).
>>> 
>>> The existing KVM_CREATE_IRQCHIP is defined as _IO(...) which means
>>> that it doesn't read or write memory, but there is still the 3rd
>>> argument to the ioctl, which would give us 64 bits to indicate the
>>> type of the top-level IRQ controller (XICS, MPIC, ...), but not much
>>> more.
>>> 
>>> It sounds like the agreement at the meeting was more along the lines
>>> of the KVM_CREATE_IRQCHIP_ARGS ioctl (as in the patches I posted)
>>> which can be called multiple times to instantiate pieces of the
>>> interrupt framework, rather than having a KVM_CREATE_IRQCHIP that gets
>>> called once early on to say that there we are using in-kernel
>>> interrupt controller emulation, followed by other calls to configure
>>> the various parts of the framework.  Is that accurate?
>> 
>> KVM_CREATE_IRQCHIP should get a parameter indicating the type of IRQCHIP (XICS / MPIC / APIC / VGIC / ...). The parameter-less version defaults to an arch specific default (APIC for x86, VGIC for arm, error on PPC).
> 
> So, I propose KVM_CREATE_IRQCHIP_ARGS as the version of the ioctl that
> indicates the type of irqchip.  The parameterless version already
> gives an error on PPC.

Works for me. It needs to get the type of irqchip as an argument, maybe some more (like the MPIC version, though that _could_ be handled through a different type too).

> 
>> I'm not 100% sure yet whether we want to support models with multiple IRQCHIPs. If so, we need to return a device token from the KVM_CREATE_IRQCHIP ioctl. Maybe it makes more sense to model specific cases like this as separate type though with specific, explicit subdevice ids. Not sure.
> 
> I think it depends on what you mean by IRQCHIP.  If you mean the
> overall interrupt controller architecture, then no it doesn't make
> sense to have more than one.  The overall architecture might well
> contain multiple identifiable pieces which could be considered
> IRQCHIPs.  This is the case on x86 already, where they have two
> i8259-style PICs and one or more IOAPICs (and presumably emulated
> LAPICs as well).  It's the case for the PAPR-style XICS controller,
> where we have a presentation controller (ICP) per CPU (like the x86
> LAPIC), plus one or more source controllers (ICS) (like the x86
> IOAPIC).
> 
> Which meaning of IRQCHIP do you prefer?

Yeah, I'm always getting confused by all this too. We have two things we talk about here:

  * overall architecture
  * individual devices hanging off of that architecture

We need some link from "architecture" to "devices" because we need to be able to set MMIO offsets for the MPIC for example. On x86 and ARM today, this is implicit. The device IDs simply exist when we create a certain type of "system wide IRQCHIP".

I'm not sure if that's a good idea or not. At least it needs to be properly documented and the linkage between "system wide type is X" -> "devices 1,2,3 exist" has to be clear.

> 
> My series of patches uses KVM_CREATE_IRQCHIP_ARGS to indicate that the
> overall architecture is XICS (by creating the ICPs) and to create the
> source controllers (ICS), so it gets called multiple times.  It would
> be possible to have KVM_CREATE_IRQCHIP_ARGS called just once with a
> parameter saying "XICS", and then create the source controllers
> implicitly as a side effect of the first KVM_IRQ_SET_SOURCES ioctl
> that refers to it.
> 
> Would that be preferable?

I like the latter better, yes. In fact, instead of KVM_IRQ_SET_SOURCES you could use something like CREATE_DEVICE and create a source controller device, no? I wonder if that'd be any improvement over a SET_SOURCES ioctl. Hrm.

>> Other instantiation attributes we don't know that early on should be settable between the time frame of vm creation and first execution. An example for this are device addresses. Check out the threads "KVM: ARM: Rename KVM_SET_DEVICE_ADDRESS" and "KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl" for more information on this particular bit.
> 
> That ioctl looks like it should be generic, and would be useful for
> MPIC (OpenPIC) emulation.  I don't need it for PAPR since the
> interrupt controllers are accessed through hcalls, not MMIO.

Yes. And it will be. We need it for in-kernel MPIC emulation too.


Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-01-21  0:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-05  3:18 [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Paul Mackerras
2012-11-05  3:19 ` [RFC PATCH 1/9] KVM: PPC: Remove unused argument to kvmppc_core_dequeue_external Paul Mackerras
2012-11-05  3:20 ` [RFC PATCH 2/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls Paul Mackerras
2012-11-05  3:21 ` [RFC PATCH 3/9] KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller Paul Mackerras
2012-11-05  3:21 ` [RFC PATCH 4/9] KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM Paul Mackerras
2012-11-05  3:22 ` [RFC PATCH 5/9] KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation Paul Mackerras
2012-11-05  3:23 ` [RFC PATCH 6/9] KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts Paul Mackerras
2012-11-05  3:23 ` [RFC PATCH 7/9] KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls Paul Mackerras
2012-11-05  3:24 ` [RFC PATCH 8/9] KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state Paul Mackerras
2012-11-05  3:25 ` [RFC PATCH 9/9] KVM: PPC: Book 3S: Facilities to save/restore XICS source controller state Paul Mackerras
2012-12-15  0:46 ` [RFC PATCH 0/9] KVM: PPC: In-kernel PAPR interrupt controller emulation Alexander Graf
2012-12-15  2:06   ` Benjamin Herrenschmidt
2013-01-10  5:09     ` Paul Mackerras
2013-01-10 11:42       ` Alexander Graf
2013-01-10 19:06         ` Scott Wood
2013-01-21  0:08         ` Paul Mackerras
2013-01-21  0:22           ` Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).