linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v1 00/13] powerpc: kvm: Enable in-kernel acceleration for VFIO
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This series enables in-kernel acceleration of the TCE hypercalls
(H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE) in both real and
virtual modes.

It applies on top of both:
[PATCH v1 00/16] powernv: vfio: Add Dynamic DMA windows (DDW)
[PATCH v1 0/7] powerpc/iommu: kvm: Enable MultiTCE support


Alexey Kardashevskiy (13):
  KVM: PPC: Account TCE pages in locked_vm
  KVM: PPC: Rework kvmppc_spapr_tce_table to support variable page size
  KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
  KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
  KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_64 capability number
  KVM: PPC: Add @offset to kvmppc_spapr_tce_table
  KVM: PPC: Add support for 64bit TCE windows
  KVM: PPC: Add hugepage support for IOMMU in-kernel handling
  KVM: PPC: Add page_shift support for in-kernel H_PUT_TCE/etc handlers
  KVM: PPC: Fix kvmppc_gpa_to_hva_and_get() to return host physical
    address
  KVM: PPC: Associate IOMMU group with guest copy of TCE table
  KVM: PPC: vfio kvm device: support spapr tce
  KVM: PPC: Add support for IOMMU in-kernel handling

 Documentation/virtual/kvm/api.txt          |  51 ++++
 Documentation/virtual/kvm/devices/vfio.txt |  20 +-
 arch/powerpc/include/asm/kvm_host.h        |  41 ++-
 arch/powerpc/include/asm/kvm_ppc.h         |   9 +-
 arch/powerpc/include/uapi/asm/kvm.h        |   9 +
 arch/powerpc/kernel/iommu.c                |   6 +-
 arch/powerpc/kvm/Kconfig                   |   2 +
 arch/powerpc/kvm/Makefile                  |   3 +
 arch/powerpc/kvm/book3s_64_vio.c           | 389 +++++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_64_vio_hv.c        | 177 ++++++++++++-
 arch/powerpc/kvm/book3s_hv.c               |   3 +
 arch/powerpc/kvm/powerpc.c                 |  25 +-
 include/uapi/linux/kvm.h                   |  12 +
 virt/kvm/vfio.c                            |  69 +++++
 14 files changed, 775 insertions(+), 41 deletions(-)

-- 
2.0.0


* [PATCH v1 01/13] KVM: PPC: Account TCE pages in locked_vm
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This makes the number of pages used by TCE tables accountable in
the locked_vm counter of the process which created the table, and
checks it against RLIMIT_MEMLOCK so user space cannot pin all
available memory for TCE tables.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2137836..4ca33f1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -73,18 +73,48 @@ static long kvmppc_stt_npages(unsigned long window_size)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+/*
+ * Checks the RLIMIT_MEMLOCK ulimit so that user space cannot pin
+ * all available memory for TCE tables.
+ */
+static long kvmppc_account_memlimit(long npages)
+{
+	unsigned long ret = 0, locked, lock_limit;
+
+	if (!current->mm)
+		return -ESRCH; /* process exited */
+
+	down_write(&current->mm->mmap_sem);
+	locked = current->mm->locked_vm + npages;
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+		pr_warn("RLIMIT_MEMLOCK (%lu) exceeded\n",
+				rlimit(RLIMIT_MEMLOCK));
+		ret = -ENOMEM;
+	} else {
+		current->mm->locked_vm += npages;
+	}
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
 static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
 {
 	struct kvm *kvm = stt->kvm;
 	int i;
+	long npages = kvmppc_stt_npages(stt->window_size);
 
 	mutex_lock(&kvm->lock);
 	list_del(&stt->list);
-	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
+
 	kfree(stt);
 	mutex_unlock(&kvm->lock);
 
+	kvmppc_account_memlimit(-(npages + 1));
+
 	kvm_put_kvm(kvm);
 }
 
@@ -140,6 +170,9 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	}
 
 	npages = kvmppc_stt_npages(args->window_size);
+	ret = kvmppc_account_memlimit(npages + 1);
+	if (ret)
+		goto fail;
 
 	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
 		      GFP_KERNEL);
-- 
2.0.0
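
A side effect worth noting: with this change, KVM_CREATE_SPAPR_TCE can
fail with ENOMEM when the caller's RLIMIT_MEMLOCK is too low. A minimal
user space sketch (the helper below is hypothetical, not part of this
series) for sizing the limit before creating a table:

#include <sys/resource.h>

/* Raise RLIMIT_MEMLOCK to cover the pages a TCE table will pin;
 * "bytes" would be derived from the DMA window geometry. */
static int ensure_memlock(rlim_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl))
		return -1;
	if (rl.rlim_cur >= bytes)
		return 0;

	rl.rlim_cur = bytes;
	if (rl.rlim_max < bytes)
		rl.rlim_max = bytes;	/* raising the hard limit needs
					 * CAP_SYS_RESOURCE */
	return setrlimit(RLIMIT_MEMLOCK, &rl);
}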


* [PATCH v1 02/13] KVM: PPC: Rework kvmppc_spapr_tce_table to support variable page size
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

At the moment the kvmppc_spapr_tce_table struct can only describe
4GB windows, which is not enough for big DMA windows.

This replaces window_size (in bytes, 4GB max) with page_shift (32bit)
and size (64bit, in pages).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_host.h |  3 ++-
 arch/powerpc/kvm/book3s_64_vio.c    | 17 +++++++++--------
 arch/powerpc/kvm/book3s_64_vio_hv.c |  3 +--
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c37fee2..d3a154c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,7 +180,8 @@ struct kvmppc_spapr_tce_table {
 	struct list_head list;
 	struct kvm *kvm;
 	u64 liobn;
-	u32 window_size;
+	u32 page_shift;
+	u64 size;		/* in pages */
 	struct page *pages[0];
 };
 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 4ca33f1..f2c8e4d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -67,10 +67,9 @@ void kvmppc_spapr_tce_free(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_spapr_tce_free);
 
-static long kvmppc_stt_npages(unsigned long window_size)
+static long kvmppc_stt_npages(unsigned long size)
 {
-	return ALIGN((window_size >> IOMMU_PAGE_SHIFT_4K)
-		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
+	return ALIGN(size * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
 /*
@@ -103,7 +102,7 @@ static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
 {
 	struct kvm *kvm = stt->kvm;
 	int i;
-	long npages = kvmppc_stt_npages(stt->window_size);
+	long npages = kvmppc_stt_npages(stt->size);
 
 	mutex_lock(&kvm->lock);
 	list_del(&stt->list);
@@ -123,7 +122,7 @@ static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	struct kvmppc_spapr_tce_table *stt = vma->vm_file->private_data;
 	struct page *page;
 
-	if (vmf->pgoff >= kvmppc_stt_npages(stt->window_size))
+	if (vmf->pgoff >= kvmppc_stt_npages(stt->size))
 		return VM_FAULT_SIGBUS;
 
 	page = stt->pages[vmf->pgoff];
@@ -159,7 +158,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				   struct kvm_create_spapr_tce *args)
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
-	long npages;
+	long npages, size;
 	int ret = -ENOMEM;
 	int i;
 
@@ -169,7 +168,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 			return -EBUSY;
 	}
 
-	npages = kvmppc_stt_npages(args->window_size);
+	size = args->window_size >> IOMMU_PAGE_SHIFT_4K;
+	npages = kvmppc_stt_npages(size);
 	ret = kvmppc_account_memlimit(npages + 1);
 	if (ret)
 		goto fail;
@@ -180,7 +180,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 		goto fail;
 
 	stt->liobn = args->liobn;
-	stt->window_size = args->window_size;
+	stt->page_shift = IOMMU_PAGE_SHIFT_4K;
+	stt->size = size;
 	stt->kvm = kvm;
 
 	for (i = 0; i < npages; i++) {
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 79a39bb..fadfacb 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -73,9 +73,8 @@ long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 {
 	unsigned long mask = (1 << IOMMU_PAGE_SHIFT_4K) - 1;
 	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
-	unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
 
-	if ((ioba & mask) || (size + npages <= idx))
+	if ((ioba & mask) || (stt->size + npages <= idx))
 		return H_PARAMETER;
 
 	return H_SUCCESS;
-- 
2.0.0
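
To illustrate the two representations, a sketch with hypothetical
numbers: a legacy 1GB window of 4K IOMMU pages looks like this before
and after the rework.

#include <linux/types.h>

/* Sketch: the same 1GB window in the old and new representations. */
static void tce_window_repr_example(void)
{
	u32 window_size = 0x40000000;		/* old: bytes, 4GB max */

	u32 page_shift = 12;			/* new: 4K IOMMU pages */
	u64 size = window_size >> page_shift;	/* new: 0x40000 TCE entries */

	(void)size;	/* size << page_shift reproduces window_size */
}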


* [PATCH v1 03/13] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

It makes little sense to have KVM on book3s-64 without the IOMMU bits
for PCI pass through support: they cost little and allow VFIO to
function on book3s KVM.

Having IOMMU_API always enabled also makes it unnecessary to have
a lot of "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With
those ifdefs we could only accelerate user space emulated devices
(but not VFIO), which does not seem very useful.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index d7a16ac6..301fa6b 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -63,6 +63,7 @@ config KVM_BOOK3S_64
 	select KVM_BOOK3S_64_HANDLER
 	select KVM
 	select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
+	select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
-- 
2.0.0


* [PATCH v1 04/13] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This adds a capability number for in-kernel support for VFIO on
the SPAPR platform.

The capability tells user space whether the in-kernel handlers of
H_PUT_TCE can handle VFIO-targeted requests. If they cannot, user
space must not allocate a TCE table in the host kernel via the
KVM_CREATE_SPAPR_TCE KVM ioctl: with an in-kernel table present,
TCE requests would not be passed to user space, which is where they
need to be handled in that situation.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/uapi/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e11d8f1..3048c86 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -758,6 +758,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_VM_ATTRIBUTES 101
 #define KVM_CAP_ARM_PSCI_0_2 102
 #define KVM_CAP_PPC_FIXUP_HCALL 103
+#define KVM_CAP_SPAPR_TCE_VFIO 104
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.0.0
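
For illustration, user space would probe this capability before
deciding where TCE handling lives; a minimal sketch (the helper name
is hypothetical):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns non-zero if the in-kernel H_PUT_TCE handlers can serve
 * VFIO-targeted requests; otherwise the hcalls must exit to user
 * space and KVM_CREATE_SPAPR_TCE should not be used for VFIO. */
static int vfio_tce_in_kernel(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int ret;

	if (kvm < 0)
		return 0;
	ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_TCE_VFIO);
	close(kvm);
	return ret > 0;
}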


* [PATCH v1 05/13] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_64 capability number
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This adds a capability number for 64-bit TCE table support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/uapi/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3048c86..65c2689 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -759,6 +759,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PSCI_0_2 102
 #define KVM_CAP_PPC_FIXUP_HCALL 103
 #define KVM_CAP_SPAPR_TCE_VFIO 104
+#define KVM_CAP_SPAPR_TCE_64 105
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.0.0


* [PATCH v1 06/13] KVM: PPC: Add @offset to kvmppc_spapr_tce_table
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This enables guest-visible TCE tables to start at a non-zero offset
on the bus. This will be used for VFIO support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kvm/book3s_64_vio_hv.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d3a154c..ed96b09 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -181,6 +181,7 @@ struct kvmppc_spapr_tce_table {
 	struct kvm *kvm;
 	u64 liobn;
 	u32 page_shift;
+	u64 offset;		/* in pages */
 	u64 size;		/* in pages */
 	struct page *pages[0];
 };
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index fadfacb..a3a6597 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -74,7 +74,8 @@ long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 	unsigned long mask = (1 << IOMMU_PAGE_SHIFT_4K) - 1;
 	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
 
-	if ((ioba & mask) || (stt->size + npages <= idx))
+	if ((ioba & mask) || (idx < stt->offset) ||
+			(idx - stt->offset + npages > stt->size))
 		return H_PARAMETER;
 
 	return H_SUCCESS;
@@ -146,6 +147,7 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 	struct page *page;
 	u64 *tbl;
 
+	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = kvmppc_page_address(page);
 
@@ -351,6 +353,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		return ret;
 
 	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
 
-- 
2.0.0
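
A worked example of the bounds check with the offset in place (all
numbers hypothetical, 4K IOMMU pages):

/* Window of 0x100 TCEs starting at bus offset 0x100 (in pages),
 * i.e. valid indexes are [0x100, 0x200). */
static int ioba_validate_example(void)
{
	unsigned long page_shift = 12, offset = 0x100, size = 0x100;
	unsigned long ioba = 0x150000, npages = 1;
	unsigned long idx = ioba >> page_shift;	/* 0x150 */

	/* 0x150 lies inside [0x100, 0x200), so this returns 1 */
	return (idx >= offset) && (idx - offset + npages <= size);
}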


* [PATCH v1 07/13] KVM: PPC: Add support for 64bit TCE windows
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

The existing KVM_CREATE_SPAPR_TCE ioctl only supports 32bit windows,
which is not enough for directly mapped windows as the guest can have
more than 4GB of RAM.

This adds a KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
via the KVM_CAP_SPAPR_TCE_64 capability.

Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
@offset and @page_shift which are also required by DDW.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 Documentation/virtual/kvm/api.txt   | 51 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/kvm_ppc.h  |  2 +-
 arch/powerpc/include/uapi/asm/kvm.h |  9 +++++++
 arch/powerpc/kvm/book3s_64_vio.c    | 10 +++++---
 arch/powerpc/kvm/powerpc.c          | 25 +++++++++++++++++-
 include/uapi/linux/kvm.h            |  2 ++
 6 files changed, 94 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e1c72bf..b4695ea 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2520,6 +2520,57 @@ an implementation for these despite the in kernel acceleration.
 This capability is always enabled.
 
 
+4.88 KVM_CREATE_SPAPR_TCE_64
+
+Capability: KVM_CAP_SPAPR_TCE_64
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_64 (in)
+Returns: file descriptor for manipulating the created TCE table
+
+This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit
+windows.
+
+This creates a virtual TCE (translation control entry) table, which
+is an IOMMU for PAPR-style virtual I/O.  It is used to translate
+logical addresses used in virtual I/O into guest physical addresses,
+and provides a scatter/gather capability for PAPR virtual I/O.
+
+/* for KVM_CAP_SPAPR_TCE_64 */
+struct kvm_create_spapr_tce_64 {
+	__u64 liobn;
+	__u32 page_shift;
+	__u64 offset;	/* in pages */
+	__u64 size;	/* in pages */
+	__u32 flags;
+};
+
+
+The liobn field gives the logical IO bus number for which to create a
+TCE table. The size field specifies the size of the DMA window which
+this TCE table will translate, in IOMMU pages - the table will
+contain one 64 bit TCE entry for every IOMMU page. The offset field
+tells where this window starts on the IO bus, also in IOMMU pages.
+The page_shift field gives the size of the pages in this window (for
+example, 12, 16 and 24 for 4K, 64K and 16MB page sizes respectively).
+The flags field is not used at the moment but provides the room for
+extensions.
+
+
+
+When the guest issues an H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE hcall
+on a liobn for which a TCE table has been created using this ioctl(),
+the kernel will handle it in real or virtual mode, updating the TCE table.
+If liobn has not been registered with this ioctl, H_PUT_TCE/etc calls
+will cause a vm exit and must be handled by userspace.
+
+The return value is a file descriptor which can be passed to mmap(2)
+to map the created TCE table into userspace.  This lets userspace read
+the entries written by kernel-handled H_PUT_TCE calls, and also lets
+userspace update the TCE table directly which is useful in some
+circumstances.
+
+
 5. The kvm_run structure
 ------------------------
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index b84ed80..e0a68ef 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -128,7 +128,7 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 extern int kvmppc_spapr_tce_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_spapr_tce_free(struct kvm_vcpu *vcpu);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
-				struct kvm_create_spapr_tce *args);
+				struct kvm_create_spapr_tce_64 *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
 		struct kvm *kvm, unsigned long liobn);
 extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 2bc4a94..4452f6e 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -333,6 +333,15 @@ struct kvm_create_spapr_tce {
 	__u32 window_size;
 };
 
+/* for KVM_CAP_SPAPR_TCE_64 */
+struct kvm_create_spapr_tce_64 {
+	__u64 liobn;
+	__u32 page_shift;
+	__u64 offset;	/* in pages */
+	__u64 size;	/* in pages */
+	__u32 flags;
+};
+
 /* for KVM_ALLOCATE_RMA */
 struct kvm_allocate_rma {
 	__u64 rma_size;
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index f2c8e4d..2c6ab20 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -155,20 +155,23 @@ static const struct file_operations kvm_spapr_tce_fops = {
 };
 
 long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
-				   struct kvm_create_spapr_tce *args)
+				   struct kvm_create_spapr_tce_64 *args)
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
 	long npages, size;
 	int ret = -ENOMEM;
 	int i;
 
+	if (!args->size)
+		return -EINVAL;
+
 	/* Check this LIOBN hasn't been previously allocated */
 	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
 		if (stt->liobn == args->liobn)
 			return -EBUSY;
 	}
 
-	size = args->window_size >> IOMMU_PAGE_SHIFT_4K;
+	size = args->size;
 	npages = kvmppc_stt_npages(size);
 	ret = kvmppc_account_memlimit(npages + 1);
 	if (ret)
@@ -180,7 +183,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 		goto fail;
 
 	stt->liobn = args->liobn;
-	stt->page_shift = IOMMU_PAGE_SHIFT_4K;
+	stt->page_shift = args->page_shift;
+	stt->offset = args->offset;
 	stt->size = size;
 	stt->kvm = kvm;
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index e66793b..4d674ed 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -33,6 +33,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cputhreads.h>
 #include <asm/irqflags.h>
+#include <asm/iommu.h>
 #include "timing.h"
 #include "irq.h"
 #include "../mm/mmu_decl.h"
@@ -414,6 +415,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_SPAPR_TCE:
+	case KVM_CAP_SPAPR_TCE_64:
 	case KVM_CAP_PPC_ALLOC_HTAB:
 	case KVM_CAP_PPC_RTAS:
 	case KVM_CAP_PPC_FIXUP_HCALL:
@@ -1122,13 +1124,34 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 #ifdef CONFIG_PPC_BOOK3S_64
+	case KVM_CREATE_SPAPR_TCE_64: {
+		struct kvm_create_spapr_tce_64 create_tce_64;
+
+		r = -EFAULT;
+		if (copy_from_user(&create_tce_64, argp, sizeof(create_tce_64)))
+			goto out;
+		if (create_tce_64.flags) {
+			r = -EINVAL;
+			goto out;
+		}
+		r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce_64);
+		goto out;
+	}
 	case KVM_CREATE_SPAPR_TCE: {
 		struct kvm_create_spapr_tce create_tce;
+		struct kvm_create_spapr_tce_64 create_tce_64;
 
 		r = -EFAULT;
 		if (copy_from_user(&create_tce, argp, sizeof(create_tce)))
 			goto out;
-		r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce);
+
+		create_tce_64.liobn = create_tce.liobn;
+		create_tce_64.page_shift = IOMMU_PAGE_SHIFT_4K;
+		create_tce_64.offset = 0;
+		create_tce_64.size = create_tce.window_size >>
+				IOMMU_PAGE_SHIFT_4K;
+		create_tce_64.flags = 0;
+		r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce_64);
 		goto out;
 	}
 	case KVM_PPC_GET_SMMU_INFO: {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 65c2689..3beb542 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1033,6 +1033,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
 #define KVM_CREATE_SPAPR_TCE	  _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
+#define KVM_CREATE_SPAPR_TCE_64	  _IOW(KVMIO,  0xa8, \
+				       struct kvm_create_spapr_tce_64)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA	  _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
 /* Available with KVM_CAP_PPC_HTAB_FD */
-- 
2.0.0
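
As a usage sketch (the LIOBN and window geometry below are made up for
illustration):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Create a 2GB window of 64K IOMMU pages starting at bus address
 * 1 << 59, a typical spot for a second (DDW) window. */
static int create_tce64_window(int vmfd)
{
	struct kvm_create_spapr_tce_64 args = {
		.liobn = 0x80000001,		/* chosen by the VM manager */
		.page_shift = 16,		/* 64K IOMMU pages */
		.offset = (1ULL << 59) >> 16,	/* window start, in pages */
		.size = (2ULL << 30) >> 16,	/* window length, in pages */
		.flags = 0,			/* must be zero for now */
	};

	/* On success this returns an fd which can be mmap()ed to read
	 * the TCE table updated by the in-kernel H_PUT_TCE handlers. */
	return ioctl(vmfd, KVM_CREATE_SPAPR_TCE_64, &args);
}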


* [PATCH v1 08/13] KVM: PPC: Add hugepage support for IOMMU in-kernel handling
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This adds special support for huge pages (16MB) in real mode.
Reference counting cannot easily be done for such pages in real
mode (when the MMU is off), so this adds a hash table of huge pages.
It is populated in virtual mode, and get_page is called just once
per huge page. Real mode handlers check whether the requested page
is in the hash table; if it is, no reference counting is done,
otherwise the handler exits to virtual mode. The hash table is
released at KVM exit.

This defines kvmppc_spapr_iommu_hugepage hash table entry and adds it
to kvm_arch.

This adds kvmppc_iommu_hugepages_init() and
kvmppc_iommu_hugepages_cleanup() helpers. The latter puts cached pages.

This changes iommu_clear_tces_and_put_pages() not to put huge pages,
as that is done by kvmppc_iommu_hugepages_cleanup().

This implements a real mode kvmppc_rm_hugepage_gpa_to_hpa() helper to
find a hash entry and a virtual mode kvmppc_iommu_hugepage_try_add()
helper to add one.

At the moment the fastest card available for tests uses up to 9 huge
pages so walking through this hash table does not cost much.
However this can change and we may want to optimize this.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

---

Changes:
v11:
* moved hashtables from IOMMU to KVM

2013/07/12:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64

2013/06/27:
* list of huge pages replaced with a hashtable for better performance
* spinlock removed from real mode and only protects insertion of new
huge page descriptors into the hashtable

2013/06/05:
* fixed compile error when CONFIG_IOMMU_API=n

2013/05/20:
* the real mode handler now searches for a huge page by gpa (used to be pte)
* the virtual mode handler prints warning if it is called twice for the same
huge page as the real mode handler is expected to fail just once - when a huge
page is not in the list yet.
* the huge page is refcounted twice - when added to the hugepage list and
when used in the virtual mode hcall handler (can be optimized but it will
make the patch less nice).
---
 arch/powerpc/include/asm/kvm_host.h |  34 +++++++++++
 arch/powerpc/include/asm/kvm_ppc.h  |   2 +
 arch/powerpc/kernel/iommu.c         |   6 +-
 arch/powerpc/kvm/book3s_64_vio.c    | 116 +++++++++++++++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |  25 ++++++++
 arch/powerpc/kvm/book3s_hv.c        |   3 +
 6 files changed, 183 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ed96b09..8a3b465 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -31,6 +31,7 @@
 #include <linux/list.h>
 #include <linux/atomic.h>
 #include <linux/tracepoint.h>
+#include <linux/hashtable.h>
 #include <asm/kvm_asm.h>
 #include <asm/processor.h>
 #include <asm/page.h>
@@ -191,6 +192,36 @@ struct kvm_rma_info {
 	unsigned long base_pfn;
 };
 
+/*
+ * The KVM guest can be backed with 16MB pages.
+ * In this case we cannot do page counting in real mode,
+ * as compound pages are used - they are linked in a list
+ * with pointers as virtual addresses which are inaccessible
+ * in real mode.
+ *
+ * To address the issue, here is what we do:
+ *
+ * 1) add a hashtable per KVM, each entry is kvmppc_spapr_iommu_hugepage
+ * and describes gpa-to-hpa mapping;
+ * 2) in real mode, if gpa is in the hash table, use the cached hpa;
+ * otherwise pass the request to virtual mode;
+ * 3) in virtual mode, check if gpa is in the hash table and use cached
+ * hpa; otherwise translate gpa to hpa and reference the page.
+ *
+ * hpa of every used hugepage will be cached in the hash table
+ * and referenced just once. Pages are released at KVM exit.
+ */
+#define KVMPPC_SPAPR_HUGEPAGE_HASH(gpa)	hash_32(gpa >> 24, 32)
+#define KVMPPC_SPAPR_HUGEPAGE_BUCKETS   64
+
+struct kvmppc_spapr_iommu_hugepage {
+	struct hlist_node hash_node;
+	unsigned long gpa;	/* Guest physical address */
+	unsigned long hpa;	/* Host physical address */
+	struct page *page;	/* page struct of the very first subpage */
+	unsigned long size;	/* Huge page size (always 16MB at the moment) */
+};
+
 /* XICS components, defined in book3s_xics.c */
 struct kvmppc_xics;
 struct kvmppc_icp;
@@ -266,6 +297,9 @@ struct kvm_arch {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 	struct list_head rtas_tokens;
+	DECLARE_HASHTABLE(hugepages_hash_tab,
+			ilog2(KVMPPC_SPAPR_HUGEPAGE_BUCKETS));
+	spinlock_t hugepages_write_lock;
 #endif
 #ifdef CONFIG_KVM_MPIC
 	struct openpic *mpic;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e0a68ef..86f5015 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -127,6 +127,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_spapr_tce_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_spapr_tce_free(struct kvm_vcpu *vcpu);
+extern void kvmppc_iommu_hugepages_init(struct kvm_arch *ka);
+extern void kvmppc_iommu_hugepages_cleanup(struct kvm_arch *ka);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce_64 *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 259ddb5..bf45d5f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1018,7 +1018,8 @@ int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
 			if (!pg) {
 				ret = -EAGAIN;
 			} else if (PageCompound(pg)) {
-				ret = -EAGAIN;
+				/* Hugepages will be released at KVM exit */
+				ret = 0;
 			} else {
 				if (oldtce & TCE_PCI_WRITE)
 					SetPageDirty(pg);
@@ -1030,6 +1031,9 @@ int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
 
 			if (!pg) {
 				ret = -EAGAIN;
+			} else if (PageCompound(pg)) {
+				/* Hugepages will be released at KVM exit */
+				ret = 0;
 			} else {
 				if (oldtce & TCE_PCI_WRITE)
 					SetPageDirty(pg);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2c6ab20..2648d88 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -21,6 +21,7 @@
 #include <linux/string.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
+
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/slab.h>
@@ -67,6 +68,104 @@ void kvmppc_spapr_tce_free(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_spapr_tce_free);
 
+/*
+ * API to support huge pages in real mode
+ */
+void kvmppc_iommu_hugepages_init(struct kvm_arch *ka)
+{
+	spin_lock_init(&ka->hugepages_write_lock);
+	hash_init(ka->hugepages_hash_tab);
+}
+EXPORT_SYMBOL_GPL(kvmppc_iommu_hugepages_init);
+
+void kvmppc_iommu_hugepages_cleanup(struct kvm_arch *ka)
+{
+	int bkt;
+	struct kvmppc_spapr_iommu_hugepage *hp;
+	struct hlist_node *tmp;
+
+	spin_lock(&ka->hugepages_write_lock);
+	hash_for_each_safe(ka->hugepages_hash_tab, bkt, tmp, hp, hash_node) {
+		pr_debug("Release HP #%u gpa=%lx hpa=%lx size=%ld\n",
+				bkt, hp->gpa, hp->hpa, hp->size);
+		hlist_del_rcu(&hp->hash_node);
+
+		put_page(hp->page);
+		kfree(hp);
+	}
+	spin_unlock(&ka->hugepages_write_lock);
+}
+EXPORT_SYMBOL_GPL(kvmppc_iommu_hugepages_cleanup);
+
+/* Returns true if a page with GPA is already in the hash table */
+static bool kvmppc_iommu_hugepage_lookup_gpa(struct kvm_arch *ka,
+		unsigned long gpa)
+{
+	struct kvmppc_spapr_iommu_hugepage *hp;
+	const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+
+	hash_for_each_possible_rcu(ka->hugepages_hash_tab, hp,
+			hash_node, key) {
+		if ((hp->gpa <= gpa) && (gpa < hp->gpa + hp->size))
+			return true;
+	}
+
+	return false;
+}
+
+/* Returns true if a page with GPA has been added to the hash table */
+static bool kvmppc_iommu_hugepage_add(struct kvm_vcpu *vcpu,
+		unsigned long hva, unsigned long gpa)
+{
+	struct kvm_arch *ka = &vcpu->kvm->arch;
+	struct kvmppc_spapr_iommu_hugepage *hp;
+	const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+	pte_t *ptep;
+	unsigned int shift = 0;
+	static const int is_write = 1;
+
+	ptep = find_linux_pte_or_hugepte(vcpu->arch.pgdir, hva, &shift);
+	WARN_ON(!ptep);
+
+	if (!ptep || (shift <= PAGE_SHIFT))
+		return false;
+
+	hp = kzalloc(sizeof(*hp), GFP_KERNEL);
+	if (!hp)
+		return false;
+
+	hp->gpa = gpa & ~((1 << shift) - 1);
+	hp->hpa = (pte_pfn(*ptep) << PAGE_SHIFT);
+	hp->size = 1 << shift;
+
+	if (get_user_pages_fast(hva & ~(hp->size - 1), 1,
+			is_write, &hp->page) != 1) {
+		kfree(hp);
+		return false;
+	}
+	hash_add_rcu(ka->hugepages_hash_tab, &hp->hash_node, key);
+
+	return true;
+}
+
+/*
+ * Returns true if a page with GPA is in the hash table or
+ * has just been added.
+ */
+static bool kvmppc_iommu_hugepage_try_add(struct kvm_vcpu *vcpu,
+		unsigned long hva, unsigned long gpa)
+{
+	struct kvm_arch *ka = &vcpu->kvm->arch;
+	bool ret;
+
+	spin_lock(&ka->hugepages_write_lock);
+	ret = kvmppc_iommu_hugepage_lookup_gpa(ka, gpa) ||
+			kvmppc_iommu_hugepage_add(vcpu, hva, gpa);
+	spin_unlock(&ka->hugepages_write_lock);
+
+	return ret;
+}
+
 static long kvmppc_stt_npages(unsigned long size)
 {
 	return ALIGN(size * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
@@ -234,8 +333,21 @@ static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
 
 	hva = __gfn_to_hva_memslot(memslot, gfn) | (gpa & ~PAGE_MASK);
 
-	if (pg && (get_user_pages_fast(hva & PAGE_MASK, 1, is_write, pg) != 1))
-		return ERROR_ADDR;
+	if (pg) {
+		if (get_user_pages_fast(hva & PAGE_MASK, 1, is_write, pg) != 1)
+			return ERROR_ADDR;
+
+		/*
+		 * Check if this GPA is taken care of by the hash table.
+		 * If this is the case, do not show the caller page struct
+		 * address as huge pages will be released at KVM exit.
+		 */
+		if (PageCompound(*pg) && kvmppc_iommu_hugepage_try_add(
+				vcpu, hva, gpa)) {
+			put_page(*pg);
+			*pg = NULL;
+		}
+	}
 
 	return (void *) hva;
 }
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index a3a6597..6c0b95d 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -156,6 +156,23 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt,
 EXPORT_SYMBOL_GPL(kvmppc_tce_put);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+
+static unsigned long kvmppc_rm_hugepage_gpa_to_hpa(
+		struct kvm_arch *ka,
+		unsigned long gpa)
+{
+	struct kvmppc_spapr_iommu_hugepage *hp;
+	const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+
+	hash_for_each_possible_rcu_notrace(ka->hugepages_hash_tab, hp,
+			hash_node, key) {
+		if ((hp->gpa <= gpa) && (gpa < hp->gpa + hp->size))
+			return hp->hpa + (gpa & (hp->size - 1));
+	}
+
+	return ERROR_ADDR;
+}
+
 /*
  * Converts guest physical address to host physical address.
  *
@@ -175,6 +192,14 @@ static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
 	unsigned long gfn = gpa >> PAGE_SHIFT;
 	unsigned shift = 0;
 
+	/* Check if it is a hugepage */
+	hpa = kvmppc_rm_hugepage_gpa_to_hpa(&vcpu->kvm->arch, gpa);
+	if (hpa != ERROR_ADDR) {
+		*pg = NULL; /* Tell the caller not to put page */
+		return hpa;
+	}
+
+	/* System page size case */
 	memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
 	if (!memslot)
 		return ERROR_ADDR;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7f6d18a..708be66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1333,6 +1333,8 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	if (kvmppc_spapr_tce_init(vcpu))
 		goto free_vcpu;
 
+	kvmppc_iommu_hugepages_init(&vcpu->kvm->arch);
+
 	return vcpu;
 
 free_vcpu:
@@ -1356,6 +1358,7 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 	unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 	kvmppc_spapr_tce_free(vcpu);
+	kvmppc_iommu_hugepages_cleanup(&vcpu->kvm->arch);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
-- 
2.0.0
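
To illustrate the hashing scheme (addresses hypothetical): the GPA is
shifted right by 24 bits before hashing, so every address inside one
16MB huge page yields the same key, and a real mode lookup for any
offset within a cached page lands in the same bucket.

/* Two GPAs inside the same 16MB huge page share a hash key. */
unsigned long gpa1 = 0x41000000;	/* start of a 16MB page */
unsigned long gpa2 = 0x41abcde0;	/* an offset inside the same page */

/* gpa1 >> 24 == gpa2 >> 24 == 0x41, so both hash to one bucket;
 * the per-entry (gpa, size) range check then resolves the page. */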


* [PATCH v1 09/13] KVM: PPC: Add page_shift support for in-kernel H_PUT_TCE/etc handlers
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

The recently introduced KVM_CREATE_SPAPR_TCE_64 ioctl added page_shift.
This makes use of it in kvmppc_tce_put().

This changes kvmppc_tce_put() to take a TCE index rather than an IO
address.

This does not change the existing behaviour and will be utilized later
by Dynamic DMA windows which support 64K and 16MB page sizes.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c    |  8 ++++----
 arch/powerpc/kvm/book3s_64_vio_hv.c | 16 ++++++++--------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2648d88..8250521 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -371,7 +371,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
 	if (ret)
 		return ret;
 
-	kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce);
+	kvmppc_tce_put(stt, ioba >> stt->page_shift, tce);
 
 	return H_SUCCESS;
 }
@@ -436,7 +436,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	}
 
 	for (i = 0; i < npages; ++i)
-		kvmppc_tce_put(stt, (ioba >> IOMMU_PAGE_SHIFT_4K) + i,
+		kvmppc_tce_put(stt, (ioba >> stt->page_shift) + i,
 				vcpu->arch.tce_tmp_hpas[i]);
 
 unlock_exit:
@@ -465,8 +465,8 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
 		return H_PARAMETER;
 
-	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
-		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+	for (i = 0; i < npages; ++i, ioba += (1 << stt->page_shift))
+		kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
 
 	return H_SUCCESS;
 }
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 6c0b95d..99bac58 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -71,8 +71,8 @@ EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
 long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
 		unsigned long ioba, unsigned long npages)
 {
-	unsigned long mask = (1 << IOMMU_PAGE_SHIFT_4K) - 1;
-	unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	unsigned long mask = (1 << stt->page_shift) - 1;
+	unsigned long idx = ioba >> stt->page_shift;
 
 	if ((ioba & mask) || (idx < stt->offset) ||
 			(idx - stt->offset + npages > stt->size))
@@ -95,7 +95,7 @@ EXPORT_SYMBOL_GPL(kvmppc_ioba_validate);
  */
 long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce)
 {
-	unsigned long mask = ((1 << IOMMU_PAGE_SHIFT_4K) - 1) &
+	unsigned long mask = ((1 << stt->page_shift) - 1) &
 			~(TCE_PCI_WRITE | TCE_PCI_READ);
 
 	if (tce & mask)
@@ -271,7 +271,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
-	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	idx = ioba >> stt->page_shift;
 	kvmppc_tce_put(stt, idx, tce);
 
 	return H_SUCCESS;
@@ -323,7 +323,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	}
 
 	for (i = 0; i < npages; ++i)
-		kvmppc_tce_put(stt, (ioba >> IOMMU_PAGE_SHIFT_4K) + i,
+		kvmppc_tce_put(stt, (ioba >> stt->page_shift) + i,
 				vcpu->arch.tce_tmp_hpas[i]);
 
 put_page_exit:
@@ -354,8 +354,8 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
 		return H_PARAMETER;
 
-	for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE_4K)
-		kvmppc_tce_put(stt, ioba >> IOMMU_PAGE_SHIFT_4K, tce_value);
+	for (i = 0; i < npages; ++i, ioba += (1 << stt->page_shift))
+		kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
 
 	return H_SUCCESS;
 }
@@ -377,7 +377,7 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
-	idx = ioba >> IOMMU_PAGE_SHIFT_4K;
+	idx = ioba >> stt->page_shift;
 	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	tbl = (u64 *)page_address(page);
-- 
2.0.0
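
A short example of the new index arithmetic (values hypothetical):

/* The TCE index now depends on the window's page size rather than
 * being hardcoded to 4K. */
static unsigned long ioba_to_idx(unsigned long ioba, unsigned int page_shift)
{
	return ioba >> page_shift;	/* e.g. 0x30000 >> 12 = 0x30 for 4K,
					 * 0x30000 >> 16 = 0x3 for 64K */
}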


* [PATCH v1 10/13] KVM: PPC: Fix kvmppc_gpa_to_hva_and_get() to return host physical address
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

The existing support for emulated devices does not need to calculate
a host physical address as the translation is performed by user space.

The upcoming VFIO support needs it, as it stores host physical
addresses in the real hardware TCE table which the hardware uses
during DMA transfers. This translation can be done using the page
struct returned by kvmppc_gpa_to_hva_and_get().

However kvmppc_gpa_to_hva_and_get() does not return a valid page
struct for huge pages, to avoid possible bugs with excessive page
releases.

This extends kvmppc_gpa_to_hva_and_get() to also return the host
physical address.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8250521..573fd6d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -321,7 +321,7 @@ fail:
  * and returns ERROR_ADDR if failed.
  */
 static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
-		unsigned long gpa, struct page **pg)
+		unsigned long gpa, struct page **pg, unsigned long *phpa)
 {
 	unsigned long hva, gfn = gpa >> PAGE_SHIFT;
 	struct kvm_memory_slot *memslot;
@@ -337,6 +337,10 @@ static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
 		if (get_user_pages_fast(hva & PAGE_MASK, 1, is_write, pg) != 1)
 			return ERROR_ADDR;
 
+		if (phpa)
+			*phpa = __pa((unsigned long) page_address(*pg)) |
+				(hva & ~PAGE_MASK);
+
 		/*
 		 * Check if this GPA is taken care of by the hash table.
 		 * If this is the case, do not show the caller page struct
@@ -404,7 +408,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		return ret;
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
-	tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, NULL);
+	tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, NULL, NULL);
 	if (tces == ERROR_ADDR) {
 		ret = H_TOO_HARD;
 		goto unlock_exit;
-- 
2.0.0


* [PATCH v1 11/13] KVM: PPC: Associate IOMMU group with guest copy of TCE table
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

The existing in-kernel TCE table for emulated devices contains
guest physical addresses which are accessed by emulated devices.
Since we need to keep this information for VFIO devices too
in order to implement H_GET_TCE, we are reusing it.

This adds iommu_group* and iommu_table* pointers to kvmppc_spapr_tce_table.

This adds kvm_spapr_tce_attach_iommu_group() helper to initialize
the pointers.

This puts the group when guest copy of TCE table is destroyed which
happens when TCE table fd is closed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_host.h |  2 ++
 arch/powerpc/include/asm/kvm_ppc.h  |  5 +++++
 arch/powerpc/kvm/book3s_64_vio.c    | 28 ++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8a3b465..8d8eee9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -184,6 +184,8 @@ struct kvmppc_spapr_tce_table {
 	u32 page_shift;
 	u64 offset;		/* in pages */
 	u64 size;		/* in pages */
+	struct iommu_table *tbl;
+	struct iommu_group *refgrp;	/* reference counting only */
 	struct page *pages[0];
 };
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 86f5015..92be7f5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -129,6 +129,11 @@ extern int kvmppc_spapr_tce_init(struct kvm_vcpu *vcpu);
 extern void kvmppc_spapr_tce_free(struct kvm_vcpu *vcpu);
 extern void kvmppc_iommu_hugepages_init(struct kvm_arch *ka);
 extern void kvmppc_iommu_hugepages_cleanup(struct kvm_arch *ka);
+struct iommu_group;
+extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+				unsigned long liobn,
+				phys_addr_t start_addr,
+				struct iommu_group *grp);
 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				struct kvm_create_spapr_tce_64 *args);
 extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 573fd6d..b7de38e 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -28,6 +28,7 @@
 #include <linux/hugetlb.h>
 #include <linux/list.h>
 #include <linux/anon_inodes.h>
+#include <linux/iommu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -205,6 +206,10 @@ static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
 
 	mutex_lock(&kvm->lock);
 	list_del(&stt->list);
+
+	if (stt->refgrp)
+		iommu_group_put(stt->refgrp);
+
 	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 
@@ -253,6 +258,29 @@ static const struct file_operations kvm_spapr_tce_fops = {
 	.release	= kvm_spapr_tce_release,
 };
 
+long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm,
+				unsigned long liobn,
+				phys_addr_t start_addr,
+				struct iommu_group *grp)
+{
+	struct kvmppc_spapr_tce_table *stt = NULL;
+
+	/* Find the guest TCE table matching this LIOBN */
+	list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
+		if (stt->liobn == liobn) {
+			struct spapr_tce_iommu_group *data;
+
+			data = iommu_group_get_iommudata(grp);
+			BUG_ON(!data);
+			stt->tbl = data->ops->get_table(data, start_addr);
+			stt->refgrp = grp;
+			return 0;
+		}
+	}
+
+	return -ENODEV;
+}
+
 long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 				   struct kvm_create_spapr_tce_64 *args)
 {
-- 
2.0.0


* [PATCH v1 12/13] KVM: PPC: vfio kvm device: support spapr tce
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

In addition to the external VFIO user API, a VFIO KVM device
has been introduced recently.

sPAPR TCE IOMMU is para-virtualized: the guest does map/unmap via
hypercalls which take a logical bus id (LIOBN) as a target IOMMU
identifier. LIOBNs are made up, advertised to the guest system and
linked to IOMMU groups by user space.
In order to enable acceleration for IOMMU operations in KVM, we need
to tell KVM about the LIOBN-to-group mapping.

For that, a new KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN parameter
is added. It accepts a pair of a VFIO group fd and a LIOBN.

KVM uses kvm_vfio_find_group_by_liobn() once per KVM run and caches
the result in kvm_arch. iommu_group_put() for all groups is called
at KVM finish in the SPAPR TCE code (to be added in the KVM
enablement patch).

Before notifying KVM about a new link, this checks that the group is
registered with the KVM device so that the groups can be released at
an unexpected KVM finish.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v5:
* added lock in search function
* changed callback function type name

v4:
* fixed few bugs
* changed kvm_vfio_find_group_by_liobn() to return informative errors

v3:
* total rework
* added a release callback into kvm_vfio_find_group_by_liobn so now
the user of the API can get a notification if the group is about to
disappear
---
 Documentation/virtual/kvm/devices/vfio.txt | 20 ++++++++-
 arch/powerpc/kvm/Kconfig                   |  1 +
 arch/powerpc/kvm/Makefile                  |  3 ++
 include/uapi/linux/kvm.h                   |  8 ++++
 virt/kvm/vfio.c                            | 69 ++++++++++++++++++++++++++++++
 5 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..eaf0f5e 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -16,7 +16,23 @@ Groups:
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
+	kvm_device_attr.addr points to an int32_t file descriptor
+	for the VFIO group.
+
   KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
+	kvm_device_attr.addr points to an int32_t file descriptor
+	for the VFIO group.
 
-For each, kvm_device_attr.addr points to an int32_t file descriptor
-for the VFIO group.
+  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: sets a liobn for a VFIO group
+	kvm_device_attr.addr points to a struct:
+		struct kvm_vfio_spapr_tce_liobn {
+			__u32	argsz;
+			__s32	fd;
+			__u32	liobn;
+			__u64	start_addr;
+		};
+		where
+		@argsz is the size of kvm_vfio_spapr_tce_liobn;
+		@fd is a file descriptor for a VFIO group;
+		@liobn is a logical bus id to be associated with the group;
+		@start_addr is a DMA window offset on the IO (PCI) bus.
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 301fa6b..f708e61 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -64,6 +64,7 @@ config KVM_BOOK3S_64
 	select KVM
 	select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
 	select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
+	select KVM_VFIO if VFIO
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index ce569b6..d55c097 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -97,6 +97,9 @@ endif
 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
 	book3s_xics.o
 
+kvm-book3s_64-objs-$(CONFIG_KVM_VFIO) += \
+	$(KVM)/vfio.o
+
 kvm-book3s_64-module-objs += \
 	$(KVM)/kvm_main.o \
 	$(KVM)/eventfd.o \
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3beb542..c1ad9b7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,9 +949,17 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN	3
 #define KVM_DEV_TYPE_ARM_VGIC_V2	5
 #define KVM_DEV_TYPE_FLIC		6
 
+struct kvm_vfio_spapr_tce_liobn {
+	__u32	argsz;
+	__s32	fd;
+	__u32	liobn;
+	__u64	start_addr;
+};
+
 /*
  * ioctls for VM fds
  */
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index ba1a93f..43a224b 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -19,6 +19,10 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+#include <asm/kvm_ppc.h>
+#endif
+
 struct kvm_vfio_group {
 	struct list_head node;
 	struct vfio_group *vfio_group;
@@ -196,6 +200,68 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 		kvm_vfio_update_coherency(dev);
 
 		return ret;
+
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN: {
+		struct kvm_vfio_spapr_tce_liobn param;
+		unsigned long minsz;
+		struct kvm_vfio *kv = dev->private;
+		struct vfio_group *vfio_group;
+		struct kvm_vfio_group *kvg;
+		struct fd f;
+
+		minsz = offsetofend(struct kvm_vfio_spapr_tce_liobn,
+				start_addr);
+
+		if (copy_from_user(&param, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (param.argsz < minsz)
+			return -EINVAL;
+
+		f = fdget(param.fd);
+		if (!f.file)
+			return -EBADF;
+
+		vfio_group = kvm_vfio_group_get_external_user(f.file);
+		fdput(f);
+
+		if (IS_ERR(vfio_group))
+			return PTR_ERR(vfio_group);
+
+		ret = -ENOENT;
+
+		mutex_lock(&kv->lock);
+
+		list_for_each_entry(kvg, &kv->group_list, node) {
+			int group_id;
+			struct iommu_group *grp;
+
+			if (kvg->vfio_group != vfio_group)
+				continue;
+
+			group_id = vfio_external_user_iommu_id(kvg->vfio_group);
+			grp = iommu_group_get_by_id(group_id);
+			if (!grp) {
+				ret = -EFAULT;
+				break;
+			}
+
+			ret = kvm_spapr_tce_attach_iommu_group(dev->kvm,
+					param.liobn, param.start_addr, grp);
+			if (ret)
+				iommu_group_put(grp);
+
+			break;
+		}
+
+		mutex_unlock(&kv->lock);
+
+		kvm_vfio_group_put_external_user(vfio_group);
+
+		return ret;
+	}
+#endif /* CONFIG_SPAPR_TCE_IOMMU */
 	}
 
 	return -ENXIO;
@@ -220,6 +286,9 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		switch (attr->attr) {
 		case KVM_DEV_VFIO_GROUP_ADD:
 		case KVM_DEV_VFIO_GROUP_DEL:
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE_LIOBN:
+#endif
 			return 0;
 		}
 
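For background, the argsz/minsz check above follows the usual VFIO
convention for extensible structs: user space declares how much of the
struct it knows about, so new fields can be appended later without
breaking old binaries. A rough sketch with a hypothetical future @flags
field (nothing like it exists in this patch):

	/* hypothetical v2 of the struct, @flags appended at the end */
	struct kvm_vfio_spapr_tce_liobn_v2 {
		__u32	argsz;
		__s32	fd;
		__u32	liobn;
		__u64	start_addr;
		__u32	flags;		/* new, optional */
	};

	/* old callers pass argsz == offsetofend(..., start_addr) and
	 * keep working; @flags is consulted only when argsz covers it */
	if (param.argsz >= offsetofend(struct kvm_vfio_spapr_tce_liobn_v2,
			flags))
		use_flags(param.flags);	/* placeholder */
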
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v1 13/13] KVM: PPC: Add support for IOMMU in-kernel handling
  2014-07-15  9:25 [PATCH v1 00/13] powerpc: kvm: Enable in-kernel acceleration for VFIO Alexey Kardashevskiy
                   ` (11 preceding siblings ...)
  2014-07-15  9:25 ` [PATCH v1 12/13] KVM: PPC: vfio kvm device: support spapr tce Alexey Kardashevskiy
@ 2014-07-15  9:25 ` Alexey Kardashevskiy
  12 siblings, 0 replies; 15+ messages in thread
From: Alexey Kardashevskiy @ 2014-07-15  9:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras, Gavin Shan

This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
them to user space, which saves the time spent switching to user space
and back.

Both real and virtual modes are supported. The kernel first tries to
handle a TCE request in real mode; if that fails, it passes the request
to the virtual mode handler to complete the operation. If the virtual
mode handler fails as well, the request is passed to user space.
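
The resulting dispatch is roughly the following (a simplified sketch:
the two handler names are real, exit_to_userspace() merely stands in
for the normal hcall exit path to QEMU):

	/* try real mode first, it is the cheapest */
	ret = kvmppc_rm_h_put_tce(vcpu, liobn, ioba, tce);
	if (ret == H_TOO_HARD)
		/* could not finish in real mode, retry in virtual mode */
		ret = kvmppc_h_put_tce(vcpu, liobn, ioba, tce);
	if (ret == H_TOO_HARD)
		/* no handler could finish the request in the kernel */
		exit_to_userspace(vcpu);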

The first user of this is VFIO on POWER. Trampolines to the VFIO external
user API functions are required for this patch.

This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
of map/unmap requests. The device supports a single attribute, which is
a struct containing the LIOBN and the IOMMU group fd. When the attribute
is set, the device establishes the connection between KVM and VFIO.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

---

Changes:
v12:
* reworked for the latest VFIO KVM device

v11:
* removed VFIO_IOMMU capability
* fixed comments from Gleb
* added @type to kvmppc_spapr_tce_table struct and split it into 2 parts
(emulated, iommu)

v10:
* all IOMMU TCE links are handled by one KVM device now
* KVM device has its own list of TCE descriptors
* the search-by-liobn function was extended to search through
emulated and IOMMU lists

v9:
* KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
KVM device
* release_spapr_tce_table() is not shared between different TCE types
* reduced the patch size by moving KVM device bits and VFIO external API
trampolines to separate patches
* moved documentation from Documentation/virtual/kvm/api.txt to
Documentation/virtual/kvm/devices/spapr_tce_iommu.txt

v8:
* fixed warnings from checkpatch.pl

2013/07/11:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64
* kvmppc_gpa_to_hva_and_get also returns the host physical address. Not much
use for it here, but the next patch adding hugepage support will use it more.

2013/07/06:
* added realmode arch_spin_lock to protect TCE table from races
in real and virtual modes
* POWERPC IOMMU API is changed to support real mode
* iommu_take_ownership and iommu_release_ownership are protected by
iommu_table's locks
* reworked the use of the VFIO external user API
* multiple small fixes

2013/06/27:
* the tce_list page is now referenced in order to protect it from accidental
invalidation during H_PUT_TCE_INDIRECT execution
* added use of the external user VFIO API

2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number

2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
translated TCEs there, tries realmode_get_page() on them, and if that fails,
passes control to the virtual mode handler which tries to finish
handling the request
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when user space
did not register the TCE table in the kernel; in all other cases the virtual
mode handler is expected to do the job

Conflicts:
	arch/powerpc/include/asm/kvm_host.h
	arch/powerpc/kvm/book3s_64_vio.c
---
 arch/powerpc/include/asm/kvm_host.h |   1 +
 arch/powerpc/kvm/book3s_64_vio.c    | 177 ++++++++++++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_64_vio_hv.c | 130 ++++++++++++++++++++++++++
 3 files changed, 298 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8d8eee9..6056114 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -726,6 +726,7 @@ struct kvm_vcpu_arch {
 		 */
 	} tce_rm_fail;			/* failed stage of request processing */
 	struct page *tce_rm_list_pg;	/* unreferenced page from realmode */
+	unsigned long tce_tmp_num;	/* number of valid entries */
 #endif
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) || \
 	defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index b7de38e..90e7ad1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -21,7 +21,6 @@
 #include <linux/string.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
-
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/slab.h>
@@ -29,6 +28,8 @@
 #include <linux/list.h>
 #include <linux/anon_inodes.h>
 #include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/file.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -347,6 +348,8 @@ fail:
  *
  * If pg!=NULL, tries to increase page counter via get_user_pages_fast()
  * and returns ERROR_ADDR if failed.
+ *
+ * If pg != NULL && phpa != NULL, returns the host physical address in *phpa.
  */
 static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
 		unsigned long gpa, struct page **pg, unsigned long *phpa)
@@ -384,6 +387,128 @@ static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
 	return (void *) hva;
 }
 
+long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce)
+{
+	struct page *pg = NULL;
+	unsigned long hpa;
+	void __user *hva;
+	long idx, ret = H_HARDWARE;
+
+	/* Clear TCE */
+	if (!(tce & (TCE_PCI_READ | TCE_PCI_WRITE))) {
+		if (iommu_tce_clear_param_check(tbl, ioba, 0, 1))
+			return H_PARAMETER;
+
+		if (iommu_clear_tces_and_put_pages(tbl,
+				ioba >> tbl->it_page_shift,
+				1, false))
+			return H_HARDWARE;
+
+		return H_SUCCESS;
+	}
+
+	/* Put TCE */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/*
+	 * Real mode referenced the page but hpte changed
+	 * during this operation
+	 */
+	if (vcpu->arch.tce_rm_fail == TCERM_GETPAGE) {
+		put_page(pfn_to_page(vcpu->arch.tce_tmp_hpas[0] >> PAGE_SHIFT));
+		/* And try again */
+	}
+	vcpu->arch.tce_rm_fail = TCERM_NONE;
+#endif
+
+	if (iommu_tce_put_param_check(tbl, ioba, tce))
+		return H_PARAMETER;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	hva = kvmppc_gpa_to_hva_and_get(vcpu, tce, &pg, &hpa);
+	if (hva == ERROR_ADDR)
+		goto unlock_exit;
+
+	if (iommu_tce_build(tbl, ioba >> tbl->it_page_shift, &hpa, 1, false)) {
+		if (pg && !PageCompound(pg))
+			put_page(pg);
+		goto unlock_exit;
+	}
+	ret = H_SUCCESS;
+
+unlock_exit:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+
+	return ret;
+}
+
+static long kvmppc_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl, unsigned long ioba,
+		unsigned long __user *tces, unsigned long npages)
+{
+	long i = 0;
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	if (vcpu->arch.tce_rm_fail == TCERM_GETPAGE) {
+		unsigned long tmp;
+
+		if (get_user(tmp, tces + vcpu->arch.tce_tmp_num))
+			return H_HARDWARE;
+		put_page(pfn_to_page(tmp >> PAGE_SHIFT));
+	}
+	i = vcpu->arch.tce_tmp_num;
+#endif
+	for ( ; i < npages; ++i) {
+		struct page *pg = NULL;
+		unsigned long gpa;
+		void __user *hva;
+
+		if (get_user(gpa, tces + i))
+			return H_HARDWARE;
+
+		if (iommu_tce_put_param_check(tbl, ioba +
+					(i << tbl->it_page_shift), gpa))
+			return H_PARAMETER;
+
+		hva = kvmppc_gpa_to_hva_and_get(vcpu, gpa, &pg,
+				&vcpu->arch.tce_tmp_hpas[i]);
+		if (hva == ERROR_ADDR)
+			goto putpages_flush_exit;
+	}
+
+	if (!iommu_tce_build(tbl, ioba >> tbl->it_page_shift,
+			vcpu->arch.tce_tmp_hpas, npages, false))
+		return H_SUCCESS;
+
+putpages_flush_exit:
+	for (--i; i >= 0; --i) {
+		struct page *pg;
+
+		pg = pfn_to_page(vcpu->arch.tce_tmp_hpas[i] >> PAGE_SHIFT);
+		if (pg && !PageCompound(pg))
+			put_page(pg);
+	}
+
+	return H_HARDWARE;
+}
+
+long kvmppc_h_stuff_tce_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	if (iommu_tce_clear_param_check(tbl, ioba, tce_value, npages))
+		return H_PARAMETER;
+
+	if (iommu_clear_tces_and_put_pages(tbl, ioba >> tbl->it_page_shift,
+				npages, false))
+		return H_HARDWARE;
+
+	return H_SUCCESS;
+}
+
 long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
 		unsigned long tce)
@@ -403,6 +528,13 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
 	if (ret)
 		return ret;
 
+	if (stt->tbl) {
+		ret = kvmppc_h_put_tce_iommu(vcpu, stt->tbl, liobn, ioba, tce);
+		if (ret)
+			return ret;
+	}
+
+	/* Update guest version of TCE table */
 	kvmppc_tce_put(stt, ioba >> stt->page_shift, tce);
 
 	return H_SUCCESS;
@@ -455,22 +587,39 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 	if (vcpu->arch.tce_rm_fail == TCERM_PUTLISTPAGE)
 		goto unlock_exit;
 #endif
+	/* Validate TCEs, do not touch tce_tmp_hpas */
+	for (i = vcpu->arch.tce_tmp_num; i < npages; ++i) {
+		unsigned long tce;
 
+		if (get_user(tce, tces + i)) {
+			ret = H_PARAMETER;
+			goto unlock_exit;
+		}
+
+		ret = kvmppc_tce_validate(stt, tce);
+		if (ret)
+			goto unlock_exit;
+	}
+
+	/* Update TCE table if it is VFIO */
+	if (stt->tbl) {
+		ret = kvmppc_h_put_tce_indirect_iommu(vcpu,
+				stt->tbl, ioba, tces, npages);
+		if (ret)
+			goto unlock_exit;
+	}
+
+	/* Update guest version of TCE table */
 	for (i = 0; i < npages; ++i) {
-		if (get_user(vcpu->arch.tce_tmp_hpas[i], tces + i)) {
+		unsigned long tce;
+
+		if (get_user(tce, tces + i)) {
 			ret = H_PARAMETER;
 			goto unlock_exit;
 		}
-
-		ret = kvmppc_tce_validate(stt, vcpu->arch.tce_tmp_hpas[i]);
-		if (ret)
-			goto unlock_exit;
+		kvmppc_tce_put(stt, (ioba >> stt->page_shift) + i, tce);
 	}
 
-	for (i = 0; i < npages; ++i)
-		kvmppc_tce_put(stt, (ioba >> stt->page_shift) + i,
-				vcpu->arch.tce_tmp_hpas[i]);
-
 unlock_exit:
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
@@ -497,6 +646,14 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
 		return H_PARAMETER;
 
+	if (stt->tbl) {
+		ret = kvmppc_h_stuff_tce_iommu(vcpu, stt->tbl, liobn, ioba,
+				tce_value, npages);
+		if (ret)
+			return ret;
+	}
+
+	/* Update guest version of TCE table */
 	for (i = 0; i < npages; ++i, ioba += (1 << stt->page_shift))
 		kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 99bac58..47b76a7 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/hugetlb.h>
 #include <linux/list.h>
+#include <linux/iommu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -247,6 +248,107 @@ static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
 	return hpa;
 }
 
+static long kvmppc_rm_h_put_tce_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl, unsigned long liobn,
+		unsigned long ioba, unsigned long tce)
+{
+	int ret = 0;
+	unsigned long hpa;
+	struct page *pg = NULL;
+
+	/* Clear TCE */
+	if (!(tce & (TCE_PCI_READ | TCE_PCI_WRITE))) {
+		if (iommu_tce_clear_param_check(tbl, ioba, 0, 1))
+			return H_PARAMETER;
+
+		if (iommu_clear_tces_and_put_pages(tbl,
+				ioba >> tbl->it_page_shift, 1, true))
+			return H_TOO_HARD;
+
+		return H_SUCCESS;
+	}
+
+	/* Put TCE */
+	if (iommu_tce_put_param_check(tbl, ioba, tce))
+		return H_PARAMETER;
+
+	hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce, &pg);
+
+	if (hpa == ERROR_ADDR) {
+		vcpu->arch.tce_tmp_hpas[0] = hpa;
+		vcpu->arch.tce_rm_fail = pg ? TCERM_GETPAGE : TCERM_NONE;
+		return H_TOO_HARD;
+	}
+
+	ret = iommu_tce_build(tbl, ioba >> tbl->it_page_shift,
+			      &hpa, 1, true);
+
+	if (ret) {
+		vcpu->arch.tce_tmp_hpas[0] = hpa;
+		vcpu->arch.tce_rm_fail = pg ? TCERM_GETPAGE : TCERM_NONE;
+		return H_TOO_HARD;
+	}
+
+	return H_SUCCESS;
+}
+
+static long kvmppc_rm_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl, unsigned long ioba,
+		unsigned long *tces, unsigned long npages)
+{
+	int i, ret;
+	unsigned long hpa;
+
+	/* Check all TCEs */
+	for (i = 0; i < npages; ++i) {
+		if (iommu_tce_put_param_check(tbl, ioba +
+				(i << tbl->it_page_shift), tces[i]))
+			return H_PARAMETER;
+	}
+
+	/* Translate TCEs and go get_page() */
+	for (i = 0; i < npages; ++i) {
+		struct page *pg = NULL;
+
+		hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tces[i], &pg);
+		if (hpa == ERROR_ADDR) {
+			vcpu->arch.tce_tmp_hpas[i] = 0xBAADF00D; /* poison */
+			vcpu->arch.tce_tmp_num = i;
+			vcpu->arch.tce_rm_fail = pg ?
+					TCERM_GETPAGE : TCERM_NONE;
+			return H_TOO_HARD;
+		}
+		vcpu->arch.tce_tmp_hpas[i] = hpa;
+	}
+
+	/* Put TCEs to the table */
+	ret = iommu_tce_build(tbl, (ioba >> tbl->it_page_shift),
+			vcpu->arch.tce_tmp_hpas, npages, true);
+	if (ret == -EAGAIN) {
+		vcpu->arch.tce_rm_fail = TCERM_PUTTCE;
+		return H_TOO_HARD;
+	} else if (ret) {
+		return H_HARDWARE;
+	}
+
+	return H_SUCCESS;
+}
+
+static long kvmppc_rm_h_stuff_tce_iommu(struct kvm_vcpu *vcpu,
+		struct iommu_table *tbl,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages)
+{
+	if (iommu_tce_clear_param_check(tbl, ioba, tce_value, npages))
+		return H_PARAMETER;
+
+	if (iommu_clear_tces_and_put_pages(tbl, ioba >> tbl->it_page_shift,
+				npages, true))
+		return H_TOO_HARD;
+
+	return H_SUCCESS;
+}
+
 long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		unsigned long ioba, unsigned long tce)
 {
@@ -262,6 +364,7 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 		return H_TOO_HARD;
 
 	vcpu->arch.tce_rm_fail = TCERM_NONE;
+	vcpu->arch.tce_tmp_num = 0;
 
 	ret = kvmppc_ioba_validate(stt, ioba, 1);
 	if (ret)
@@ -271,6 +374,14 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 	if (ret)
 		return ret;
 
+	if (stt->tbl) {
+		ret = kvmppc_rm_h_put_tce_iommu(vcpu, stt->tbl, liobn,
+				ioba, tce);
+		if (ret)
+			return ret;
+	}
+
+	/* Update guest version of TCE table */
 	idx = ioba >> stt->page_shift;
 	kvmppc_tce_put(stt, idx, tce);
 
@@ -306,6 +417,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.tce_rm_fail = TCERM_NONE;
 	vcpu->arch.tce_rm_list_pg = NULL;
+	vcpu->arch.tce_tmp_num = 0;
 	tces = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce_list, &pg);
 	if (tces == ERROR_ADDR) {
 		vcpu->arch.tce_rm_fail = pg ? TCERM_NONE : TCERM_GETLISTPAGE;
@@ -322,6 +434,16 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		vcpu->arch.tce_tmp_hpas[i] = tce;
 	}
 
+	if (stt->tbl) {
+		ret = kvmppc_rm_h_put_tce_indirect_iommu(vcpu,
+				stt->tbl, ioba, (unsigned long *)tces, npages);
+		if (ret == H_TOO_HARD)
+			return ret;
+		if (ret)
+			goto put_page_exit;
+	}
+
+	/* Update guest version of TCE table */
 	for (i = 0; i < npages; ++i)
 		kvmppc_tce_put(stt, (ioba >> stt->page_shift) + i,
 				vcpu->arch.tce_tmp_hpas[i]);
@@ -354,6 +476,14 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
 	if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
 		return H_PARAMETER;
 
+	if (stt->tbl) {
+		ret = kvmppc_rm_h_stuff_tce_iommu(vcpu, stt->tbl, liobn, ioba,
+				tce_value, npages);
+		if (ret)
+			return ret;
+	}
+
+	/* Update guest version of TCE table */
 	for (i = 0; i < npages; ++i, ioba += (1 << stt->page_shift))
 		kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value);
 
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 01/13] KVM: PPC: Account TCE pages in locked_vm
  2014-07-15  9:25 ` [PATCH v1 01/13] KVM: PPC: Account TCE pages in locked_vm Alexey Kardashevskiy
@ 2014-07-15  9:29   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 15+ messages in thread
From: Alexey Kardashevskiy @ 2014-07-15  9:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Paul Mackerras, Gavin Shan

On 07/15/2014 07:25 PM, Alexey Kardashevskiy wrote:
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>


Just realized this should go to "powernv: vfio: Add Dynamic DMA windows (DDW)".

And neither patchset accounts for DDW in locked_vm; need to decide how...
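
A possible shape, just a sketch reusing the helper from the quoted
patch: when a DDW window of window_size bytes is created, account it
with something like

	ret = kvmppc_account_memlimit(kvmppc_stt_npages(window_size) + 1);
	if (ret)
		return ret;

and undo it with a negative argument when the window is destroyed.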


> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 2137836..4ca33f1 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -73,18 +73,48 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +/*
> + * Checks ulimit in order not to let user space pin all
> + * available memory for TCE tables.
> + */
> +static long kvmppc_account_memlimit(long npages)
> +{
> +	unsigned long ret = 0, locked, lock_limit;
> +
> +	if (!current->mm)
> +		return -ESRCH; /* process exited */
> +
> +	down_write(&current->mm->mmap_sem);
> +	locked = current->mm->locked_vm + npages;
> +	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
> +		pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
> +				rlimit(RLIMIT_MEMLOCK));
> +		ret = -ENOMEM;
> +	} else {
> +		current->mm->locked_vm += npages;
> +	}
> +	up_write(&current->mm->mmap_sem);
> +
> +	return ret;
> +}
> +
>  static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
>  {
>  	struct kvm *kvm = stt->kvm;
>  	int i;
> +	long npages = kvmppc_stt_npages(stt->window_size);
>  
>  	mutex_lock(&kvm->lock);
>  	list_del(&stt->list);
> -	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
> +	for (i = 0; i < npages; i++)
>  		__free_page(stt->pages[i]);
> +
>  	kfree(stt);
>  	mutex_unlock(&kvm->lock);
>  
> +	kvmppc_account_memlimit(-(npages + 1));
> +
>  	kvm_put_kvm(kvm);
>  }
>  
> @@ -140,6 +170,9 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	}
>  
>  	npages = kvmppc_stt_npages(args->window_size);
> +	ret = kvmppc_account_memlimit(npages + 1);
> +	if (ret)
> +		goto fail;
>  
>  	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
>  		      GFP_KERNEL);
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
-- links below jump to the message on this page --
2014-07-15  9:25 [PATCH v1 00/13] powerpc: kvm: Enable in-kernel acceleration for VFIO Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 01/13] KVM: PPC: Account TCE pages in locked_vm Alexey Kardashevskiy
2014-07-15  9:29   ` Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 02/13] KVM: PPC: Rework kvmppc_spapr_tce_table to support variable page size Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 03/13] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 04/13] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 05/13] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_64 " Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 06/13] KVM: PPC: Add @offset to kvmppc_spapr_tce_table Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 07/13] KVM: PPC: Add support for 64bit TCE windows Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 08/13] KVM: PPC: Add hugepage support for IOMMU in-kernel handling Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 09/13] KVM: PPC: Add page_shift support for in-kernel H_PUT_TCE/etc handlers Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 10/13] KVM: PPC: Fix kvmppc_gpa_to_hva_and_get() to return host physical address Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 11/13] KVM: PPC: Associate IOMMU group with guest copy of TCE table Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 12/13] KVM: PPC: vfio kvm device: support spapr tce Alexey Kardashevskiy
2014-07-15  9:25 ` [PATCH v1 13/13] KVM: PPC: Add support for IOMMU in-kernel handling Alexey Kardashevskiy
