public inbox for kvm@vger.kernel.org
* RFC: Paravirtualized DMA accesses for KVM
@ 2007-11-07 14:21 Amit Shah
       [not found] ` <d1c72ce6e3a0e73c18993c3f066d1350b147f726.1194445109.git.amit.shah@qumranet.com>
       [not found] ` <1194445269752-git-send-email-amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 2 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel


This patchset is work in progress and is sent out for comments.

Guests within KVM can have paravirtualized DMA access. I've tested
the e1000 driver, and that works fine. A few problems/conditions to
get things to work:

- The pv driver should only be used as a module. If built into the
  kernel, it freezes during HD bring-up
- Locks aren't taken on the host; multiple guests with passthrough
  won't work
- Only 64 bit host and 64 bit guests are supported

And there are several FIXMEs mentioned in the code, but none
as grave as the ones already mentioned above.

The bulk of the passthrough work is done in userspace (qemu). Patches
will be sent shortly to the kvm-devel and qemu lists.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings
       [not found]   ` <d1c72ce6e3a0e73c18993c3f066d1350b147f726.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21     ` Amit Shah
       [not found]     ` <6d486436cf50e269d8914229d10ff60f3d646795.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

Introduce three hypercalls and one ioctl for enabling guest
DMA mappings.

An ioctl comes from userspace (qemu) to notify of a physical
device being assigned to a guest. Guests make a hypercall (once
per device) to find out if the device is a passthrough device
and if any DMA translations are necessary.
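As a sketch of the userspace side, this is roughly how qemu might fill
the ioctl argument defined later in this patch. The structures are
copied from the kvm_para.h hunk; make_pt_dev() is purely illustrative,
and the ioctl call itself is only shown in a comment since it needs a
live VM fd:

```c
/* Structures copied from the kvm_para.h hunk in this patch. */
struct kvm_pv_pci_info {
	unsigned char busnr;
	unsigned int devfn;
};

struct kvm_pv_passthrough_dev {
	struct kvm_pv_pci_info guest;
	struct kvm_pv_pci_info mach;
};

/* Same encoding as the kernel's PCI_DEVFN macro. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Illustrative helper: pair a guest bus:devfn with the host (machine)
 * bus:devfn of the physical device assigned to it. */
static struct kvm_pv_passthrough_dev
make_pt_dev(unsigned char guest_bus, unsigned int guest_devfn,
	    unsigned char host_bus, unsigned int host_devfn)
{
	struct kvm_pv_passthrough_dev dev;

	dev.guest.busnr = guest_bus;
	dev.guest.devfn = guest_devfn;
	dev.mach.busnr  = host_bus;
	dev.mach.devfn  = host_devfn;
	/* then: ioctl(vm_fd, KVM_ASSIGN_PV_PCI_DEV, &dev); */
	return dev;
}
```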

Two other hypercalls map and unmap DMA regions for the guest. For a
single-page request, we look up the host page address and return it.

For a multi-page request, we do a dma_map_sg.

Since guests are pageable, we pin all the pages under the DMA
operation on the map request and unpin them on the unmap
operation.
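To make the guest/host handshake concrete, here is a minimal userspace
simulation of the map flow described above. KVM_PV_DMA_MAP and the
shared-page convention come from the patch; kvm_hypercall2() here is a
stub standing in for the real vmcall, faking a host translation by
setting a high address bit, and everything else is illustrative:

```c
#include <stdint.h>
#include <stddef.h>

#define KVM_PV_DMA_MAP 1
#define MAX_PVDMA_PAGES 512

static uint64_t shared_page[MAX_PVDMA_PAGES];

/* Stub for the real vmcall-based hypercall: the "host" checks the page
 * count and rewrites each gpa slot with a fake dma translation. */
static long kvm_hypercall2(unsigned int nr, unsigned long npages,
			   unsigned long page_gfn)
{
	size_t i;

	(void)page_gfn;	/* a real host would translate and kmap this gfn */
	if (nr != KVM_PV_DMA_MAP || npages == 0 || npages > MAX_PVDMA_PAGES)
		return 0;
	for (i = 0; i < npages; i++)
		shared_page[i] |= 1ULL << 40;	/* fake host translation */
	return (long)npages;
}

/* Guest side, per the commit message: write the GPAs of the pages under
 * DMA into the shared page, hypercall with the page count and the shared
 * page's GFN, then read the resulting dma address back. */
uint64_t pv_dma_map(const uint64_t *gpas, unsigned long npages,
		    unsigned long page_gfn)
{
	unsigned long i;

	if (npages == 0 || npages > MAX_PVDMA_PAGES)
		return 0;
	for (i = 0; i < npages; i++)
		shared_page[i] = gpas[i];
	if (kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn) != (long)npages)
		return 0;
	return shared_page[0];
}
```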

Major tasks still to be done: implement proper locking (take a
per-VM lock); some allocated memory is currently never freed.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/x86.c          |  273 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/kvm_para.h |   23 ++++-
 include/linux/kvm.h        |    3 +
 3 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..60ea93a 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -21,8 +21,11 @@
 
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
+#include <linux/highmem.h>
 
 #include <asm/uaccess.h>
 
@@ -61,6 +64,254 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+/* Paravirt DMA: We pin the host-side pages for the GPAs that we get
+ * for the DMA operation. We do a sg_map on the host pages for a DMA
+ * operation on the guest side. We un-pin the pages on the
+ * unmap_hypercall.
+ */
+struct dma_map {
+	struct list_head list;
+	int nents;
+	struct scatterlist *sg;
+};
+
+/* This list is to store the guest bus:device:function and host
+ * bus:device:function mapping for passthrough'ed devices.
+ */
+/* FIXME: make this per-vm */
+/* FIXME: delete this list at the end of a vm session */
+struct pv_pci_dev_list {
+	struct list_head list;
+	struct kvm_pv_passthrough_dev pt_dev;
+};
+
+/* FIXME: This should be a per-vm list */
+static LIST_HEAD(dmap_head);
+static LIST_HEAD(pt_dev_head);
+
+static struct dma_map*
+find_matching_dmap(struct list_head *head, dma_addr_t dma)
+{
+	struct list_head *ptr;
+	struct dma_map *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct dma_map, list);
+		if (match && match->sg[0].dma_address == dma)
+			return match;
+	}
+	return NULL;
+}
+
+static void
+prepare_sg_entry(struct scatterlist *sg, unsigned long addr)
+{
+	unsigned int offset, len;
+
+	offset = addr & ~PAGE_MASK;
+	len = PAGE_SIZE - offset;
+
+	/* FIXME: Use the sg chaining features */
+	sg_set_page(sg, pfn_to_page(addr >> PAGE_SHIFT),
+		    len, offset);
+}
+
+static int pv_map_hypercall(struct kvm_vcpu *vcpu, int npages, gfn_t page_gfn)
+{
+	int i, r = 0;
+	gpa_t gpa;
+	hpa_t page_hpa, hpa;
+	struct dma_map *dmap;
+	struct page *host_page;
+	struct scatterlist *sg;
+	unsigned long *shared_addr, *hcall_page;
+
+	/* We currently don't support dma mappings which have more than
+	 * PAGE_SIZE/sizeof(unsigned long *) pages
+	 */
+	if (!npages || npages > MAX_PVDMA_PAGES) {
+		printk(KERN_INFO "%s: Illegal number of pages: %d\n",
+		       __FUNCTION__, npages);
+		goto out;
+	}
+
+	page_hpa = gpa_to_hpa(vcpu->kvm, page_gfn << PAGE_SHIFT);
+	if (is_error_hpa(page_hpa)) {
+		printk(KERN_INFO "%s: page hpa %p not valid for page_gfn %p\n",
+		       __FUNCTION__, (void *)page_hpa, (void *)page_gfn);
+		goto out;
+	}
+	host_page = pfn_to_page(page_hpa >> PAGE_SHIFT);
+	hcall_page = shared_addr = kmap(host_page);
+
+	/* scatterlist to map guest dma pages into host physical
+	 * memory -- if they exceed the DMA map limit
+	 */
+	sg = kcalloc(npages, sizeof(struct scatterlist), GFP_KERNEL);
+	if (sg == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory (sg)\n",
+		       __FUNCTION__);
+		goto out_unmap;
+	}
+
+	/* List to store all guest pages mapped into host. This will
+	 * be used later to free pages on the host. Think of this as a
+	 * translation table from guest dma addresses into host dma
+	 * addresses
+	 */
+	dmap = kmalloc(sizeof(struct dma_map), GFP_KERNEL);
+	if (dmap == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __FUNCTION__);
+		goto out_unmap_sg;
+	}
+
+	/* FIXME: consider the length of the last page. Guest should
+	 * send this info.
+	 */
+	for (i = 0; i < npages; i++) {
+		gpa = *shared_addr++;
+		hpa = gpa_to_hpa(vcpu->kvm, gpa);
+		if (is_error_hpa(hpa)) {
+			int j;
+			printk(KERN_INFO "kvm %s: hpa %p not valid "
+			       "for gpa %p\n",
+			       __FUNCTION__, (void *)gpa, (void *)hpa);
+
+			for (j = 0; j < i; j++)
+				put_page(sg_page(&sg[j]));
+			goto out_unmap_sg;
+		}
+		prepare_sg_entry(&sg[i], hpa);
+		get_page(sg_page(&sg[i]));
+	}
+
+	/* Put this on the dmap_head list, so that we can find it
+	 * later for the 'free' operation
+	 */
+	dmap->sg = sg;
+	dmap->nents = npages;
+	list_add(&dmap->list, &dmap_head);
+
+	/* FIXME: guest should send the direction */
+	r = dma_ops->map_sg(NULL, sg, npages, PCI_DMA_BIDIRECTIONAL);
+	if (r) {
+		r = npages;
+		*hcall_page = sg[0].dma_address;
+	}
+
+ out_unmap:
+	if (!r)
+		*hcall_page = bad_dma_address;
+	kunmap(host_page);
+ out:
+	return r;
+ out_unmap_sg:
+	kfree(dmap);
+	kfree(sg);
+	goto out_unmap;
+}
+
+/* FIXME: the argument passed from guest can be 32-bit. We need 64-bit for
+ * dma_addr_t. Send the dma address in a page.
+ */
+static int pv_unmap_hypercall(struct kvm_vcpu *vcpu, dma_addr_t dma)
+{
+	int i, r = 0;
+	struct dma_map *dmap;
+
+	/* dma is the address we have to 'unmap'. Check if it exists
+	 * in the dma_map list. If yes, free it.
+	 */
+	dmap = find_matching_dmap(&dmap_head, dma);
+	if (dmap) {
+		for (i = 0; i < dmap->nents; i++)
+			put_page(sg_page(&dmap->sg[i]));
+
+		dma_ops->unmap_sg(NULL, dmap->sg, dmap->nents,
+				  PCI_DMA_BIDIRECTIONAL);
+		kfree(dmap->sg);
+		list_del(&dmap->list);
+	} else
+		r = 1;
+
+	return r;
+}
+
+static struct pv_pci_dev_list*
+find_matching_pt_dev(struct list_head *head,
+		     struct kvm_pv_pci_info *pv_pci_info)
+{
+	struct list_head *ptr;
+	struct pv_pci_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct pv_pci_dev_list, list);
+		/* We match on the guest-side bus:devfn since we also use
+		 * this function from the hypercall which the guest issues
+		 * to find out if it's a pv device
+		 */
+		if (match &&
+		    (match->pt_dev.guest.busnr == pv_pci_info->busnr) &&
+		    (match->pt_dev.guest.devfn == pv_pci_info->devfn))
+			return match;
+	}
+	return NULL;
+}
+
+static int
+pv_mapped_pci_device_hypercall(struct kvm_vcpu *vcpu, gfn_t page_gfn)
+{
+	int r = -1;
+	hpa_t page_hpa;
+	unsigned long *shared_addr;
+	struct page *host_page;
+	struct kvm_pv_pci_info pv_pci_info;
+
+	page_hpa = gpa_to_hpa(vcpu->kvm, page_gfn << PAGE_SHIFT);
+	if (is_error_hpa(page_hpa)) {
+		printk(KERN_INFO "%s: page hpa %p not valid for page_gfn %p\n",
+		       __FUNCTION__, (void *)page_hpa, (void *)page_gfn);
+		goto out;
+	}
+	host_page = pfn_to_page(page_hpa >> PAGE_SHIFT);
+	shared_addr = kmap(host_page);
+	memcpy(&pv_pci_info, shared_addr, sizeof(struct kvm_pv_pci_info));
+
+	if (find_matching_pt_dev(&pt_dev_head, &pv_pci_info))
+		r = 1;
+	else
+		r = 0;
+
+	kunmap(host_page);
+ out:
+	return r;
+}
+
+static int kvm_vm_ioctl_pv_pt_dev(struct kvm_pv_passthrough_dev *pv_pci_dev)
+{
+	int r = 0;
+	struct pv_pci_dev_list *match;
+
+	/* Has this been added already? */
+	if (find_matching_pt_dev(&pt_dev_head, &pv_pci_dev->guest))
+		goto out;
+
+	match = kmalloc(sizeof(struct pv_pci_dev_list), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Couldn't allocate memory\n",
+		       __FUNCTION__);
+		r = -ENOMEM;
+		goto out;
+	}
+	match->pt_dev.guest.busnr = pv_pci_dev->guest.busnr;
+	match->pt_dev.guest.devfn = pv_pci_dev->guest.devfn;
+	match->pt_dev.mach.busnr  = pv_pci_dev->mach.busnr;
+	match->pt_dev.mach.devfn  = pv_pci_dev->mach.devfn;
+	list_add(&match->list, &pt_dev_head);
+ out:
+	return r;
+}
 
 unsigned long segment_base(u16 selector)
 {
@@ -983,6 +1234,19 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_ASSIGN_PV_PCI_DEV: {
+		struct kvm_pv_passthrough_dev pv_pci_dev;
+
+		r = -EFAULT;
+		if (copy_from_user(&pv_pci_dev, argp, sizeof pv_pci_dev)) {
+			printk("pv_register: failing copy from user\n");
+			goto out;
+		}
+		r = kvm_vm_ioctl_pv_pt_dev(&pv_pci_dev);
+		if (r)
+			goto out;
+		break;
+	}
 	default:
 		;
 	}
@@ -1649,6 +1913,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	}
 
 	switch (nr) {
+	case KVM_PV_DMA_MAP:
+		ret = pv_map_hypercall(vcpu, a0, a1);
+		break;
+	case KVM_PV_DMA_UNMAP:
+		ret = pv_unmap_hypercall(vcpu, a0);
+		break;
+	case KVM_PV_PCI_DEVICE:
+		ret = pv_mapped_pci_device_hypercall(vcpu, a0);
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..c4b2be0 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -17,7 +17,13 @@
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
  */
-#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
+#define KVM_HYPERCALL ".byte 0x0f,0x01,0xd9"
+
+/* Hypercall numbers */
+#define KVM_PV_UNUSED		0
+#define KVM_PV_DMA_MAP		1
+#define KVM_PV_DMA_UNMAP	2
+#define KVM_PV_PCI_DEVICE	3
 
 /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
  * instruction.  The hypervisor may replace it with something else but only the
@@ -101,5 +107,18 @@ static inline unsigned int kvm_arch_para_features(void)
 }
 
 #endif
-
+/* Info stored for identifying paravirtualized PCI devices in the host kernel */
+struct kvm_pv_pci_info {
+	unsigned char busnr;
+	unsigned int devfn;
+};
+
+/* Mapping between host and guest PCI device */
+struct kvm_pv_passthrough_dev {
+	struct kvm_pv_pci_info guest;
+	struct kvm_pv_pci_info mach;
+};
+
+/* Max. DMA pages we send from guest to host for mapping */
+#define MAX_PVDMA_PAGES (PAGE_SIZE / sizeof(unsigned long *))
 #endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..38fbebb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -9,6 +9,7 @@
 
 #include <asm/types.h>
 #include <linux/ioctl.h>
+#include <linux/kvm_para.h>
 
 #define KVM_API_VERSION 12
 
@@ -381,6 +382,8 @@ struct kvm_signal_mask {
 #define KVM_IRQ_LINE		  _IOW(KVMIO, 0x61, struct kvm_irq_level)
 #define KVM_GET_IRQCHIP		  _IOWR(KVMIO, 0x62, struct kvm_irqchip)
 #define KVM_SET_IRQCHIP		  _IOR(KVMIO,  0x63, struct kvm_irqchip)
+#define KVM_ASSIGN_PV_PCI_DEV	  _IOR(KVMIO, 0x64, \
+				       struct kvm_pv_passthrough_dev)
 
 /*
  * ioctls for vcpu fds
-- 
1.5.3



^ permalink raw reply related	[flat|nested] 29+ messages in thread
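Worth noting for reviewers: the MAX_PVDMA_PAGES cap in the kvm_para.h
hunk above limits one map hypercall to a single shared page's worth of
GPA slots. Assuming 4 KiB pages and 8-byte pointers (the 64-bit-only
configuration the cover letter allows), that works out to 512 pages,
i.e. 2 MiB per mapping; PAGE_SIZE below is a stand-in for the kernel
macro:

```c
#include <stddef.h>

/* Stand-ins for the kernel macros; 4 KiB pages as on x86-64. */
#define PAGE_SIZE 4096UL
#define MAX_PVDMA_PAGES (PAGE_SIZE / sizeof(unsigned long *))

/* Each 8-byte slot in the shared page carries one guest physical
 * address, so one hypercall can describe at most this many bytes. */
static unsigned long max_dma_bytes(void)
{
	return MAX_PVDMA_PAGES * PAGE_SIZE;
}
```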


* [PATCH 2/8] KVM: Move #include asm/kvm_para.h outside of __KERNEL__
       [not found]     ` <6d486436cf50e269d8914229d10ff60f3d646795.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

We have some structures defined which are going to be
used by userspace for ioctls.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 include/linux/kvm_para.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..ff6ac27 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -12,12 +12,12 @@
 /* Return values for hypercalls */
 #define KVM_ENOSYS		1000
 
-#ifdef __KERNEL__
 /*
  * hypercalls use architecture specific
  */
 #include <asm/kvm_para.h>
 
+#ifdef __KERNEL__
 static inline int kvm_para_has_feature(unsigned int feature)
 {
 	if (kvm_arch_para_features() & (1UL << feature))
@@ -26,4 +26,3 @@ static inline int kvm_para_has_feature(unsigned int feature)
 }
 #endif /* __KERNEL__ */
 #endif /* __LINUX_KVM_PARA_H */
-
-- 
1.5.3



^ permalink raw reply related	[flat|nested] 29+ messages in thread


* [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]     ` <609d5d611a5fb58ab5a7184be7b6d29494023ba0.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  2007-11-12 10:50       ` Muli Ben-Yehuda
  1 sibling, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

We point the dma_mapping_ops structure at our own routines so that
every DMA access goes through us. (This is the reason this only works
for 64-bit guests; 32-bit guests don't yet have a dma_ops struct.)

We make a hypercall for every device that does a DMA operation to
find out if it is a passed-through device, so that we know whether
each DMA access needs host translation. The result of this hypercall
is cached, so it is made only once for each device.

Right now, this only works as a module: compiling it in causes
it to freeze during the HD bring-up.
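The interposition itself follows the usual save-and-replace pattern.
A cut-down sketch: the real struct dma_mapping_ops has many more hooks,
the real replacement happens against the kernel's global dma_ops, and
demo_map_single/demo_dma_ops exist only so the flow can be exercised:

```c
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct dma_mapping_ops. */
struct dma_mapping_ops {
	unsigned long (*map_single)(void *vaddr, size_t size);
};

static const struct dma_mapping_ops *orig_dma_ops;	/* saved original */

static unsigned long pv_map_single(void *vaddr, size_t size)
{
	/* The real driver would take the dma address produced by the
	 * original ops and, for a passthrough device, translate it via
	 * the KVM_PV_DMA_MAP hypercall. */
	return orig_dma_ops->map_single(vaddr, size);
}

static struct dma_mapping_ops pv_dma_ops = {
	.map_single = pv_map_single,
};

/* Module init: remember the current ops and slide ours in front. */
static void pv_dma_interpose(struct dma_mapping_ops **ops_slot)
{
	orig_dma_ops = *ops_slot;
	*ops_slot = &pv_dma_ops;
}

/* Illustrative "original" implementation for the demo. */
static unsigned long demo_map_single(void *vaddr, size_t size)
{
	(void)vaddr;
	return (unsigned long)size;	/* dummy dma address */
}

static struct dma_mapping_ops demo_ops = { .map_single = demo_map_single };
static struct dma_mapping_ops *demo_dma_ops = &demo_ops;
```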

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/kvm_pv_dma.c |  398 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 398 insertions(+), 0 deletions(-)
 create mode 100644 drivers/kvm/kvm_pv_dma.c

diff --git a/drivers/kvm/kvm_pv_dma.c b/drivers/kvm/kvm_pv_dma.c
new file mode 100644
index 0000000..8d98d98
--- /dev/null
+++ b/drivers/kvm/kvm_pv_dma.c
@@ -0,0 +1,398 @@
+/*
+ * KVM guest DMA para-virtualization driver
+ *
+ * Copyright (C) 2007, Qumranet, Inc., Amit Shah <amit.shah@qumranet.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <asm/io.h>
+#include <asm/page.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/miscdevice.h>
+#include <linux/kvm_para.h>
+
+MODULE_AUTHOR("Amit Shah");
+MODULE_DESCRIPTION("Implements guest para-virtualized DMA");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+#define KVM_DMA_MINOR MISC_DYNAMIC_MINOR
+
+static struct page *page;
+static unsigned long page_gfn;
+
+const struct dma_mapping_ops *orig_dma_ops;
+
+#include <linux/list.h>
+struct pv_passthrough_dev_list {
+	struct list_head list;
+	struct kvm_pv_pci_info pv_pci_info;
+	int is_pv;
+};
+static LIST_HEAD(pt_devs_head);
+
+static struct pv_passthrough_dev_list*
+find_matching_pt_dev(struct list_head *head,
+		     struct kvm_pv_pci_info *pv_pci_info)
+{
+	struct list_head *ptr;
+	struct pv_passthrough_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct pv_passthrough_dev_list, list);
+		if (match &&
+		    (match->pv_pci_info.busnr == pv_pci_info->busnr) &&
+		    (match->pv_pci_info.devfn == pv_pci_info->devfn))
+			return match;
+	}
+	return NULL;
+}
+
+void
+empty_pt_dev_list(struct list_head *head)
+{
+	struct pv_passthrough_dev_list *match;
+
+	while (!list_empty(head)) {
+		match = list_entry(head->next, \
+				   struct pv_passthrough_dev_list, list);
+		list_del(&match->list);
+	}
+}
+
+static int
+kvm_is_pv_device(struct device *dev, const char *name)
+{
+	int r;
+	struct pci_dev *pci_dev;
+	struct kvm_pv_pci_info pv_pci_info;
+	struct pv_passthrough_dev_list *match;
+
+	pci_dev = to_pci_dev(dev);
+	pv_pci_info.busnr = pci_dev->bus->number;
+	pv_pci_info.devfn = pci_dev->devfn;
+
+	match = find_matching_pt_dev(&pt_devs_head, &pv_pci_info);
+	if (match) {
+		r = match->is_pv;
+		goto out;
+	}
+
+	memcpy(page_address(page), &pv_pci_info, sizeof(pv_pci_info));
+	r = kvm_hypercall1(KVM_PV_PCI_DEVICE, page_gfn);
+	if (r < 0) {
+		printk(KERN_INFO "%s: Error doing hypercall!\n", __FUNCTION__);
+		r = 0;
+		goto out;
+	}
+
+	match = kmalloc(sizeof(struct pv_passthrough_dev_list), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Out of memory\n", __FUNCTION__);
+		r = 0;
+		goto out;
+	}
+	match->pv_pci_info.busnr = pv_pci_info.busnr;
+	match->pv_pci_info.devfn = pv_pci_info.devfn;
+	match->is_pv = r;
+	list_add(&match->list, &pt_devs_head);
+ out:
+	return r;
+}
+
+static void *
+kvm_dma_map(void *vaddr, size_t size, dma_addr_t *dma_handle)
+{
+	int npages, i;
+	unsigned long *dma_addr;
+	dma_addr_t host_addr = bad_dma_address;
+
+	if (page == NULL)
+		goto out;
+
+	npages = get_order(size) + 1;
+	dma_addr = page_address(page);
+
+	/* We have to take into consideration the offsets for the
+	 * virtual address provided by the calling
+	 * functions. Currently both, pci_alloc_consistent and
+	 * pci_map_single call this function. We have to change it so
+	 * that we can also pass to the host the offset of the addr in
+	 * the page it is in.
+	 */
+
+	if (*dma_handle == bad_dma_address)
+		goto out;
+
+	/* It's not really OK to use dma_handle here, as the IOMMU or
+	 * swiotlb could have mapped it elsewhere. But what's a better
+	 * solution?
+	 */
+	*dma_addr++ = *dma_handle;
+	if (npages > 1) {
+		/* All of the pages will be contiguous in guest
+		 * physical memory in both, pci_map_consistent and
+		 * pci_map_single cases (see DMA-API.txt)
+		 */
+		/* FIXME: we're currently not crossing over to
+		 * multiple pages to be sent to host, in case
+		 * we have a lot of pages that we can't
+		 * accomodate in one page.
+		 */
+		for (i = 1; i < min((unsigned long)npages, MAX_PVDMA_PAGES); i++)
+			*dma_addr++ = virt_to_phys(vaddr + PAGE_SIZE * i);
+	}
+
+	/* Maybe we need more arguments (we have first two):
+	 * @npages: number of gpas pages in this hypercall
+	 * @page: page we pass to host with all the gpas in them
+	 * @more: are there any more pages coming?
+	 * @offset: offset of the address in the first page
+	 * @direction: direction for the mapping (only for pci_map_single)
+	 */
+	npages = kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn);
+	if (!npages)
+		host_addr = bad_dma_address;
+	else
+		host_addr = *(unsigned long *)page_address(page);
+
+ out:
+	*dma_handle = host_addr;
+	if (host_addr == bad_dma_address)
+		vaddr = NULL;
+	return vaddr;
+}
+
+static void
+kvm_dma_unmap(dma_addr_t dma_handle)
+{
+	kvm_hypercall1(KVM_PV_DMA_UNMAP, dma_handle);
+	return;
+}
+
+static void *
+kvm_dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		       gfp_t gfp)
+{
+	void *vaddr = NULL;
+	if ((*dma_handle == bad_dma_address)
+	    || !dma_ops->is_pv_device(dev, dev->bus_id))
+		goto out;
+
+	vaddr = bus_to_virt((unsigned long)dma_handle);
+	vaddr = kvm_dma_map(vaddr, size, dma_handle);
+ out:
+	return vaddr;
+}
+
+static void
+kvm_dma_free_coherent(struct device *dev, size_t size, void *vaddr,
+		      dma_addr_t dma_handle)
+{
+	kvm_dma_unmap(dma_handle);
+}
+
+static dma_addr_t
+kvm_dma_map_single(struct device *dev, void *ptr, size_t size, int direction)
+{
+	dma_addr_t r;
+
+	r = orig_dma_ops->map_single(dev, ptr, size, direction);
+
+	if (r != bad_dma_address && kvm_is_pv_device(dev, dev->bus_id))
+		kvm_dma_map(ptr, size, &r);
+	return r;
+}
+
+static inline void
+kvm_dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size,
+		     int direction)
+{
+	kvm_dma_unmap(addr);
+}
+
+int kvm_pv_dma_mapping_error(dma_addr_t dma_addr)
+{
+	if (orig_dma_ops->mapping_error)
+		return orig_dma_ops->mapping_error(dma_addr);
+
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return dma_addr == bad_dma_address;
+}
+
+/* like map_single, but doesn't check the device mask */
+dma_addr_t kvm_pv_dma_map_simple(struct device *hwdev, char *ptr,
+				 size_t size, int direction)
+{
+	return orig_dma_ops->map_simple(hwdev, ptr, size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_cpu(struct device *hwdev,
+				    dma_addr_t dma_handle, size_t size,
+				    int direction)
+{
+	if (orig_dma_ops->sync_single_for_cpu)
+		orig_dma_ops->sync_single_for_cpu(hwdev, dma_handle,
+						  size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_device(struct device *hwdev,
+				       dma_addr_t dma_handle, size_t size,
+				       int direction)
+{
+	if (orig_dma_ops->sync_single_for_device)
+		orig_dma_ops->sync_single_for_device(hwdev, dma_handle,
+						     size, direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_cpu(struct device *hwdev,
+					  dma_addr_t dma_handle,
+					  unsigned long offset,
+					  size_t size, int direction)
+{
+	if (orig_dma_ops->sync_single_range_for_cpu)
+		orig_dma_ops->sync_single_range_for_cpu(hwdev, dma_handle,
+							offset, size,
+							direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_device(struct device *hwdev,
+					     dma_addr_t dma_handle,
+					     unsigned long offset,
+					     size_t size, int direction)
+{
+	if (orig_dma_ops->sync_single_range_for_device)
+		orig_dma_ops->sync_single_range_for_device(hwdev, dma_handle,
+							   offset, size,
+							   direction);
+}
+
+void kvm_pv_dma_sync_sg_for_cpu(struct device *hwdev,
+		     struct scatterlist *sg, int nelems,
+		     int direction)
+{
+	if (orig_dma_ops->sync_sg_for_cpu)
+		orig_dma_ops->sync_sg_for_cpu(hwdev, sg, nelems, direction);
+}
+
+void kvm_pv_dma_sync_sg_for_device(struct device *hwdev,
+				   struct scatterlist *sg, int nelems,
+				   int direction)
+{
+	if (orig_dma_ops->sync_sg_for_device)
+		orig_dma_ops->sync_sg_for_device(hwdev, sg, nelems, direction);
+}
+
+int kvm_pv_dma_map_sg(struct device *hwdev, struct scatterlist *sg,
+		      int nents, int direction)
+{
+	return orig_dma_ops->map_sg(hwdev, sg, nents, direction);
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return 0;
+}
+
+void kvm_pv_dma_unmap_sg(struct device *hwdev,
+			 struct scatterlist *sg, int nents,
+			 int direction)
+{
+	if (orig_dma_ops->unmap_sg)
+		orig_dma_ops->unmap_sg(hwdev, sg, nents, direction);
+}
+
+int kvm_pv_dma_dma_supported(struct device *hwdev, u64 mask)
+{
+	if (orig_dma_ops->dma_supported)
+		return orig_dma_ops->dma_supported(hwdev, mask);
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return 0;
+}
+
+static const struct dma_mapping_ops kvm_dma_ops = {
+	.alloc_coherent	= kvm_dma_alloc_coherent,
+	.free_coherent	= kvm_dma_free_coherent,
+	.map_single	= kvm_dma_map_single,
+	.unmap_single	= kvm_dma_unmap_single,
+	.is_pv_device	= kvm_is_pv_device,
+
+	.mapping_error  = kvm_pv_dma_mapping_error,
+	.map_simple	= kvm_pv_dma_map_simple,
+	.sync_single_for_cpu = kvm_pv_dma_sync_single_for_cpu,
+	.sync_single_for_device = kvm_pv_dma_sync_single_for_device,
+	.sync_single_range_for_cpu = kvm_pv_dma_sync_single_range_for_cpu,
+	.sync_single_range_for_device = kvm_pv_dma_sync_single_range_for_device,
+	.sync_sg_for_cpu = kvm_pv_dma_sync_sg_for_cpu,
+	.sync_sg_for_device = kvm_pv_dma_sync_sg_for_device,
+	.map_sg		= kvm_pv_dma_map_sg,
+	.unmap_sg	= kvm_pv_dma_unmap_sg,
+};
+
+static struct file_operations dma_chardev_ops;
+static struct miscdevice kvm_dma_dev = {
+	KVM_DMA_MINOR,
+	"kvm_dma",
+	&dma_chardev_ops,
+};
+
+int __init kvm_pv_dma_init(void)
+{
+	int r;
+
+	dma_chardev_ops.owner = THIS_MODULE;
+	if (misc_register(&kvm_dma_dev)) {
+		printk(KERN_ERR "%s: misc device register failed\n",
+		       __FUNCTION__);
+		r = -EBUSY;
+		goto out;
+	}
+	if (!kvm_para_available()) {
+		printk(KERN_ERR "KVM paravirt support not available\n");
+		r = -ENODEV;
+		goto out_dereg;
+	}
+
+	/* FIXME: check for hypercall support */
+	page = alloc_page(GFP_ATOMIC);
+	if (page == NULL) {
+		printk(KERN_ERR "%s: Could not allocate page\n", __FUNCTION__);
+		r = -ENOMEM;
+		goto out_dereg;
+	}
+	page_gfn = page_to_pfn(page);
+
+	orig_dma_ops = dma_ops;
+	dma_ops = &kvm_dma_ops;
+
+	printk(KERN_INFO "KVM PV DMA engine registered\n");
+	return 0;
+	goto out;
+	goto out_free;
+
+ out_free:
+	__free_page(page);
+ out_dereg:
+	misc_deregister(&kvm_dma_dev);
+ out:
+	return r;
+}
+
+static void __exit kvm_pv_dma_exit(void)
+{
+	dma_ops = orig_dma_ops;
+
+	__free_page(page);
+
+	empty_pt_dev_list(&pt_devs_head);
+
+	misc_deregister(&kvm_dma_dev);
+}
+
+module_init(kvm_pv_dma_init);
+module_exit(kvm_pv_dma_exit);
-- 
1.5.3


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]   ` <609d5d611a5fb58ab5a7184be7b6d29494023ba0.1194445109.git.amit.shah@qumranet.com>
@ 2007-11-07 14:21     ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

We point the dma_mapping_ops structure at our own operations
so that every DMA access goes through us. (This is why this
only works for 64-bit guests; 32-bit guests don't yet have a
dma_ops struct.)

For every device that performs a DMA operation, we make a
hypercall to find out whether it is a passthrough device, so
that we know whether its DMA accesses need hypercalls. The
result is cached, so the hypercall is made only once per device.

Right now, this only works as a module: compiling it in causes
a freeze during the HD bring-up.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/kvm_pv_dma.c |  398 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 398 insertions(+), 0 deletions(-)
 create mode 100644 drivers/kvm/kvm_pv_dma.c

diff --git a/drivers/kvm/kvm_pv_dma.c b/drivers/kvm/kvm_pv_dma.c
new file mode 100644
index 0000000..8d98d98
--- /dev/null
+++ b/drivers/kvm/kvm_pv_dma.c
@@ -0,0 +1,398 @@
+/*
+ * KVM guest DMA para-virtualization driver
+ *
+ * Copyright (C) 2007, Qumranet, Inc., Amit Shah <amit.shah@qumranet.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <asm/io.h>
+#include <asm/page.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/miscdevice.h>
+#include <linux/kvm_para.h>
+
+MODULE_AUTHOR("Amit Shah");
+MODULE_DESCRIPTION("Implements guest para-virtualized DMA");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+#define KVM_DMA_MINOR MISC_DYNAMIC_MINOR
+
+static struct page *page;
+static unsigned long page_gfn;
+
+const struct dma_mapping_ops *orig_dma_ops;
+
+#include <linux/list.h>
+struct pv_passthrough_dev_list {
+	struct list_head list;
+	struct kvm_pv_pci_info pv_pci_info;
+	int is_pv;
+};
+static LIST_HEAD(pt_devs_head);
+
+static struct pv_passthrough_dev_list*
+find_matching_pt_dev(struct list_head *head,
+		     struct kvm_pv_pci_info *pv_pci_info)
+{
+	struct list_head *ptr;
+	struct pv_passthrough_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct pv_passthrough_dev_list, list);
+		if (match &&
+		    (match->pv_pci_info.busnr == pv_pci_info->busnr) &&
+		    (match->pv_pci_info.devfn == pv_pci_info->devfn))
+			return match;
+	}
+	return NULL;
+}
+
+void
+empty_pt_dev_list(struct list_head *head)
+{
+	struct pv_passthrough_dev_list *match;
+
+	while (!list_empty(head)) {
+		match = list_entry(head->next,
+				   struct pv_passthrough_dev_list, list);
+		list_del(&match->list);
+		kfree(match);
+	}
+}
+
+static int
+kvm_is_pv_device(struct device *dev, const char *name)
+{
+	int r;
+	struct pci_dev *pci_dev;
+	struct kvm_pv_pci_info pv_pci_info;
+	struct pv_passthrough_dev_list *match;
+
+	pci_dev = to_pci_dev(dev);
+	pv_pci_info.busnr = pci_dev->bus->number;
+	pv_pci_info.devfn = pci_dev->devfn;
+
+	match = find_matching_pt_dev(&pt_devs_head, &pv_pci_info);
+	if (match) {
+		r = match->is_pv;
+		goto out;
+	}
+
+	memcpy(page_address(page), &pv_pci_info, sizeof(pv_pci_info));
+	r = kvm_hypercall1(KVM_PV_PCI_DEVICE, page_gfn);
+	if (r < 1) {
+		printk(KERN_INFO "%s: Error doing hypercall!\n", __FUNCTION__);
+		r = 0;
+		goto out;
+	}
+
+	match = kmalloc(sizeof(struct pv_passthrough_dev_list), GFP_KERNEL);
+	if (match == NULL) {
+		printk(KERN_INFO "%s: Out of memory\n", __FUNCTION__);
+		r = 0;
+		goto out;
+	}
+	match->pv_pci_info.busnr = pv_pci_info.busnr;
+	match->pv_pci_info.devfn = pv_pci_info.devfn;
+	match->is_pv = r;
+	list_add(&match->list, &pt_devs_head);
+ out:
+	return r;
+}
+
+static void *
+kvm_dma_map(void *vaddr, size_t size, dma_addr_t *dma_handle)
+{
+	int npages, i;
+	unsigned long *dma_addr;
+	dma_addr_t host_addr = bad_dma_address;
+
+	if (page == NULL)
+		goto out;
+
+	npages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	dma_addr = page_address(page);
+
+	/* We have to take into consideration the offsets for the
+	 * virtual address provided by the calling
+	 * functions. Currently both pci_alloc_consistent and
+	 * pci_map_single call this function. We have to change it so
+	 * that we can also pass to the host the offset of the addr in
+	 * the page it is in.
+	 */
+
+	if (*dma_handle == bad_dma_address)
+		goto out;
+
+	/* It's not really OK to use dma_handle here, as the IOMMU or
+	 * swiotlb could have mapped it elsewhere. But what's a better
+	 * solution?
+	 */
+	*dma_addr++ = *dma_handle;
+	if (npages > 1) {
+		/* All of the pages will be contiguous in guest
+		 * physical memory in both the pci_alloc_consistent
+		 * and pci_map_single cases (see DMA-API.txt)
+		 */
+		/* FIXME: we're currently not crossing over to
+		 * multiple pages to be sent to host, in case
+		 * we have a lot of pages that we can't
+		 * accommodate in one page.
+		 */
+		for (i = 1; i < min((unsigned long)npages, MAX_PVDMA_PAGES); i++)
+			*dma_addr++ = virt_to_phys(vaddr + PAGE_SIZE * i);
+	}
+
+	/* Maybe we need more arguments (we have first two):
+	 * @npages: number of gpas pages in this hypercall
+	 * @page: page we pass to host with all the gpas in them
+	 * @more: are there any more pages coming?
+	 * @offset: offset of the address in the first page
+	 * @direction: direction for the mapping (only for pci_map_single)
+	 */
+	npages = kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn);
+	if (!npages)
+		host_addr = bad_dma_address;
+	else
+		host_addr = *(unsigned long *)page_address(page);
+
+ out:
+	*dma_handle = host_addr;
+	if (host_addr == bad_dma_address)
+		vaddr = NULL;
+	return vaddr;
+}
+
+static void
+kvm_dma_unmap(dma_addr_t dma_handle)
+{
+	kvm_hypercall1(KVM_PV_DMA_UNMAP, dma_handle);
+}
+
+static void *
+kvm_dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+		       gfp_t gfp)
+{
+	void *vaddr = NULL;
+	if ((*dma_handle == bad_dma_address)
+	    || !dma_ops->is_pv_device(dev, dev->bus_id))
+		goto out;
+
+	vaddr = bus_to_virt((unsigned long)*dma_handle);
+	vaddr = kvm_dma_map(vaddr, size, dma_handle);
+ out:
+	return vaddr;
+}
+
+static void
+kvm_dma_free_coherent(struct device *dev, size_t size, void *vaddr,
+		      dma_addr_t dma_handle)
+{
+	kvm_dma_unmap(dma_handle);
+}
+
+static dma_addr_t
+kvm_dma_map_single(struct device *dev, void *ptr, size_t size, int direction)
+{
+	dma_addr_t r;
+
+	r = orig_dma_ops->map_single(dev, ptr, size, direction);
+
+	if (r != bad_dma_address && kvm_is_pv_device(dev, dev->bus_id))
+		kvm_dma_map(ptr, size, &r);
+	return r;
+}
+
+static inline void
+kvm_dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size,
+		     int direction)
+{
+	kvm_dma_unmap(addr);
+}
+
+int kvm_pv_dma_mapping_error(dma_addr_t dma_addr)
+{
+	if (orig_dma_ops->mapping_error)
+		return orig_dma_ops->mapping_error(dma_addr);
+
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return dma_addr == bad_dma_address;
+}
+
+/* like map_single, but doesn't check the device mask */
+dma_addr_t kvm_pv_dma_map_simple(struct device *hwdev, char *ptr,
+				 size_t size, int direction)
+{
+	return orig_dma_ops->map_simple(hwdev, ptr, size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_cpu(struct device *hwdev,
+				    dma_addr_t dma_handle, size_t size,
+				    int direction)
+{
+	if (orig_dma_ops->sync_single_for_cpu)
+		orig_dma_ops->sync_single_for_cpu(hwdev, dma_handle,
+						  size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_device(struct device *hwdev,
+				       dma_addr_t dma_handle, size_t size,
+				       int direction)
+{
+	if (orig_dma_ops->sync_single_for_device)
+		orig_dma_ops->sync_single_for_device(hwdev, dma_handle,
+						     size, direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_cpu(struct device *hwdev,
+					  dma_addr_t dma_handle,
+					  unsigned long offset,
+					  size_t size, int direction)
+{
+	if (orig_dma_ops->sync_single_range_for_cpu)
+		orig_dma_ops->sync_single_range_for_cpu(hwdev, dma_handle,
+							offset, size,
+							direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_device(struct device *hwdev,
+					     dma_addr_t dma_handle,
+					     unsigned long offset,
+					     size_t size, int direction)
+{
+	if (orig_dma_ops->sync_single_range_for_device)
+		orig_dma_ops->sync_single_range_for_device(hwdev, dma_handle,
+							   offset, size,
+							   direction);
+}
+
+void kvm_pv_dma_sync_sg_for_cpu(struct device *hwdev,
+		     struct scatterlist *sg, int nelems,
+		     int direction)
+{
+	if (orig_dma_ops->sync_sg_for_cpu)
+		orig_dma_ops->sync_sg_for_cpu(hwdev, sg, nelems, direction);
+}
+
+void kvm_pv_dma_sync_sg_for_device(struct device *hwdev,
+				   struct scatterlist *sg, int nelems,
+				   int direction)
+{
+	if (orig_dma_ops->sync_sg_for_device)
+		orig_dma_ops->sync_sg_for_device(hwdev, sg, nelems, direction);
+}
+
+int kvm_pv_dma_map_sg(struct device *hwdev, struct scatterlist *sg,
+		      int nents, int direction)
+{
+	if (orig_dma_ops->map_sg)
+		return orig_dma_ops->map_sg(hwdev, sg, nents, direction);
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return 0;
+}
+
+void kvm_pv_dma_unmap_sg(struct device *hwdev,
+			 struct scatterlist *sg, int nents,
+			 int direction)
+{
+	if (orig_dma_ops->unmap_sg)
+		orig_dma_ops->unmap_sg(hwdev, sg, nents, direction);
+}
+
+int kvm_pv_dma_dma_supported(struct device *hwdev, u64 mask)
+{
+	if (orig_dma_ops->dma_supported)
+		return orig_dma_ops->dma_supported(hwdev, mask);
+	printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+	       __FUNCTION__);
+	return 0;
+}
+
+static const struct dma_mapping_ops kvm_dma_ops = {
+	.alloc_coherent	= kvm_dma_alloc_coherent,
+	.free_coherent	= kvm_dma_free_coherent,
+	.map_single	= kvm_dma_map_single,
+	.unmap_single	= kvm_dma_unmap_single,
+	.is_pv_device	= kvm_is_pv_device,
+
+	.mapping_error  = kvm_pv_dma_mapping_error,
+	.map_simple	= kvm_pv_dma_map_simple,
+	.sync_single_for_cpu = kvm_pv_dma_sync_single_for_cpu,
+	.sync_single_for_device = kvm_pv_dma_sync_single_for_device,
+	.sync_single_range_for_cpu = kvm_pv_dma_sync_single_range_for_cpu,
+	.sync_single_range_for_device = kvm_pv_dma_sync_single_range_for_device,
+	.sync_sg_for_cpu = kvm_pv_dma_sync_sg_for_cpu,
+	.sync_sg_for_device = kvm_pv_dma_sync_sg_for_device,
+	.map_sg		= kvm_pv_dma_map_sg,
+	.unmap_sg	= kvm_pv_dma_unmap_sg,
+};
+
+static struct file_operations dma_chardev_ops;
+static struct miscdevice kvm_dma_dev = {
+	KVM_DMA_MINOR,
+	"kvm_dma",
+	&dma_chardev_ops,
+};
+
+int __init kvm_pv_dma_init(void)
+{
+	int r;
+
+	dma_chardev_ops.owner = THIS_MODULE;
+	if (misc_register(&kvm_dma_dev)) {
+		printk(KERN_ERR "%s: misc device register failed\n",
+		       __FUNCTION__);
+		r = -EBUSY;
+		goto out;
+	}
+	if (!kvm_para_available()) {
+		printk(KERN_ERR "KVM paravirt support not available\n");
+		r = -ENODEV;
+		goto out_dereg;
+	}
+
+	/* FIXME: check for hypercall support */
+	page = alloc_page(GFP_ATOMIC);
+	if (page == NULL) {
+		printk(KERN_ERR "%s: Could not allocate page\n", __FUNCTION__);
+		r = -ENOMEM;
+		goto out_dereg;
+	}
+	page_gfn = page_to_pfn(page);
+
+	orig_dma_ops = dma_ops;
+	dma_ops = &kvm_dma_ops;
+
+	printk(KERN_INFO "KVM PV DMA engine registered\n");
+	return 0;
+
+ out_dereg:
+	misc_deregister(&kvm_dma_dev);
+ out:
+	return r;
+}
+
+static void __exit kvm_pv_dma_exit(void)
+{
+	dma_ops = orig_dma_ops;
+
+	__free_page(page);
+
+	empty_pt_dev_list(&pt_devs_head);
+
+	misc_deregister(&kvm_dma_dev);
+}
+
+module_init(kvm_pv_dma_init);
+module_exit(kvm_pv_dma_exit);
-- 
1.5.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation
       [not found]     ` <218cf425feff1d4daf23d3f25df1eb224108a1a3.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  2007-11-12 10:52       ` Muli Ben-Yehuda
  1 sibling, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Amit Shah

A guest can call dma_ops->is_pv_device() to find out
whether a device is a passthrough device (a device passed
on to the guest by the host). If it is, a hypercall
will be made to translate DMA mapping operations.

This function could be done away with in favour of a plain
kvm_is_pv_device() call, which can be a no-op on a non-PV
guest (or on the host).

Signed-off-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 include/asm-x86/dma-mapping_64.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/dma-mapping_64.h b/include/asm-x86/dma-mapping_64.h
index ecd0f61..3943edd 100644
--- a/include/asm-x86/dma-mapping_64.h
+++ b/include/asm-x86/dma-mapping_64.h
@@ -48,6 +48,8 @@ struct dma_mapping_ops {
 				int direction);
 	int             (*dma_supported)(struct device *hwdev, u64 mask);
 	int		is_phys;
+	/* Is this a physical device in a paravirtualized guest? */
+	int		(*is_pv_device)(struct device *hwdev, const char *name);
 };
 
 extern dma_addr_t bad_dma_address;
-- 
1.5.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware
       [not found]     ` <e2f5f0c08d08cf66a39c8b452410078617e611f7.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  2007-11-12 10:56       ` Muli Ben-Yehuda
  1 sibling, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Amit Shah

Of all the DMA calls, only dma_alloc_coherent might not actually
call dma_ops->alloc_coherent. Make sure it does get called when
the device being worked on is a PV device.

Signed-off-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 arch/x86/kernel/pci-dma_64.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index aa805b1..d4b1713 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -11,6 +11,7 @@
 #include <asm/io.h>
 #include <asm/gart.h>
 #include <asm/calgary.h>
+#include <linux/kvm_para.h>
 
 int iommu_merge __read_mostly = 1;
 EXPORT_SYMBOL(iommu_merge);
@@ -134,6 +135,18 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		memset(memory, 0, size);
 		if (!mmu) {
 			*dma_handle = virt_to_bus(memory);
+			if (unlikely(dma_ops->is_pv_device)
+			    && unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {
+				void *r;
+				r = dma_ops->alloc_coherent(dev, size,
+							    dma_handle,
+							    gfp);
+				if (r == NULL) {
+					free_pages((unsigned long)memory,
+						   get_order(size));
+					memory = NULL;
+				}
+			}
 			return memory;
 		}
 	}
-- 
1.5.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/8] KVM: PVDMA Guest: Add Makefile rule
       [not found]     ` <6bbd61409e4779febab1eaf03796455b22e8ea70.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Amit Shah

Add a Makefile rule for compiling the newly created file.

Signed-off-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 drivers/kvm/Makefile |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index cf18ad4..f492e3e 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -8,3 +8,5 @@ kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 kvm-amd-objs = svm.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+kvm-pv-dma-objs = kvm_pv_dma.o
+obj-$(CONFIG_KVM_PV_DMA) += kvm-pv-dma.o
-- 
1.5.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/8] PVDMA: Guest: Add Kconfig options to select PVDMA
       [not found]     ` <01dd7657bda537d738ea92330606592fa8aaf3c5.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Amit Shah

This is to be enabled in a guest. Currently, only building it as
a module works; compiling it in freezes at HD bring-up.

Signed-off-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 drivers/kvm/Kconfig |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..3385c10 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,14 @@ config KVM_AMD
 	  Provides support for KVM on AMD processors equipped with the AMD-V
 	  (SVM) extensions.
 
+config KVM_PV_DMA
+	tristate "Para-virtualized DMA access"
+	---help---
+	  Provides support for DMA operations in the guest. A hypercall
+	  is raised to the host to enable devices owned by the guest to
+	  use DMA. Select this if compiling a guest kernel and you need
+	  paravirtualized DMA operations.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
-- 
1.5.3


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/8] PVDMA: Guest: Add Kconfig options to select PVDMA
       [not found]   ` <01dd7657bda537d738ea92330606592fa8aaf3c5.1194445109.git.amit.shah@qumranet.com>
@ 2007-11-07 14:21     ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

This is to be enabled on a guest. Currently, only
building it as a module works; compiling it in freezes at HD bringup.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/Kconfig |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..3385c10 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,14 @@ config KVM_AMD
 	  Provides support for KVM on AMD processors equipped with the AMD-V
 	  (SVM) extensions.
 
+config KVM_PV_DMA
+	tristate "Para-virtualized DMA access"
+       ---help---
+         Provides support for DMA operations in the guest. A hypercall
+	 is raised to the host to enable devices owned by guest to use
+	 DMA. Select this if compiling a guest kernel and you need
+	 paravirtualized DMA operations.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
-- 
1.5.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/8] KVM: Update drivers/Makefile to check for CONFIG_VIRTUALIZATION
       [not found]     ` <fbc5dea9bfdb021ab2d3808583314901799405a0.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-07 14:21       ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Amit Shah

Check for CONFIG_VIRTUALIZATION instead of CONFIG_KVM,
since the PV drivers won't depend on CONFIG_KVM and we
still want them to be selectable.

Signed-off-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 drivers/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 8cb37e3..6f1c287 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_SPI)		+= spi/
 obj-$(CONFIG_PCCARD)		+= pcmcia/
 obj-$(CONFIG_DIO)		+= dio/
 obj-$(CONFIG_SBUS)		+= sbus/
-obj-$(CONFIG_KVM)		+= kvm/
+obj-$(CONFIG_VIRTUALIZATION)	+= kvm/
 obj-$(CONFIG_ZORRO)		+= zorro/
 obj-$(CONFIG_MAC)		+= macintosh/
 obj-$(CONFIG_ATA_OVER_ETH)	+= block/aoe/
-- 
1.5.3



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/8] KVM: Update drivers/Makefile to check for CONFIG_VIRTUALIZATION
       [not found]   ` <fbc5dea9bfdb021ab2d3808583314901799405a0.1194445109.git.amit.shah@qumranet.com>
@ 2007-11-07 14:21     ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-07 14:21 UTC (permalink / raw)
  To: kvm-devel, linux-kernel; +Cc: Amit Shah

Check for CONFIG_VIRTUALIZATION instead of CONFIG_KVM,
since the PV drivers won't depend on CONFIG_KVM and we
still want them to be selectable.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 8cb37e3..6f1c287 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_SPI)		+= spi/
 obj-$(CONFIG_PCCARD)		+= pcmcia/
 obj-$(CONFIG_DIO)		+= dio/
 obj-$(CONFIG_SBUS)		+= sbus/
-obj-$(CONFIG_KVM)		+= kvm/
+obj-$(CONFIG_VIRTUALIZATION)	+= kvm/
 obj-$(CONFIG_ZORRO)		+= zorro/
 obj-$(CONFIG_MAC)		+= macintosh/
 obj-$(CONFIG_ATA_OVER_ETH)	+= block/aoe/
-- 
1.5.3

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]     ` <609d5d611a5fb58ab5a7184be7b6d29494023ba0.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-11-07 14:21       ` [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA Amit Shah
@ 2007-11-12 10:50       ` Muli Ben-Yehuda
       [not found]         ` <20071112105001.GF3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Muli Ben-Yehuda @ 2007-11-12 10:50 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:

> We make the dma_mapping_ops pointer point to our structure so
> that every DMA access goes through us. (This is the reason this only
> works for 64-bit guests; 32-bit guests don't yet have a dma_ops
> struct.)

I need the same facility for Calgary for falling back to swiotlb if a
translation is disabled on some slot, and IB needs the same facility
for some IB adapters (e.g., ipath). Perhaps it's time to consider
stackable dma-ops (unless someone has a better idea...).

Cheers,
Muli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation
       [not found]     ` <218cf425feff1d4daf23d3f25df1eb224108a1a3.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-11-07 14:21       ` [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation Amit Shah
@ 2007-11-12 10:52       ` Muli Ben-Yehuda
  1 sibling, 0 replies; 29+ messages in thread
From: Muli Ben-Yehuda @ 2007-11-12 10:52 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 07, 2007 at 04:21:05PM +0200, Amit Shah wrote:

> A guest can call dma_ops->is_pv_device() to find out if a device is
> a passthrough'ed device (device passed on to a guest by the
> host). If this is true, a hypercall will be made to translate DMA
> mapping operations.

Doesn't really belong in the DMA mapping API. Instead what I think we
should do is to cache this (per device) value in the pci_device struct
(or device struct?) and in the dma-ops implementation inspect it to
decide exactly how to do DMA mapping for this device.
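The caching Muli suggests could look like this minimal userspace sketch; the struct fields and helper names are illustrative stand-ins, not the real pci_dev or dma-ops API:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

/* Illustrative stand-in for struct pci_dev carrying the cached flag. */
struct pci_dev {
    int is_pv;                      /* set once when the device is assigned */
};

/* Stand-in for the hypercall-based translation of a passthrough device. */
static dma_addr_t pv_translate(void *vaddr)
{
    return (dma_addr_t)vaddr ^ 0x1000;
}

/* The dma-ops implementation inspects the cached value to pick a path,
 * instead of exposing an is_pv_device() op in the DMA mapping API. */
static dma_addr_t map_single(struct pci_dev *dev, void *vaddr)
{
    if (dev->is_pv)
        return pv_translate(vaddr);
    return (dma_addr_t)vaddr;       /* native identity mapping */
}
```

The flag is written once at device-assignment time, so the mapping hot path pays only a single branch.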

Cheers,
Muli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware
       [not found]     ` <e2f5f0c08d08cf66a39c8b452410078617e611f7.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-11-07 14:21       ` [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware Amit Shah
@ 2007-11-12 10:56       ` Muli Ben-Yehuda
       [not found]         ` <20071112105637.GH3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Muli Ben-Yehuda @ 2007-11-12 10:56 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote:

> Of all the DMA calls, only dma_alloc_coherent might not actually
> call dma_ops->alloc_coherent. We make sure that gets called if the
> device that's being worked on is a PV device

I always thought that's a mess... the reason it's done this way is that
Andi Kleen preferred it for some reason at the time. How about trying
to fix it cleanly so that dma_alloc_coherent always gets called rather
than adding a hack to work around a hack? It will require auditing all
of the different x86 dma-ops but I can help with that.
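The clean shape being argued for here, dma_alloc_coherent as a thin wrapper that always dispatches through dma_ops, could be sketched like this (userspace C; the names and the fake hypercall are illustrative, not the real kernel interface):

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

struct dma_mapping_ops {
    void *(*alloc_coherent)(size_t size, dma_addr_t *handle);
};

/* PV backend: allocate, then obtain a host mapping (hypercall stubbed). */
static void *pv_alloc_coherent(size_t size, dma_addr_t *handle)
{
    void *p = malloc(size);
    *handle = (dma_addr_t)p ^ 0x1000;   /* pretend KVM_PV_DMA_MAP result */
    return p;
}

static struct dma_mapping_ops pv_ops = { pv_alloc_coherent };
static struct dma_mapping_ops *dma_ops = &pv_ops;

/* Always dispatch through dma_ops -- no generic fast path that could
 * silently bypass a PV (or Calgary) backend. */
static void *dma_alloc_coherent(size_t size, dma_addr_t *handle)
{
    return dma_ops->alloc_coherent(size, handle);
}
```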

Cheers,
Muli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]         ` <20071112105001.GF3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
@ 2007-11-12 11:56           ` Amit Shah
       [not found]             ` <200711121726.24907.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-11-12 14:53           ` Gerd Hoffmann
  2007-11-12 16:00           ` Joerg Roedel
  2 siblings, 1 reply; 29+ messages in thread
From: Amit Shah @ 2007-11-12 11:56 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote:
> On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> > We make the dma_mapping_ops pointer point to our structure so
> > that every DMA access goes through us. (This is the reason this only
> > works for 64-bit guests; 32-bit guests don't yet have a dma_ops
> > struct.)
>
> I need the same facility for Calgary for falling back to swiotlb if a
> translation is disabled on some slot, and IB needs the same facility
> for some IB adapters (e.g., ipath). Perhaps it's time to consider
> stackable dma-ops (unless someone has a better idea...).

That would make great sense and simplify implementations.

How do you propose such an implementation? An array of function pointers for 
each possible call?

>
> Cheers,
> Muli

Amit.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware
       [not found]         ` <20071112105637.GH3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
@ 2007-11-12 11:59           ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-12 11:59 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Monday 12 November 2007 16:26:37 Muli Ben-Yehuda wrote:
> On Wed, Nov 07, 2007 at 04:21:06PM +0200, Amit Shah wrote:
> > Of all the DMA calls, only dma_alloc_coherent might not actually
> > call dma_ops->alloc_coherent. We make sure that gets called if the
> > device that's being worked on is a PV device
>
> I always thought that's a mess... the reason it's done this way is that
> Andi Kleen preferred it for some reason at the time. How about trying
> to fix it cleanly so that dma_alloc_coherent always gets called rather
> than adding a hack to work around a hack? It will require auditing all
> of the different x86 dma-ops but I can help with that.

Hmm, nice to know it's like that just because of "not necessary now".

Fixing this along with stacking dma_ops and making 32-bit also dma_ops-ready 
will greatly simplify things.

>
> Cheers,
> Muli




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]             ` <200711121726.24907.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-12 13:32               ` Muli Ben-Yehuda
       [not found]                 ` <20071112133207.GJ3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Muli Ben-Yehuda @ 2007-11-12 13:32 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote:
> On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote:
> > On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> > > We make the dma_mapping_ops pointer point to our structure so
> > > that every DMA access goes through us. (This is the reason this only
> > > works for 64-bit guests; 32-bit guests don't yet have a dma_ops
> > > struct.)
> >
> > I need the same facility for Calgary for falling back to swiotlb if a
> > translation is disabled on some slot, and IB needs the same facility
> > for some IB adapters (e.g., ipath). Perhaps it's time to consider
> > stackable dma-ops (unless someone has a better idea...).
> 
> That would make great sense and simplify implementations.
> 
> How do you propose such an implementation? An array of function
> pointers for each possible call?

I was thinking of simply a chain of dma_ops (via dma_ops->prev_ops),
where it's the responsibility of each dma_ops implementation to call
or not call the corresponding entry in the chain (prev_ops->op()). This
works well for Calgary (which will only use prev_ops selectively), and
I think it will work well for the IB folks. Will it work for you?
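The prev_ops chain described above could be sketched as follows; names mirror the x86-64 dma_mapping_ops shape of the time, but this is a simplified userspace illustration, not the real kernel interface:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

struct dma_mapping_ops {
    struct dma_mapping_ops *prev_ops;   /* next link in the chain */
    dma_addr_t (*map_single)(struct dma_mapping_ops *ops, void *vaddr);
};

/* Bottom of the chain: nommu-style identity mapping. */
static dma_addr_t nommu_map_single(struct dma_mapping_ops *ops, void *vaddr)
{
    (void)ops;
    return (dma_addr_t)vaddr;
}

/* PV layer: do its own work first (a stubbed hypercall here), then
 * decide whether to invoke the previous ops -- selective delegation. */
static dma_addr_t pv_map_single(struct dma_mapping_ops *ops, void *vaddr)
{
    void *host = (void *)((dma_addr_t)vaddr ^ 0x1000);  /* fake hypercall */

    if (ops->prev_ops && ops->prev_ops->map_single)
        return ops->prev_ops->map_single(ops->prev_ops, host);
    return (dma_addr_t)host;
}

static struct dma_mapping_ops nommu_ops = { NULL, nommu_map_single };
static struct dma_mapping_ops pv_ops    = { &nommu_ops, pv_map_single };
```

Each layer decides for itself whether and when to call the layer below it.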

Cheers,
Muli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]                 ` <20071112133207.GJ3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
@ 2007-11-12 13:55                   ` Amit Shah
       [not found]                     ` <200711121925.27844.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Amit Shah @ 2007-11-12 13:55 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Monday 12 November 2007 19:02:07 Muli Ben-Yehuda wrote:
> On Mon, Nov 12, 2007 at 05:26:24PM +0530, Amit Shah wrote:
> > On Monday 12 November 2007 16:20:01 Muli Ben-Yehuda wrote:
> > > On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> > > > We make the dma_mapping_ops pointer point to our structure so
> > > > that every DMA access goes through us. (This is the reason this only
> > > > works for 64-bit guests; 32-bit guests don't yet have a dma_ops
> > > > struct.)
> > >
> > > I need the same facility for Calgary for falling back to swiotlb if a
> > > translation is disabled on some slot, and IB needs the same facility
> > > for some IB adapters (e.g., ipath). Perhaps it's time to consider
> > > stackable dma-ops (unless someone has a better idea...).
> >
> > That would make great sense and simplify implementations.
> >
> > How do you propose such an implementation? An array of function
> > pointers for each possible call?
>
> I was thinking of simply a chain of dma_ops (via dma_ops->prev_ops) ,
> where it's the responsibility of each dma_ops implementation to call
> or not call the corresponding entry in chain (prev_ops->op()). This
> works well for Calgary (which will only use prev_ops selectively, and
> I think it will work well for the IB folks. Will it work for you?

Selectively? What happens in the case where some iommu doesn't want to invoke
the prev_op, but the mapping depends on it being called (e.g., the hypercalling
op is embedded somewhere in the prev_op chain)?

Hmm, also, a hypercall should be the last operation to be called in a few 
cases, but also the first (and the last) to be called in several other cases. 
For example, in a guest, you can register any number of iotlbs, but you 
don't actually want to do anything there -- you just want to do a hypercall 
and get the mapping from the host.

But in any case, what ensures that the hypercall op always gets called and 
also that it's the last one?

Also, I'm thinking of implementations where, say, sg_map_free is not
defined for a particular iotlb, but it was defined in the previously
registered one. How should this be handled?

A small dispatcher which takes care of this seems the likely choice
here, but avoiding it (or at least caching its decisions) is something
that needs more thought.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]                     ` <200711121925.27844.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-12 14:01                       ` Muli Ben-Yehuda
  0 siblings, 0 replies; 29+ messages in thread
From: Muli Ben-Yehuda @ 2007-11-12 14:01 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 12, 2007 at 07:25:27PM +0530, Amit Shah wrote:

> Selectively? What happens in the case where some iommu doesn't want
> to invoke the prev_op, but the mapping depends on it being called
> (e.g., the hypercalling op is embedded somewhere in the prev_op chain)?

Bad things :-)
There needs to be a hierarchy of dma-ops, e.g., nommu/swiotlb, then a
hardware iommu, then pvdma. Not sure where IB fits in here. The
calling order would be the reverse of the initialization order, so
pvdma->hardware->nommu/swiotlb.

> Hmm, also, a hypercall should be the last operation to be called in
> a few cases, but also the first (and the last) to be called in
> several other cases. For example, in a guest, you can register
> any number of iotlbs, but you don't actually want to do anything
> there -- you just want to do a hypercall and get the mapping from
> the host.
> 
> But in any case, what ensures that the hypercall op always gets
> called and also that it's the last one?

If it gets called first it can ensure that it runs either first or
last, or both, since it controls when to run the other hooks, before
or after it does what it needs to do.
 
> Also, I'm thinking of implementations where let's say sg_map_free is
> not defined for a particular iotlb, but it was defined in the
> previously registered one. How to handle this?

Good point; this will require all dma-ops implementations to provide
stubs that just call prev_ops->op if it's set.
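Rather than per-implementation stubs, such a fallback could also live in one generic helper that walks the chain until it finds an implemented hook; a sketch with illustrative names, not the real kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct dma_mapping_ops {
    struct dma_mapping_ops *prev_ops;
    int (*map_sg)(int nents);            /* one representative hook */
};

/* Walk prev_ops until someone implements the hook, so an ops struct
 * that leaves map_sg NULL behaves as if it had a forwarding stub. */
static int call_map_sg(struct dma_mapping_ops *ops, int nents)
{
    while (ops && !ops->map_sg)
        ops = ops->prev_ops;
    return ops ? ops->map_sg(nents) : -1;   /* -1: nobody implements it */
}

static int base_map_sg(int nents)
{
    return nents;                        /* pretend all entries mapped */
}

static struct dma_mapping_ops base_ops = { NULL, base_map_sg };
static struct dma_mapping_ops top_ops  = { &base_ops, NULL };  /* no map_sg */
```

The trade-off is an extra pointer chase per call versus boilerplate stubs in every implementation.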

> It seems a small dispatcher which takes care of this seems the
> likely choice here, but avoiding it (or at least caching the
> decisions) is something that needs more thought.

Yeah, I'm not too enthusiastic about it, but we do need such a generic
mechanism or we will each end up implementing our own versions of
it...

Cheers,
Muli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]         ` <20071112105001.GF3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
  2007-11-12 11:56           ` Amit Shah
@ 2007-11-12 14:53           ` Gerd Hoffmann
  2007-11-12 16:00           ` Joerg Roedel
  2 siblings, 0 replies; 29+ messages in thread
From: Gerd Hoffmann @ 2007-11-12 14:53 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Muli Ben-Yehuda wrote:
> On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> 
>> We make the dma_mapping_ops pointer point to our structure so
>> that every DMA access goes through us. (This is the reason this only
>> works for 64-bit guests; 32-bit guests don't yet have a dma_ops
>> struct.)
> 
> I need the same facility for Calgary for falling back to swiotlb if a
> translation is disabled on some slot, and IB needs the same facility
> for some IB adapters (e.g., ipath). Perhaps it's time to consider
> stackable dma-ops (unless someone has a better idea...).

Hmm, at least the latter sounds like per-device dma_ops would be more
useful than stackable ones, as each stack instance just checks "should I
do something for device $foo; if not, call the next one ...".
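The per-device variant could be sketched like this (userspace C, illustrative names): the ops pointer is chosen once per device, so the hot path is a single indirect call with no chain walk at all:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long dma_addr_t;

struct device;

struct dma_mapping_ops {
    dma_addr_t (*map_single)(struct device *dev, void *vaddr);
};

/* The ops pointer lives on the device and is picked at discovery time
 * (passthrough vs. native). */
struct device {
    struct dma_mapping_ops *dma_ops;
};

static dma_addr_t native_map(struct device *dev, void *vaddr)
{
    (void)dev;
    return (dma_addr_t)vaddr;            /* identity mapping */
}

static dma_addr_t pv_map(struct device *dev, void *vaddr)
{
    (void)dev;
    return (dma_addr_t)vaddr ^ 0x1000;   /* stand-in for the hypercall */
}

static struct dma_mapping_ops native_ops = { native_map };
static struct dma_mapping_ops pv_ops     = { pv_map };

static dma_addr_t dma_map_single(struct device *dev, void *vaddr)
{
    return dev->dma_ops->map_single(dev, vaddr);
}
```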

cheers,
  Gerd



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings
       [not found]   ` <d1c72ce6e3a0e73c18993c3f066d1350b147f726.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
                       ` (7 preceding siblings ...)
       [not found]     ` <e2f5f0c08d08cf66a39c8b452410078617e611f7.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-12 15:55     ` Joerg Roedel
       [not found]       ` <20071112155522.GF6466-5C7GfCeVMHo@public.gmane.org>
  8 siblings, 1 reply; 29+ messages in thread
From: Joerg Roedel @ 2007-11-12 15:55 UTC (permalink / raw)
  To: Amit Shah
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 07, 2007 at 04:21:02PM +0200, Amit Shah wrote:
> @@ -1649,6 +1913,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  	}
>  
>  	switch (nr) {
> +	case KVM_PV_DMA_MAP:
> +		ret = pv_map_hypercall(vcpu, a0, a1);
> +		break;
> +	case KVM_PV_DMA_UNMAP:
> +		ret = pv_unmap_hypercall(vcpu, a0);
> +		break;
> +	case KVM_PV_PCI_DEVICE:
> +		ret = pv_mapped_pci_device_hypercall(vcpu, a0);
> +		break;
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;

How does synchronization work with that design? I don't see a hypercall
to synchronize the DMA buffers. It will only work if GART is used as the
dma_ops backend on the host side and not with SWIOTLB. But GART can be
configured away. Or am I missing something?

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA
       [not found]         ` <20071112105001.GF3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
  2007-11-12 11:56           ` Amit Shah
  2007-11-12 14:53           ` Gerd Hoffmann
@ 2007-11-12 16:00           ` Joerg Roedel
  2 siblings, 0 replies; 29+ messages in thread
From: Joerg Roedel @ 2007-11-12 16:00 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 12, 2007 at 12:50:01PM +0200, Muli Ben-Yehuda wrote:
> On Wed, Nov 07, 2007 at 04:21:04PM +0200, Amit Shah wrote:
> 
> > We make the dma_mapping_ops pointer point to our structure so
> > that every DMA access goes through us. (This is the reason this only
> > works for 64-bit guests; 32-bit guests don't yet have a dma_ops
> > struct.)
> 
> I need the same facility for Calgary for falling back to swiotlb if a
> translation is disabled on some slot, and IB needs the same facility
> for some IB adapters (e.g., ipath). Perhaps it's time to consider
> stackable dma-ops (unless someone has a better idea...).

Stackable dma-ops sounds good to me. The only problem is that there is a
performance penalty for devices handled on the bottom of the stack. But
the alternative I can think of, a per-device dma-ops structure, uses more
memory and is much more intrusive to the driver core. So I am fine with
a stackable solution.

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings
       [not found]       ` <20071112155522.GF6466-5C7GfCeVMHo@public.gmane.org>
@ 2007-11-12 17:07         ` Amit Shah
  0 siblings, 0 replies; 29+ messages in thread
From: Amit Shah @ 2007-11-12 17:07 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Monday 12 November 2007 21:25:22 Joerg Roedel wrote:
> On Wed, Nov 07, 2007 at 04:21:02PM +0200, Amit Shah wrote:
> > @@ -1649,6 +1913,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  	}
> >
> >  	switch (nr) {
> > +	case KVM_PV_DMA_MAP:
> > +		ret = pv_map_hypercall(vcpu, a0, a1);
> > +		break;
> > +	case KVM_PV_DMA_UNMAP:
> > +		ret = pv_unmap_hypercall(vcpu, a0);
> > +		break;
> > +	case KVM_PV_PCI_DEVICE:
> > +		ret = pv_mapped_pci_device_hypercall(vcpu, a0);
> > +		break;
> >  	default:
> >  		ret = -KVM_ENOSYS;
> >  		break;
>
> How does synchronization work with that design? I don't see a hypercall
> to synchronize de DMA buffers. It will only work if GART is used as the
> dma_ops backend on the host side and not with SWIOTLB. But GART can be
> configured away.  Or do I miss something?

A per-VM lock is needed while mapping or unmapping. It's one of the TODOs.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-11-12 17:07 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-07 14:21 RFC: Paravirtualized DMA accesses for KVM Amit Shah
     [not found] ` <d1c72ce6e3a0e73c18993c3f066d1350b147f726.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21   ` [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings Amit Shah
     [not found]   ` <6d486436cf50e269d8914229d10ff60f3d646795.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 2/8] KVM: Move #include asm/kvm_para.h outside of __KERNEL__ Amit Shah
     [not found]   ` <609d5d611a5fb58ab5a7184be7b6d29494023ba0.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA Amit Shah
     [not found]   ` <218cf425feff1d4daf23d3f25df1eb224108a1a3.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation Amit Shah
     [not found]   ` <e2f5f0c08d08cf66a39c8b452410078617e611f7.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware Amit Shah
     [not found]   ` <6bbd61409e4779febab1eaf03796455b22e8ea70.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 6/8] KVM: PVDMA Guest: Add Makefile rule Amit Shah
     [not found]   ` <01dd7657bda537d738ea92330606592fa8aaf3c5.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 7/8] PVDMA: Guest: Add Kconfig options to select PVDMA Amit Shah
     [not found]   ` <fbc5dea9bfdb021ab2d3808583314901799405a0.1194445109.git.amit.shah@qumranet.com>
2007-11-07 14:21     ` [PATCH 8/8] KVM: Update drivers/Makefile to check for CONFIG_VIRTUALIZATION Amit Shah
     [not found] ` <1194445269752-git-send-email-amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
     [not found]   ` <d1c72ce6e3a0e73c18993c3f066d1350b147f726.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21     ` [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings Amit Shah
     [not found]     ` <6d486436cf50e269d8914229d10ff60f3d646795.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 2/8] KVM: Move #include asm/kvm_para.h outside of __KERNEL__ Amit Shah
     [not found]     ` <218cf425feff1d4daf23d3f25df1eb224108a1a3.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation Amit Shah
2007-11-12 10:52       ` Muli Ben-Yehuda
     [not found]     ` <6bbd61409e4779febab1eaf03796455b22e8ea70.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 6/8] KVM: PVDMA Guest: Add Makefile rule Amit Shah
     [not found]     ` <01dd7657bda537d738ea92330606592fa8aaf3c5.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 7/8] PVDMA: Guest: Add Kconfig options to select PVDMA Amit Shah
     [not found]     ` <fbc5dea9bfdb021ab2d3808583314901799405a0.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 8/8] KVM: Update drivers/Makefile to check for CONFIG_VIRTUALIZATION Amit Shah
     [not found]     ` <609d5d611a5fb58ab5a7184be7b6d29494023ba0.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA Amit Shah
2007-11-12 10:50       ` Muli Ben-Yehuda
     [not found]         ` <20071112105001.GF3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
2007-11-12 11:56           ` Amit Shah
     [not found]             ` <200711121726.24907.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-12 13:32               ` Muli Ben-Yehuda
     [not found]                 ` <20071112133207.GJ3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
2007-11-12 13:55                   ` Amit Shah
     [not found]                     ` <200711121925.27844.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-12 14:01                       ` Muli Ben-Yehuda
2007-11-12 14:53           ` Gerd Hoffmann
2007-11-12 16:00           ` Joerg Roedel
     [not found]     ` <e2f5f0c08d08cf66a39c8b452410078617e611f7.1194445109.git.amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-07 14:21       ` [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware Amit Shah
2007-11-12 10:56       ` Muli Ben-Yehuda
     [not found]         ` <20071112105637.GH3299-WD1JZD8MxeCTrf4lBMg6DdBPR1lH4CV8@public.gmane.org>
2007-11-12 11:59           ` Amit Shah
2007-11-12 15:55     ` [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings Joerg Roedel
     [not found]       ` <20071112155522.GF6466-5C7GfCeVMHo@public.gmane.org>
2007-11-12 17:07         ` Amit Shah

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox