kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [V6] Userspace patches for PCI device assignment
@ 2008-09-23 14:54 Amit Shah
  2008-09-23 14:54 ` [PATCH 1/7] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices Amit Shah
  0 siblings, 1 reply; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay


This patchset enables device assignment for KVM hosts for PCI devices. It uses the Intel IOMMU by default if available.

Major changes since the last send:
- More error checking
- Change data structure names to match qemu style
- Add support for hot-adding devices (this works, but is currently RFC)
- Split patches for easier review

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/7] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices
  2008-09-23 14:54 [V6] Userspace patches for PCI device assignment Amit Shah
@ 2008-09-23 14:54 ` Amit Shah
  2008-09-23 14:54   ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Amit Shah
  0 siblings, 1 reply; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 libkvm/libkvm.c |   13 +++++++++++++
 libkvm/libkvm.h |   27 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 63fbcba..7744daa 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -1059,3 +1059,16 @@ int kvm_unregister_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t siz
 	return -ENOSYS;
 }
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+int kvm_assign_pci_device(kvm_context_t kvm,
+			  struct kvm_assigned_pci_dev *assigned_dev)
+{
+	return ioctl(kvm->vm_fd, KVM_ASSIGN_PCI_DEVICE, assigned_dev);
+}
+
+int kvm_assign_irq(kvm_context_t kvm,
+		   struct kvm_assigned_irq *assigned_irq)
+{
+	return ioctl(kvm->vm_fd, KVM_ASSIGN_IRQ, assigned_irq);
+}
+#endif
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 79dd769..ee16d65 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -658,4 +658,31 @@ int kvm_s390_interrupt(kvm_context_t kvm, int slot,
 int kvm_s390_set_initial_psw(kvm_context_t kvm, int slot, psw_t psw);
 int kvm_s390_store_status(kvm_context_t kvm, int slot, unsigned long addr);
 #endif
+
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+/*!
+ * \brief Notifies host kernel about a PCI device to be assigned to a guest
+ *
+ * Used for PCI device assignment, this function notifies the host
+ * kernel about the assigning of the physical PCI device to a guest.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_dev Parameters, like bus, devfn number, etc
+ */
+int kvm_assign_pci_device(kvm_context_t kvm,
+			  struct kvm_assigned_pci_dev *assigned_dev);
+
+/*!
+ * \brief Notifies host kernel about changes to IRQ for an assigned device
+ *
+ * Used for PCI device assignment, this function notifies the host
+ * kernel about the changes in IRQ number for an assigned physical
+ * PCI device.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
+ */
+int kvm_assign_irq(kvm_context_t kvm,
+		   struct kvm_assigned_irq *assigned_irq);
+#endif
 #endif
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device
  2008-09-23 14:54 ` [PATCH 1/7] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices Amit Shah
@ 2008-09-23 14:54   ` Amit Shah
  2008-09-23 14:54     ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Amit Shah
  2008-09-23 16:12     ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Anthony Liguori
  0 siblings, 2 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu/hw/pci.c |    5 +++++
 qemu/hw/pci.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 07d37a8..61ff0f6 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -560,6 +560,11 @@ static void pci_set_irq(void *opaque, int irq_num, int level)
     bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0);
 }
 
+int pci_map_irq(PCIDevice *pci_dev, int pin)
+{
+	return pci_dev->bus->map_irq(pci_dev, pin);
+}
+
 /***********************************************************/
 /* monitor info on PCI */
 
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index 60e4094..e11fbbf 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -81,6 +81,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
                             uint32_t size, int type,
                             PCIMapIORegionFunc *map_func);
 
+int pci_map_irq(PCIDevice *pci_dev, int pin);
 uint32_t pci_default_read_config(PCIDevice *d,
                                  uint32_t address, int len);
 void pci_default_write_config(PCIDevice *d,
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa
  2008-09-23 14:54   ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Amit Shah
@ 2008-09-23 14:54     ` Amit Shah
  2008-09-23 14:54       ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Amit Shah
  2008-09-23 16:13       ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Anthony Liguori
  2008-09-23 16:12     ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Anthony Liguori
  1 sibling, 2 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu/hw/piix_pci.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
index 6fbf47b..dc12c8a 100644
--- a/qemu/hw/piix_pci.c
+++ b/qemu/hw/piix_pci.c
@@ -243,6 +243,25 @@ static void piix3_set_irq(qemu_irq *pic, int irq_num, int level)
     }
 }
 
+int piix3_get_pin(int pic_irq)
+{
+    int i;
+    for (i = 0; i < 4; i++)
+        if (piix3_dev->config[0x60+i] == pic_irq)
+            return i;
+    return -1;
+}
+
+int piix_get_irq(int pin)
+{
+    if (piix3_dev)
+        return piix3_dev->config[0x60+pin];
+    if (piix4_dev)
+        return piix4_dev->config[0x60+pin];
+
+    return 0;
+}
+
 static void piix3_reset(PCIDevice *d)
 {
     uint8_t *pci_conf = d->config;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues
  2008-09-23 14:54     ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Amit Shah
@ 2008-09-23 14:54       ` Amit Shah
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
  2008-09-23 16:13         ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Anthony Liguori
  2008-09-23 16:13       ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Anthony Liguori
  1 sibling, 2 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu/hw/isa.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/isa.h b/qemu/hw/isa.h
index 222e4f3..e4a1326 100644
--- a/qemu/hw/isa.h
+++ b/qemu/hw/isa.h
@@ -2,6 +2,8 @@
 #define HW_ISA_H
 /* ISA bus */
 
+#include "hw.h"
+
 extern target_phys_addr_t isa_mem_base;
 
 int register_ioport_read(int start, int length, int size,
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 14:54       ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Amit Shah
@ 2008-09-23 14:54         ` Amit Shah
  2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
                             ` (3 more replies)
  2008-09-23 16:13         ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Anthony Liguori
  1 sibling, 4 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

From: Or Sagi <ors@tutis.com>
From: Nir Peleg <nir@tutis.com>
From: Amit Shah <amit.shah@redhat.com>
From: Ben-Ami Yassour <benami@il.ibm.com>
From: Weidong Han <weidong.han@intel.com>
From: Glauber de Oliveira Costa <gcosta@redhat.com>

With this patch, we can assign a device on the host machine to a
guest.

A new command-line option, -pcidevice is added.
For example, to invoke it for a device sitting at PCI bus:dev.fn
04:08.0, use this:

        -pcidevice host=04:08.0

* The host driver for the device, if any, is to be removed before
assigning the device (else device assignment will fail).

* A device that shares IRQ with another host device cannot currently
be assigned.

This works only with the in-kernel irqchip method; to use the
userspace irqchip, a kernel module (irqhook) and some extra changes
are needed.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu/Makefile.target        |    1 +
 qemu/hw/device-assignment.c |  665 +++++++++++++++++++++++++++++++++++++++++++
 qemu/hw/device-assignment.h |   93 ++++++
 qemu/hw/pc.c                |    9 +
 qemu/hw/pci.c               |    7 +
 qemu/vl.c                   |   18 ++
 6 files changed, 793 insertions(+), 0 deletions(-)
 create mode 100644 qemu/hw/device-assignment.c
 create mode 100644 qemu/hw/device-assignment.h

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index 72f3db8..40eb273 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
 OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
+OBJS+= device-assignment.o
 ifeq ($(USE_KVM_PIT), 1)
 OBJS+= i8254-kvm.o
 endif
diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
new file mode 100644
index 0000000..e70daf2
--- /dev/null
+++ b/qemu/hw/device-assignment.c
@@ -0,0 +1,665 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ *
+ *  Assign a PCI device from the host to a guest VM.
+ *
+ *  Adapted for KVM by Qumranet.
+ *
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ */
+#include <stdio.h>
+#include <sys/io.h>
+#include "qemu-kvm.h"
+#include <linux/kvm_para.h>
+#include "device-assignment.h"
+
+/* From linux/ioport.h */
+#define IORESOURCE_IO		0x00000100	/* Resource type */
+#define IORESOURCE_MEM		0x00000200
+#define IORESOURCE_IRQ		0x00000400
+#define IORESOURCE_DMA		0x00000800
+#define IORESOURCE_PREFETCH	0x00001000	/* No side effects */
+
+/* #define DEVICE_ASSIGNMENT_DEBUG */
+
+#ifdef DEVICE_ASSIGNMENT_DEBUG
+#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ## args)
+#else
+#define DEBUG(fmt, args...)
+#endif
+
+static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
+				       uint32_t value)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (unsigned long)r_access->r_virtbase
+		+ (addr - r_access->e_physbase);
+
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
+			" r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	iopl(3);
+	outb(value, r_pio);
+}
+
+static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
+				       uint32_t value)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (unsigned long)r_access->r_virtbase
+		+ (addr - r_access->e_physbase);
+
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
+			" r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	iopl(3);
+	outw(value, r_pio);
+}
+
+static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
+				       uint32_t value)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (unsigned long)r_access->r_virtbase
+		+ (addr - r_access->e_physbase);
+
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
+			" r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	iopl(3);
+	outl(value, r_pio);
+}
+
+static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (addr - r_access->e_physbase)
+		+ (unsigned long)r_access->r_virtbase;
+	uint32_t value;
+
+	iopl(3);
+	value = inb(r_pio);
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
+			"r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	return value;
+}
+
+static uint32_t assigned_dev_ioport_readw(void *opaque, uint32_t addr)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (addr - r_access->e_physbase)
+		+ (unsigned long)r_access->r_virtbase;
+	uint32_t value;
+
+	iopl(3);
+	value = inw(r_pio);
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
+			"r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	return value;
+}
+
+static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
+{
+	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
+	uint32_t r_pio = (addr - r_access->e_physbase)
+		+ (unsigned long)r_access->r_virtbase;
+	uint32_t value;
+
+	iopl(3);
+	value = inl(r_pio);
+	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
+		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
+			"r_virtbase=%08lx value=%08x\n",
+			__func__, r_pio, (int)r_access->e_physbase,
+			(unsigned long)r_access->r_virtbase, value);
+	}
+	return value;
+}
+
+static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
+			 uint32_t e_phys, uint32_t e_size, int type)
+{
+	AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+	AssignedDevRegion *region = &r_dev->v_addrs[region_num];
+	int first_map = (region->e_size == 0);
+	int ret = 0;
+
+	DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
+	      e_phys, r_dev->v_addrs[region_num].r_virtbase, type, e_size,
+	      region_num);
+
+	region->e_physbase = e_phys;
+	region->e_size = e_size;
+
+	/* FIXME: Add support for emulated MMIO for non-kvm guests */
+	if (kvm_enabled()) {
+		if (!first_map)
+			kvm_destroy_phys_mem(kvm_context, e_phys, e_size);
+		if (e_size > 0)
+			ret = kvm_register_phys_mem(kvm_context, e_phys,
+						    region->r_virtbase,
+						    e_size, 0);
+		if (ret != 0)
+			fprintf(stderr,
+				"%s: Error: create new mapping failed\n",
+				__func__);
+	}
+}
+
+static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
+				    uint32_t addr, uint32_t size, int type)
+{
+	AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
+
+	r_dev->v_addrs[region_num].e_physbase = addr;
+	DEBUG("%s: address=0x%x type=0x%x len=%d region_num=%d \n",
+	      __func__, addr, type, size, region_num);
+
+	register_ioport_read(addr, size, 1, assigned_dev_ioport_readb,
+			     (void *) (r_dev->v_addrs + region_num));
+	register_ioport_read(addr, size, 2, assigned_dev_ioport_readw,
+			     (void *) (r_dev->v_addrs + region_num));
+	register_ioport_read(addr, size, 4, assigned_dev_ioport_readl,
+			     (void *) (r_dev->v_addrs + region_num));
+	register_ioport_write(addr, size, 1, assigned_dev_ioport_writeb,
+			      (void *) (r_dev->v_addrs + region_num));
+	register_ioport_write(addr, size, 2, assigned_dev_ioport_writew,
+			      (void *) (r_dev->v_addrs + region_num));
+	register_ioport_write(addr, size, 4, assigned_dev_ioport_writel,
+			      (void *) (r_dev->v_addrs + region_num));
+}
+
+static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
+					  uint32_t val, int len)
+{
+	int fd, r;
+
+	DEBUG("%s: (%x.%x): address=%04x val=0x%08x len=%d\n",
+	      __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+	      (uint16_t) address, val, len);
+
+	if (address == 0x4) {
+		pci_default_write_config(d, address, val, len);
+		/* Continue to program the card */
+	}
+
+	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+	    address == 0x3c || address == 0x3d) {
+		/* used for update-mappings (BAR emulation) */
+		pci_default_write_config(d, address, val, len);
+		return;
+	}
+	DEBUG("%s: NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
+	      __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
+	      (uint16_t) address, val, len);
+	fd = ((AssignedDevice *)d)->real_device.config_fd;
+	r = lseek(fd, address, SEEK_SET);
+	if (r < 0) {
+		fprintf(stderr, "%s: bad seek, errno = %d\n",
+			__func__, errno);
+		return;
+	}
+again:
+	r = write(fd, &val, len);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: write failed, errno = %d\n",
+			__func__, errno);
+	}
+}
+
+static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
+					     int len)
+{
+	uint32_t val = 0;
+	int fd, r;
+
+	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
+	    address == 0x3c || address == 0x3d) {
+		val = pci_default_read_config(d, address, len);
+		DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+		      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val,
+		      len);
+		return val;
+	}
+
+	/* vga specific, remove later */
+	if (address == 0xFC)
+		goto do_log;
+
+	fd = ((AssignedDevice *)d)->real_device.config_fd;
+	r = lseek(fd, address, SEEK_SET);
+	if (r < 0) {
+		fprintf(stderr, "%s: bad seek, errno = %d\n",
+			__func__, errno);
+		return val;
+	}
+again:
+	r = read(fd, &val, len);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: read failed, errno = %d\n",
+			__func__, errno);
+	}
+do_log:
+	DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
+	      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
+
+	/* kill the special capabilities */
+	if (address == 4 && len == 4)
+		val &= ~0x100000;
+	else if (address == 6)
+		val &= ~0x10;
+
+	return val;
+}
+
+static int assigned_dev_register_regions(PCIRegion *io_regions,
+					 unsigned long regions_num,
+					 AssignedDevice *pci_dev)
+{
+	uint32_t i;
+	PCIRegion *cur_region = io_regions;
+
+	for (i = 0; i < regions_num; i++, cur_region++) {
+		if (!cur_region->valid)
+			continue;
+#ifdef DEVICE_ASSIGNMENT_DEBUG
+		pci_dev->v_addrs[i].debug |= DEVICE_ASSIGNMENT_DEBUG_MMIO
+					     | DEVICE_ASSIGNMENT_DEBUG_PIO;
+#endif
+		pci_dev->v_addrs[i].num = i;
+
+		/* handle memory io regions */
+		if (cur_region->type & IORESOURCE_MEM) {
+			int t = cur_region->type & IORESOURCE_PREFETCH
+				? PCI_ADDRESS_SPACE_MEM_PREFETCH
+				: PCI_ADDRESS_SPACE_MEM;
+
+			/* map physical memory */
+			pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
+			pci_dev->v_addrs[i].r_virtbase =
+				mmap(NULL,
+				     (cur_region->size + 0xFFF) & 0xFFFFF000,
+				     PROT_WRITE | PROT_READ, MAP_SHARED,
+				     cur_region->resource_fd, (off_t) 0);
+
+			if ((void *) -1 == pci_dev->v_addrs[i].r_virtbase) {
+				fprintf(stderr, "%s: Error: Couldn't mmap 0x%x!"
+					"\n", __func__,
+					(uint32_t) (cur_region->base_addr));
+				return -1;
+			}
+			pci_dev->v_addrs[i].r_size = cur_region->size;
+			pci_dev->v_addrs[i].e_size = 0;
+
+			/* add offset */
+			pci_dev->v_addrs[i].r_virtbase +=
+				(cur_region->base_addr & 0xFFF);
+
+			pci_register_io_region((PCIDevice *) pci_dev, i,
+					       cur_region->size, t,
+					       assigned_dev_iomem_map);
+			continue;
+		}
+		/* handle port io regions */
+		pci_register_io_region((PCIDevice *) pci_dev, i,
+				       cur_region->size, PCI_ADDRESS_SPACE_IO,
+				       assigned_dev_ioport_map);
+
+		pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
+		pci_dev->v_addrs[i].r_virtbase =
+			(void *)(long)cur_region->base_addr;
+		/* not relevant for port io */
+		pci_dev->v_addrs[i].memory_index = 0;
+	}
+
+	/* success */
+	return 0;
+}
+
+static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
+			   uint8_t r_dev, uint8_t r_func)
+{
+	char dir[128], name[128], comp[16];
+	int fd, r = 0;
+	FILE *f;
+	unsigned long long start, end, size, flags;
+	PCIRegion *rp;
+	PCIDevRegions *dev = &pci_dev->real_device;
+
+	dev->region_number = 0;
+
+	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
+		r_bus, r_dev, r_func);
+	strcpy(name, dir);
+	strcat(name, "config");
+	fd = open(name, O_RDWR);
+	if (fd == -1) {
+		fprintf(stderr, "%s: %s: %m\n", __func__, name);
+		return 1;
+	}
+	dev->config_fd = fd;
+again:
+	r = read(fd, pci_dev->dev.config, sizeof pci_dev->dev.config);
+	if (r < 0) {
+		if (errno == EINTR || errno == EAGAIN)
+			goto again;
+		fprintf(stderr, "%s: read failed, errno = %d\n",
+			__func__, errno);
+	}
+	strcpy(name, dir);
+	strcat(name, "resource");
+
+	f = fopen(name, "r");
+	if (f == NULL) {
+		fprintf(stderr, "%s: %s: %m\n", __func__, name);
+		return 1;
+	}
+	for (r = 0; fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) == 3;
+	     r++) {
+		rp = dev->regions + r;
+		rp->valid = 0;
+		size = end - start + 1;
+		flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
+		if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0)
+			continue;
+		if (flags & IORESOURCE_MEM) {
+			flags &= ~IORESOURCE_IO;
+			sprintf(comp, "resource%d", r);
+			strcpy(name, dir);
+			strcat(name, comp);
+			fd = open(name, O_RDWR);
+			if (fd == -1)
+				continue;		/* probably ROM */
+			rp->resource_fd = fd;
+		} else
+			flags &= ~IORESOURCE_PREFETCH;
+
+		rp->type = flags;
+		rp->valid = 1;
+		rp->base_addr = start;
+		rp->size = size;
+		DEBUG("%s: region %d size %d start 0x%x type %d "
+		      "resource_fd %d\n", __func__, r, rp->size, start,
+		      rp->type, rp->resource_fd);
+	}
+	fclose(f);
+
+	dev->region_number = r;
+	return 0;
+}
+
+#define	MAX_ASSIGNED_DEVS 4
+struct {
+	char name[15];
+	int bus;
+	int dev;
+	int func;
+	AssignedDevice *assigned_dev;
+} assigned_devices[MAX_ASSIGNED_DEVS];
+
+int nr_assigned_devices;
+static int disable_iommu;
+
+static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
+{
+	return (uint32_t)bus << 8 | (uint32_t)devfn;
+}
+
+static AssignedDevice *register_real_device(PCIBus *e_bus,
+					    const char *e_dev_name,
+					    int e_devfn, uint8_t r_bus,
+					    uint8_t r_dev, uint8_t r_func)
+{
+	int r;
+	AssignedDevice *pci_dev;
+	uint8_t e_device, e_intx;
+
+	DEBUG("%s: Registering real physical device %s (devfn=0x%x)\n",
+	      __func__, e_dev_name, e_devfn);
+
+	pci_dev = (AssignedDevice *)
+		pci_register_device(e_bus, e_dev_name, sizeof(AssignedDevice),
+				    e_devfn, assigned_dev_pci_read_config,
+				    assigned_dev_pci_write_config);
+	if (NULL == pci_dev) {
+		fprintf(stderr, "%s: Error: Couldn't register real device %s\n",
+			__func__, e_dev_name);
+		return NULL;
+	}
+	if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
+		fprintf(stderr, "%s: Error: Couldn't get real device (%s)!\n",
+			__func__, e_dev_name);
+		goto out;
+	}
+
+	/* handle real device's MMIO/PIO BARs */
+	if (assigned_dev_register_regions(pci_dev->real_device.regions,
+					  pci_dev->real_device.region_number,
+					  pci_dev))
+		goto out;
+
+	/* handle interrupt routing */
+	e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
+	e_intx = pci_dev->dev.config[0x3d] - 1;
+	pci_dev->intpin = e_intx;
+	pci_dev->run = 0;
+	pci_dev->girq = 0;
+	pci_dev->h_busnr = r_bus;
+	pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
+
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+	if (kvm_enabled()) {
+		struct kvm_assigned_pci_dev assigned_dev_data;
+
+		memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
+		assigned_dev_data.assigned_dev_id  =
+			calc_assigned_dev_id(pci_dev->h_busnr,
+					     (uint32_t)pci_dev->h_devfn);
+		assigned_dev_data.busnr = pci_dev->h_busnr;
+		assigned_dev_data.devfn = pci_dev->h_devfn;
+
+#ifdef KVM_CAP_IOMMU
+		/* We always enable the IOMMU if present
+		 * (or when not disabled on the command line)
+		 */
+		r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
+		if (r && !disable_iommu)
+			assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
+#endif
+		r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
+		if (r < 0) {
+			fprintf(stderr, "Could not notify kernel about "
+				"assigned device \"%s\"\n", e_dev_name);
+			perror("pt-ioctl");
+			goto out;
+		}
+	}
+#endif
+	fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
+		"(\"%s\") as guest device %02x:%02x.%1x\n",
+		r_bus, r_dev, r_func, e_dev_name,
+		pci_bus_num(e_bus), e_device, r_func);
+
+	return pci_dev;
+out:
+	pci_unregister_device(&pci_dev->dev);
+	return NULL;
+}
+
+extern int get_param_value(char *buf, int buf_size,
+			   const char *tag, const char *str);
+extern int piix_get_irq(int);
+
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+/* The pci config space got updated. Check if irq numbers have changed
+ * for our devices
+ */
+void assigned_dev_update_irq(PCIDevice *d)
+{
+	int i, irq, r;
+	AssignedDevice *assigned_dev;
+
+	for (i = 0; i < nr_assigned_devices; i++) {
+		assigned_dev = assigned_devices[i].assigned_dev;
+		if (assigned_dev == NULL)
+			continue;
+
+		irq = pci_map_irq(&assigned_dev->dev, assigned_dev->intpin);
+		irq = piix_get_irq(irq);
+
+		if (irq != assigned_dev->girq) {
+			struct kvm_assigned_irq assigned_irq_data;
+
+			memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
+			assigned_irq_data.assigned_dev_id  =
+				calc_assigned_dev_id(assigned_dev->h_busnr,
+						     (uint8_t)
+						     assigned_dev->h_devfn);
+			assigned_irq_data.guest_irq = irq;
+			assigned_irq_data.host_irq =
+				assigned_dev->real_device.irq;
+			r = kvm_assign_irq(kvm_context, &assigned_irq_data);
+			if (r < 0) {
+				perror("assigned_dev_update_irq");
+				fprintf(stderr, "Are you assigning a device "
+					"that shares IRQ with some other "
+					"device?\n");
+				pci_unregister_device(&assigned_dev->dev);
+				continue;
+			}
+			assigned_dev->girq = irq;
+		}
+	}
+}
+#endif
+
+static int init_device_assignment(void)
+{
+	/* Do we have any devices to be assigned? */
+	if (nr_assigned_devices == 0)
+		return -1;
+	iopl(3);
+	return 0;
+}
+
+struct PCIDevice *init_assigned_device(PCIBus *bus, int *index)
+{
+	AssignedDevice *dev = NULL;
+	int i;
+
+	if (*index == -1) {
+		if (init_device_assignment() < 0)
+			return NULL;
+
+		*index = nr_assigned_devices - 1;
+	}
+	i = *index;
+	dev = register_real_device(bus, assigned_devices[i].name, -1,
+				   assigned_devices[i].bus,
+				   assigned_devices[i].dev,
+				   assigned_devices[i].func);
+	if (dev == NULL) {
+		fprintf(stderr, "Error: Couldn't register device \"%s\"\n",
+			assigned_devices[i].name);
+	}
+	assigned_devices[i].assigned_dev = dev;
+
+	--*index;
+	return &dev->dev;
+}
+
+/*
+ * Syntax to assign device:
+ *
+ * -pcidevice dev=bus:dev.func,dma=dma
+ *
+ * Example:
+ * -pcidevice host=00:13.0,dma=pvdma
+ *
+ * dma can currently only be 'none' to disable iommu support.
+ */
+void add_assigned_device(const char *arg)
+{
+	char *cp, *cp1;
+	char device[8];
+	char dma[6];
+	int r;
+
+	if (nr_assigned_devices >= MAX_ASSIGNED_DEVS) {
+		fprintf(stderr, "Too many assigned devices (max %d)\n",
+			MAX_ASSIGNED_DEVS);
+		return;
+	}
+	memset(&assigned_devices[nr_assigned_devices], 0,
+	       sizeof assigned_devices[nr_assigned_devices]);
+
+	r = get_param_value(device, sizeof device, "host", arg);
+
+	r = get_param_value(assigned_devices[nr_assigned_devices].name,
+			    sizeof assigned_devices[nr_assigned_devices].name,
+			    "name", arg);
+	if (!r)
+		strncpy(assigned_devices[nr_assigned_devices].name, device, 8);
+
+#ifdef KVM_CAP_IOMMU
+	r = get_param_value(dma, sizeof dma, "dma", arg);
+	if (r && !strncmp(dma, "none", 4))
+		disable_iommu = 1;
+#endif
+	cp = device;
+	assigned_devices[nr_assigned_devices].bus = strtoul(cp, &cp1, 16);
+	if (*cp1 != ':')
+		goto bad;
+	cp = cp1 + 1;
+
+	assigned_devices[nr_assigned_devices].dev = strtoul(cp, &cp1, 16);
+	if (*cp1 != '.')
+		goto bad;
+	cp = cp1 + 1;
+
+	assigned_devices[nr_assigned_devices].func = strtoul(cp, &cp1, 16);
+
+	nr_assigned_devices++;
+	return;
+bad:
+	fprintf(stderr, "pcidevice argument parse error; "
+		"please check the help text for usage\n");
+}
diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
new file mode 100644
index 0000000..b77e484
--- /dev/null
+++ b/qemu/hw/device-assignment.h
@@ -0,0 +1,93 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ *  Data structures for storing PCI state
+ *
+ *  Adapted to kvm by Qumranet
+ *
+ *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
+ */
+
+#ifndef __DEVICE_ASSIGNMENT_H__
+#define __DEVICE_ASSIGNMENT_H__
+
+#include <sys/mman.h>
+#include "qemu-common.h"
+#include "pci.h"
+#include <linux/types.h>
+
+#define DEVICE_ASSIGNMENT_DEBUG_PIO	(0x01)
+#define DEVICE_ASSIGNMENT_DEBUG_MMIO	(0x02)
+
+/* From include/linux/pci.h in the kernel sources */
+#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
+
+typedef uint32_t pciaddr_t;
+
+#define MAX_IO_REGIONS			(6)
+
+typedef struct pci_region_s {
+	int type;	/* Memory or port I/O */
+	int valid;
+	pciaddr_t base_addr;
+	pciaddr_t size;		/* size of the region */
+	int resource_fd;
+} PCIRegion;
+
+typedef struct pci_dev_s {
+	uint8_t bus, dev, func;	/* Bus inside domain, device and function */
+	int irq;		/* IRQ number */
+	uint16_t region_number;	/* number of active regions */
+
+	/* Port I/O or MMIO Regions */
+	PCIRegion regions[MAX_IO_REGIONS];
+	int config_fd;
+} PCIDevRegions;
+
+typedef struct assigned_dev_region_s {
+	target_phys_addr_t e_physbase;
+	uint32_t memory_index;
+	void *r_virtbase;	/* mmapped access address */
+	int num;		/* our index within v_addrs[] */
+	uint32_t e_size;        /* emulated size of region in bytes */
+	uint32_t r_size;        /* real size of region in bytes */
+	uint32_t debug;
+} AssignedDevRegion;
+
+typedef struct assigned_dev_s {
+	PCIDevice dev;
+	int intpin;
+	uint8_t debug_flags;
+	AssignedDevRegion v_addrs[PCI_NUM_REGIONS];
+	PCIDevRegions real_device;
+	int run;
+	int girq;
+	unsigned char h_busnr;
+	unsigned int h_devfn;
+	int bound;
+} AssignedDevice;
+
+/* Initialization functions */
+PCIDevice *init_assigned_device(PCIBus *bus, int *index);
+void add_assigned_device(const char *arg);
+void assigned_dev_set_vector(int irq, int vector);
+void assigned_dev_ack_mirq(int vector);
+
+#endif				/* __DEVICE_ASSIGNMENT_H__ */
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 6053103..4a611cc 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -32,6 +32,7 @@
 #include "smbus.h"
 #include "boards.h"
 #include "console.h"
+#include "device-assignment.h"
 
 #include "qemu-kvm.h"
 
@@ -1006,6 +1007,14 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
         }
     }
 
+    /* Initialize assigned devices */
+    if (pci_enabled) {
+        int r = -1;
+        do {
+            init_assigned_device(pci_bus, &r);
+	} while (r >= 0);
+    }
+
     rtc_state = rtc_init(0x70, i8259[8]);
 
     qemu_register_boot_set(pc_boot_set, rtc_state);
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index 61ff0f6..e4e8386 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -50,6 +50,7 @@ struct PCIBus {
 
 static void pci_update_mappings(PCIDevice *d);
 static void pci_set_irq(void *opaque, int irq_num, int level);
+void assigned_dev_update_irq(PCIDevice *d);
 
 target_phys_addr_t pci_mem_base;
 static int pci_irq_index;
@@ -453,6 +454,12 @@ void pci_default_write_config(PCIDevice *d,
         val >>= 8;
     }
 
+#ifdef KVM_CAP_DEVICE_ASSIGNMENT
+    if (kvm_enabled() && qemu_kvm_irqchip_in_kernel() &&
+	address >= 0x60 && address <= 0x63)
+	assigned_dev_update_irq(d);
+#endif
+
     end = address + len;
     if (end > PCI_COMMAND && address < (PCI_COMMAND + 2)) {
         /* if the command register is modified, we must modify the mappings */
diff --git a/qemu/vl.c b/qemu/vl.c
index 2fb8552..83f28c5 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -37,6 +37,7 @@
 #include "qemu-char.h"
 #include "block.h"
 #include "audio/audio.h"
+#include "hw/device-assignment.h"
 #include "migration.h"
 #include "balloon.h"
 #include "qemu-kvm.h"
@@ -8469,6 +8470,12 @@ static void help(int exitcode)
 #endif
 	   "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
 	   "-no-kvm-pit	    disable KVM kernel mode PIT\n"
+#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
+	   "-pcidevice host=bus:dev.func[,dma=none][,name=\"string\"]\n"
+	   "                expose a PCI device to the guest OS.\n"
+	   "                dma=none: don't perform any dma translations (default is to use an iommu)\n"
+	   "                'string' is used in log output.\n"
+#endif
 #endif
 #ifdef TARGET_I386
            "-std-vga        simulate a standard VGA card with VESA Bochs Extensions\n"
@@ -8592,6 +8599,9 @@ enum {
     QEMU_OPTION_no_kvm,
     QEMU_OPTION_no_kvm_irqchip,
     QEMU_OPTION_no_kvm_pit,
+#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
+    QEMU_OPTION_pcidevice,
+#endif
     QEMU_OPTION_no_reboot,
     QEMU_OPTION_no_shutdown,
     QEMU_OPTION_show_cursor,
@@ -8680,6 +8690,9 @@ const QEMUOption qemu_options[] = {
 #endif
     { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
     { "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
+#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
+    { "pcidevice", HAS_ARG, QEMU_OPTION_pcidevice },
+#endif
 #endif
 #if defined(TARGET_PPC) || defined(TARGET_SPARC)
     { "g", 1, QEMU_OPTION_g },
@@ -9586,6 +9599,11 @@ int main(int argc, char **argv)
 		kvm_pit = 0;
 		break;
 	    }
+#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
+	    case QEMU_OPTION_pcidevice:
+		add_assigned_device(optarg);
+		break;
+#endif
 #endif
             case QEMU_OPTION_usb:
                 usb_enabled = 1;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
@ 2008-09-23 14:54           ` Amit Shah
  2008-09-23 14:54             ` [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices Amit Shah
  2008-09-23 16:31             ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Anthony Liguori
  2008-09-23 16:30           ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Anthony Liguori
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 kernel/x86/Kbuild |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
index 8dc0483..a4cd00c 100644
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -5,6 +5,9 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o ../anon_inodes.o irq.o i8259.o
 ifeq ($(CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
+ifeq ($(CONFIG_DMAR),y)
+kvm-objs += vtd.o
+endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o
 
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices
  2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
@ 2008-09-23 14:54             ` Amit Shah
  2008-09-23 16:32               ` Anthony Liguori
  2008-09-23 16:31             ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Anthony Liguori
  1 sibling, 1 reply; 31+ messages in thread
From: Amit Shah @ 2008-09-23 14:54 UTC (permalink / raw)
  To: avi; +Cc: kvm, muli, anthony, benami, weidong.han, allen.m.kay, Amit Shah

This patch adds support for hot-plugging host PCI devices into
guests

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu/hw/device-hotplug.c |   19 +++++++++++++++++++
 qemu/monitor.c           |    2 +-
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/device-hotplug.c b/qemu/hw/device-hotplug.c
index 8e2bc35..6d2ab8e 100644
--- a/qemu/hw/device-hotplug.c
+++ b/qemu/hw/device-hotplug.c
@@ -6,6 +6,7 @@
 #include "pc.h"
 #include "console.h"
 #include "block_int.h"
+#include "device-assignment.h"
 
 #define PCI_BASE_CLASS_STORAGE          0x01
 #define PCI_BASE_CLASS_NETWORK          0x02
@@ -27,6 +28,22 @@ static PCIDevice *qemu_system_hot_add_nic(const char *opts, int bus_nr)
     return pci_nic_init (pci_bus, &nd_table[ret], -1);
 }
 
+static PCIDevice *qemu_system_hot_assign_device(const char *opts, int bus_nr)
+{
+    int index;
+    PCIBus *pci_bus;
+
+    pci_bus = pci_find_bus(bus_nr);
+    if (!pci_bus) {
+        term_printf ("Can't find pci_bus %d\n", bus_nr);
+        return NULL;
+    }
+    add_assigned_device(opts);
+
+    index = -1;
+    return init_assigned_device(pci_bus, &index);
+}
+
 static int add_init_drive(const char *opts)
 {
     int drive_opt_idx, drive_idx;
@@ -143,6 +160,8 @@ void device_hot_add(int pcibus, const char *type, const char *opts)
         dev = qemu_system_hot_add_nic(opts, pcibus);
     else if (strcmp(type, "storage") == 0)
         dev = qemu_system_hot_add_storage(opts, pcibus);
+    else if (strcmp(type, "assigned") == 0)
+        dev = qemu_system_hot_assign_device(opts, pcibus);
     else
         term_printf("invalid type: %s\n", type);
 
diff --git a/qemu/monitor.c b/qemu/monitor.c
index 2619fdd..6cf5e8a 100644
--- a/qemu/monitor.c
+++ b/qemu/monitor.c
@@ -1516,7 +1516,7 @@ static term_cmd_t term_cmds[] = {
                                         "[,cyls=c,heads=h,secs=s[,trans=t]]\n"
                                         "[snapshot=on|off][,cache=on|off]",
                                         "add drive to PCI storage controller" },
-    { "pci_add", "iss", device_hot_add, "bus nic|storage [[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]...", "hot-add PCI device" },
+    { "pci_add", "iss", device_hot_add, "bus nic|storage|assigned [[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]... [host=02:00.0[,name=string][,dma=none]" "hot-add PCI device" },
     { "pci_del", "ii", device_hot_remove, "bus slot-number", "hot remove PCI device" },
 #endif
     { "balloon", "i", do_balloon,
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device
  2008-09-23 14:54   ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Amit Shah
  2008-09-23 14:54     ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Amit Shah
@ 2008-09-23 16:12     ` Anthony Liguori
  1 sibling, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:12 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/hw/pci.c |    5 +++++
>  qemu/hw/pci.h |    1 +
>  2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> index 07d37a8..61ff0f6 100644
> --- a/qemu/hw/pci.c
> +++ b/qemu/hw/pci.c
> @@ -560,6 +560,11 @@ static void pci_set_irq(void *opaque, int irq_num, int level)
>      bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0);
>  }
>  
> +int pci_map_irq(PCIDevice *pci_dev, int pin)
> +{
> +	return pci_dev->bus->map_irq(pci_dev, pin);
>   

Formatting is wrong.

Regards,

Anthony Liguori

> +}
> +
>  /***********************************************************/
>  /* monitor info on PCI */
>  
> diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
> index 60e4094..e11fbbf 100644
> --- a/qemu/hw/pci.h
> +++ b/qemu/hw/pci.h
> @@ -81,6 +81,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
>                              uint32_t size, int type,
>                              PCIMapIORegionFunc *map_func);
>  
> +int pci_map_irq(PCIDevice *pci_dev, int pin);
>  uint32_t pci_default_read_config(PCIDevice *d,
>                                   uint32_t address, int len);
>  void pci_default_write_config(PCIDevice *d,
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa
  2008-09-23 14:54     ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Amit Shah
  2008-09-23 14:54       ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Amit Shah
@ 2008-09-23 16:13       ` Anthony Liguori
  2008-09-24  4:28         ` Amit Shah
  1 sibling, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:13 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/hw/piix_pci.c |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
> index 6fbf47b..dc12c8a 100644
> --- a/qemu/hw/piix_pci.c
> +++ b/qemu/hw/piix_pci.c
> @@ -243,6 +243,25 @@ static void piix3_set_irq(qemu_irq *pic, int irq_num, int level)
>      }
>  }
>  
> +int piix3_get_pin(int pic_irq)
> +{
> +    int i;
> +    for (i = 0; i < 4; i++)
> +        if (piix3_dev->config[0x60+i] == pic_irq)
> +            return i;
> +    return -1;
> +}
> +
> +int piix_get_irq(int pin)
> +{
> +    if (piix3_dev)
> +        return piix3_dev->config[0x60+pin];
> +    if (piix4_dev)
> +        return piix4_dev->config[0x60+pin];
> +
> +    return 0;
> +}
> +
>   

If these are being exported, don't they need to be declared in a header?

Regards,

Anthony Liguori

>  static void piix3_reset(PCIDevice *d)
>  {
>      uint8_t *pci_conf = d->config;
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues
  2008-09-23 14:54       ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Amit Shah
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
@ 2008-09-23 16:13         ` Anthony Liguori
  2008-09-24  4:27           ` Amit Shah
  1 sibling, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:13 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/hw/isa.h |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/qemu/hw/isa.h b/qemu/hw/isa.h
> index 222e4f3..e4a1326 100644
> --- a/qemu/hw/isa.h
> +++ b/qemu/hw/isa.h
> @@ -2,6 +2,8 @@
>  #define HW_ISA_H
>  /* ISA bus */
>  
> +#include "hw.h"
> +
>  extern target_phys_addr_t isa_mem_base;
>  
>  int register_ioport_read(int start, int length, int size,
>   

What compile issues?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
  2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
@ 2008-09-23 16:30           ` Anthony Liguori
  2008-09-23 18:32             ` Muli Ben-Yehuda
  2008-09-24  4:58             ` Amit Shah
  2008-09-25  4:54           ` Yang, Sheng
  2008-09-26  1:34           ` Yang, Sheng
  3 siblings, 2 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:30 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> From: Or Sagi <ors@tutis.com>
> From: Nir Peleg <nir@tutis.com>
> From: Amit Shah <amit.shah@redhat.com>
> From: Ben-Ami Yassour <benami@il.ibm.com>
> From: Weidong Han <weidong.han@intel.com>
> From: Glauber de Oliveira Costa <gcosta@redhat.com>
>
> With this patch, we can assign a device on the host machine to a
> guest.
>
> A new command-line option, -pcidevice is added.
> For example, to invoke it for a device sitting at PCI bus:dev.fn
> 04:08.0, use this:
>
>         -pcidevice host=04:08.0
>
> * The host driver for the device, if any, is to be removed before
> assigning the device (else device assignment will fail).
>
> * A device that shares IRQ with another host device cannot currently
> be assigned.
>
> This works only with the in-kernel irqchip method; to use the
> userspace irqchip, a kernel module (irqhook) and some extra changes
> are needed.
>
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/Makefile.target        |    1 +
>  qemu/hw/device-assignment.c |  665 +++++++++++++++++++++++++++++++++++++++++++
>  qemu/hw/device-assignment.h |   93 ++++++
>  qemu/hw/pc.c                |    9 +
>  qemu/hw/pci.c               |    7 +
>  qemu/vl.c                   |   18 ++
>  6 files changed, 793 insertions(+), 0 deletions(-)
>  create mode 100644 qemu/hw/device-assignment.c
>  create mode 100644 qemu/hw/device-assignment.h
>
> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> index 72f3db8..40eb273 100644
> --- a/qemu/Makefile.target
> +++ b/qemu/Makefile.target
> @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
>  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
>  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> +OBJS+= device-assignment.o
>   

This needs to be conditional on at least linux hosts, but probably also 
kvm support.

>  ifeq ($(USE_KVM_PIT), 1)
>  OBJS+= i8254-kvm.o
>  endif
> diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
> new file mode 100644
> index 0000000..e70daf2
> --- /dev/null
> +++ b/qemu/hw/device-assignment.c
> @@ -0,0 +1,665 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + *
> + *  Assign a PCI device from the host to a guest VM.
> + *
> + *  Adapted for KVM by Qumranet.
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + */
> +#include <stdio.h>
> +#include <sys/io.h>
> +#include "qemu-kvm.h"
> +#include <linux/kvm_para.h>
> +#include "device-assignment.h"
> +
> +/* From linux/ioport.h */
> +#define IORESOURCE_IO		0x00000100	/* Resource type */
> +#define IORESOURCE_MEM		0x00000200
> +#define IORESOURCE_IRQ		0x00000400
> +#define IORESOURCE_DMA		0x00000800
> +#define IORESOURCE_PREFETCH	0x00001000	/* No side effects */
> +
> +/* #define DEVICE_ASSIGNMENT_DEBUG */
> +
> +#ifdef DEVICE_ASSIGNMENT_DEBUG
> +#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ## args)
> +#else
> +#define DEBUG(fmt, args...)
> +#endif
>   

Both should be in do { } while(0) to preserve statement semantics.  
Please use C99 variadic macros too.

> +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
> +				       uint32_t value)
> +{
> +	AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +	uint32_t r_pio = (unsigned long)r_access->r_virtbase
>   

Should be target_ulong if it's a target virtual address.

> +		+ (addr - r_access->e_physbase);
> +
> +	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> +			" r_virtbase=%08lx value=%08x\n",
> +			__func__, r_pio, (int)r_access->e_physbase,
> +			(unsigned long)r_access->r_virtbase, value);
> +	}
> +	iopl(3);
> +	outb(value, r_pio);
>   

The formatting is wrong for this entire file.  Also, you shouldn't have 
device specific debug.  Should probably error check iopl(3).  It's not 
necessary to call it every time you do an outb, just once when initialized.

> +static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
> +			 uint32_t e_phys, uint32_t e_size, int type)
> +{
> +	AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
> +	AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> +	int first_map = (region->e_size == 0);
> +	int ret = 0;
> +
> +	DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
> +	      e_phys, r_dev->v_addrs[region_num].r_virtbase, type, e_size,
> +	      region_num);
> +
> +	region->e_physbase = e_phys;
> +	region->e_size = e_size;
> +
> +	/* FIXME: Add support for emulated MMIO for non-kvm guests */
> +	if (kvm_enabled()) {
>   

This doesn't work at all if kvm isn't enabled right?  You should 
probably bail out in the init if kvm isn't enabled.  If this whole file 
is included conditionally based on KVM support, then you don't have to 
worry about using kvm_enabled() guards to conditionally compile out code.

> +static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
> +					  uint32_t val, int len)
> +{
> +	int fd, r;
> +
> +	DEBUG("%s: (%x.%x): address=%04x val=0x%08x len=%d\n",
> +	      __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> +	      (uint16_t) address, val, len);
> +
> +	if (address == 0x4) {
> +		pci_default_write_config(d, address, val, len);
> +		/* Continue to program the card */
> +	}
> +
> +	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +	    address == 0x3c || address == 0x3d) {
> +		/* used for update-mappings (BAR emulation) */
> +		pci_default_write_config(d, address, val, len);
> +		return;
> +	}
> +	DEBUG("%s: NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> +	      __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> +	      (uint16_t) address, val, len);
> +	fd = ((AssignedDevice *)d)->real_device.config_fd;
> +	r = lseek(fd, address, SEEK_SET);
> +	if (r < 0) {
> +		fprintf(stderr, "%s: bad seek, errno = %d\n",
> +			__func__, errno);
> +		return;
> +	}
> +again:
> +	r = write(fd, &val, len);
> +	if (r < 0) {
> +		if (errno == EINTR || errno == EAGAIN)
> +			goto again;
> +		fprintf(stderr, "%s: write failed, errno = %d\n",
> +			__func__, errno);
> +	}
> +}
>   

Things may be simplified by doing pwrite/pread here.

> +static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address,
> +					     int len)
> +{
> +	uint32_t val = 0;
> +	int fd, r;
> +
> +	if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +	    address == 0x3c || address == 0x3d) {
> +		val = pci_default_read_config(d, address, len);
> +		DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +		      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val,
> +		      len);
> +		return val;
> +	}
> +
> +	/* vga specific, remove later */
> +	if (address == 0xFC)
> +		goto do_log;
> +
> +	fd = ((AssignedDevice *)d)->real_device.config_fd;
> +	r = lseek(fd, address, SEEK_SET);
> +	if (r < 0) {
> +		fprintf(stderr, "%s: bad seek, errno = %d\n",
> +			__func__, errno);
> +		return val;
> +	}
> +again:
> +	r = read(fd, &val, len);
> +	if (r < 0) {
> +		if (errno == EINTR || errno == EAGAIN)
> +			goto again;
> +		fprintf(stderr, "%s: read failed, errno = %d\n",
> +			__func__, errno);
> +	}
> +do_log:
> +	DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +	      (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> +
> +	/* kill the special capabilities */
> +	if (address == 4 && len == 4)
> +		val &= ~0x100000;
> +	else if (address == 6)
> +		val &= ~0x10;
> +
> +	return val;
> +}
> +
> +static int assigned_dev_register_regions(PCIRegion *io_regions,
> +					 unsigned long regions_num,
> +					 AssignedDevice *pci_dev)
> +{
> +	uint32_t i;
> +	PCIRegion *cur_region = io_regions;
> +
> +	for (i = 0; i < regions_num; i++, cur_region++) {
> +		if (!cur_region->valid)
> +			continue;
> +#ifdef DEVICE_ASSIGNMENT_DEBUG
> +		pci_dev->v_addrs[i].debug |= DEVICE_ASSIGNMENT_DEBUG_MMIO
> +					     | DEVICE_ASSIGNMENT_DEBUG_PIO;
> +#endif
> +		pci_dev->v_addrs[i].num = i;
> +
> +		/* handle memory io regions */
> +		if (cur_region->type & IORESOURCE_MEM) {
> +			int t = cur_region->type & IORESOURCE_PREFETCH
> +				? PCI_ADDRESS_SPACE_MEM_PREFETCH
> +				: PCI_ADDRESS_SPACE_MEM;
> +
> +			/* map physical memory */
> +			pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
> +			pci_dev->v_addrs[i].r_virtbase =
> +				mmap(NULL,
> +				     (cur_region->size + 0xFFF) & 0xFFFFF000,
> +				     PROT_WRITE | PROT_READ, MAP_SHARED,
> +				     cur_region->resource_fd, (off_t) 0);
> +
> +			if ((void *) -1 == pci_dev->v_addrs[i].r_virtbase) {
> +				fprintf(stderr, "%s: Error: Couldn't mmap 0x%x!"
> +					"\n", __func__,
> +					(uint32_t) (cur_region->base_addr));
> +				return -1;
> +			}
> +			pci_dev->v_addrs[i].r_size = cur_region->size;
> +			pci_dev->v_addrs[i].e_size = 0;
> +
> +			/* add offset */
> +			pci_dev->v_addrs[i].r_virtbase +=
> +				(cur_region->base_addr & 0xFFF);
> +
> +			pci_register_io_region((PCIDevice *) pci_dev, i,
> +					       cur_region->size, t,
> +					       assigned_dev_iomem_map);
> +			continue;
> +		}
> +		/* handle port io regions */
> +		pci_register_io_region((PCIDevice *) pci_dev, i,
> +				       cur_region->size, PCI_ADDRESS_SPACE_IO,
> +				       assigned_dev_ioport_map);
> +
> +		pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
> +		pci_dev->v_addrs[i].r_virtbase =
> +			(void *)(long)cur_region->base_addr;
> +		/* not relevant for port io */
> +		pci_dev->v_addrs[i].memory_index = 0;
> +	}
> +
> +	/* success */
> +	return 0;
> +}
> +
> +static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
> +			   uint8_t r_dev, uint8_t r_func)
> +{
> +	char dir[128], name[128], comp[16];
> +	int fd, r = 0;
> +	FILE *f;
> +	unsigned long long start, end, size, flags;
> +	PCIRegion *rp;
> +	PCIDevRegions *dev = &pci_dev->real_device;
> +
> +	dev->region_number = 0;
> +
> +	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
> +		r_bus, r_dev, r_func);
>   

snprintf()

> +	strcpy(name, dir);
> +	strcat(name, "config");
>   

snprintf()

> +	fd = open(name, O_RDWR);
> +	if (fd == -1) {
> +		fprintf(stderr, "%s: %s: %m\n", __func__, name);
> +		return 1;
> +	}
> +	dev->config_fd = fd;
> +again:
> +	r = read(fd, pci_dev->dev.config, sizeof pci_dev->dev.config);
>   

Please use parens with sizeof().

> +	if (r < 0) {
> +		if (errno == EINTR || errno == EAGAIN)
> +			goto again;
> +		fprintf(stderr, "%s: read failed, errno = %d\n",
> +			__func__, errno);
> +	}
> +	strcpy(name, dir);
> +	strcat(name, "resource");
>   

snprintf()

> +	f = fopen(name, "r");
> +	if (f == NULL) {
> +		fprintf(stderr, "%s: %s: %m\n", __func__, name);
> +		return 1;
> +	}
> +	for (r = 0; fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) == 3;
> +	     r++) {
>   

Please make this a while loop.

> +		rp = dev->regions + r;
> +		rp->valid = 0;
> +		size = end - start + 1;
> +		flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
> +		if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0)
> +			continue;
> +		if (flags & IORESOURCE_MEM) {
> +			flags &= ~IORESOURCE_IO;
> +			sprintf(comp, "resource%d", r);
> +			strcpy(name, dir);
> +			strcat(name, comp);
>   


snprintf()

> +			fd = open(name, O_RDWR);
> +			if (fd == -1)
> +				continue;		/* probably ROM */
> +			rp->resource_fd = fd;
> +		} else
> +			flags &= ~IORESOURCE_PREFETCH;
> +
> +		rp->type = flags;
> +		rp->valid = 1;
> +		rp->base_addr = start;
> +		rp->size = size;
> +		DEBUG("%s: region %d size %d start 0x%x type %d "
> +		      "resource_fd %d\n", __func__, r, rp->size, start,
> +		      rp->type, rp->resource_fd);
> +	}
> +	fclose(f);
> +
> +	dev->region_number = r;
> +	return 0;
> +}
> +
> +#define	MAX_ASSIGNED_DEVS 4
> +struct {
> +	char name[15];
> +	int bus;
> +	int dev;
> +	int func;
> +	AssignedDevice *assigned_dev;
> +} assigned_devices[MAX_ASSIGNED_DEVS];
>   

Any reason not to just use a list here?  sys-queue.h makes that very easy.

> +int nr_assigned_devices;
> +static int disable_iommu;
> +
> +static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
> +{
> +	return (uint32_t)bus << 8 | (uint32_t)devfn;
> +}
> +
> +static AssignedDevice *register_real_device(PCIBus *e_bus,
> +					    const char *e_dev_name,
> +					    int e_devfn, uint8_t r_bus,
> +					    uint8_t r_dev, uint8_t r_func)
> +{
> +	int r;
> +	AssignedDevice *pci_dev;
> +	uint8_t e_device, e_intx;
> +
> +	DEBUG("%s: Registering real physical device %s (devfn=0x%x)\n",
> +	      __func__, e_dev_name, e_devfn);
> +
> +	pci_dev = (AssignedDevice *)
> +		pci_register_device(e_bus, e_dev_name, sizeof(AssignedDevice),
> +				    e_devfn, assigned_dev_pci_read_config,
> +				    assigned_dev_pci_write_config);
> +	if (NULL == pci_dev) {
> +		fprintf(stderr, "%s: Error: Couldn't register real device %s\n",
> +			__func__, e_dev_name);
> +		return NULL;
> +	}
> +	if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
> +		fprintf(stderr, "%s: Error: Couldn't get real device (%s)!\n",
> +			__func__, e_dev_name);
> +		goto out;
> +	}
> +
> +	/* handle real device's MMIO/PIO BARs */
> +	if (assigned_dev_register_regions(pci_dev->real_device.regions,
> +					  pci_dev->real_device.region_number,
> +					  pci_dev))
> +		goto out;
> +
> +	/* handle interrupt routing */
> +	e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
> +	e_intx = pci_dev->dev.config[0x3d] - 1;
> +	pci_dev->intpin = e_intx;
> +	pci_dev->run = 0;
> +	pci_dev->girq = 0;
> +	pci_dev->h_busnr = r_bus;
> +	pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
> +
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +	if (kvm_enabled()) {
> +		struct kvm_assigned_pci_dev assigned_dev_data;
> +
> +		memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
> +		assigned_dev_data.assigned_dev_id  =
> +			calc_assigned_dev_id(pci_dev->h_busnr,
> +					     (uint32_t)pci_dev->h_devfn);
> +		assigned_dev_data.busnr = pci_dev->h_busnr;
> +		assigned_dev_data.devfn = pci_dev->h_devfn;
> +
> +#ifdef KVM_CAP_IOMMU
> +		/* We always enable the IOMMU if present
> +		 * (or when not disabled on the command line)
> +		 */
> +		r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
> +		if (r && !disable_iommu)
> +			assigned_dev_data.flags |= KVM_DEV_ASSIGN_ENABLE_IOMMU;
> +#endif
> +		r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
> +		if (r < 0) {
> +			fprintf(stderr, "Could not notify kernel about "
> +				"assigned device \"%s\"\n", e_dev_name);
> +			perror("pt-ioctl");
> +			goto out;
> +		}
> +	}
> +#endif
> +	fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
> +		"(\"%s\") as guest device %02x:%02x.%1x\n",
> +		r_bus, r_dev, r_func, e_dev_name,
> +		pci_bus_num(e_bus), e_device, r_func);
>   

Please don't fprintf() unconditionally.

A lot more checks are needed here to see if things can succeed.  We 
definitely should bail out if they can't.

> +
> +	return pci_dev;
> +out:
> +	pci_unregister_device(&pci_dev->dev);
> +	return NULL;
> +}
> +
> +extern int get_param_value(char *buf, int buf_size,
> +			   const char *tag, const char *str);
> +extern int piix_get_irq(int);
>   

Don't do this in C files.

> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +/* The pci config space got updated. Check if irq numbers have changed
> + * for our devices
> + */
> +void assigned_dev_update_irq(PCIDevice *d)
> +{
> +	int i, irq, r;
> +	AssignedDevice *assigned_dev;
> +
> +	for (i = 0; i < nr_assigned_devices; i++) {
> +		assigned_dev = assigned_devices[i].assigned_dev;
> +		if (assigned_dev == NULL)
> +			continue;
> +
> +		irq = pci_map_irq(&assigned_dev->dev, assigned_dev->intpin);
> +		irq = piix_get_irq(irq);
> +
> +		if (irq != assigned_dev->girq) {
> +			struct kvm_assigned_irq assigned_irq_data;
> +
> +			memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
> +			assigned_irq_data.assigned_dev_id  =
> +				calc_assigned_dev_id(assigned_dev->h_busnr,
> +						     (uint8_t)
> +						     assigned_dev->h_devfn);
> +			assigned_irq_data.guest_irq = irq;
> +			assigned_irq_data.host_irq =
> +				assigned_dev->real_device.irq;
> +			r = kvm_assign_irq(kvm_context, &assigned_irq_data);
> +			if (r < 0) {
> +				perror("assigned_dev_update_irq");
> +				fprintf(stderr, "Are you assigning a device "
> +					"that shares IRQ with some other "
> +					"device?\n");
> +				pci_unregister_device(&assigned_dev->dev);
> +				continue;
> +			}
> +			assigned_dev->girq = irq;
> +		}
> +	}
> +}
> +#endif
> +
> +static int init_device_assignment(void)
> +{
> +	/* Do we have any devices to be assigned? */
> +	if (nr_assigned_devices == 0)
> +		return -1;
> +	iopl(3);
> +	return 0;
> +}
> +
> +struct PCIDevice *init_assigned_device(PCIBus *bus, int *index)
> +{
> +	AssignedDevice *dev = NULL;
> +	int i;
> +
> +	if (*index == -1) {
> +		if (init_device_assignment() < 0)
> +			return NULL;
> +
> +		*index = nr_assigned_devices - 1;
> +	}
> +	i = *index;
> +	dev = register_real_device(bus, assigned_devices[i].name, -1,
> +				   assigned_devices[i].bus,
> +				   assigned_devices[i].dev,
> +				   assigned_devices[i].func);
> +	if (dev == NULL) {
> +		fprintf(stderr, "Error: Couldn't register device \"%s\"\n",
> +			assigned_devices[i].name);
> +	}
> +	assigned_devices[i].assigned_dev = dev;
> +
> +	--*index;
> +	return &dev->dev;
> +}
> +
> +/*
> + * Syntax to assign device:
> + *
> + * -pcidevice dev=bus:dev.func,dma=dma
> + *
> + * Example:
> + * -pcidevice host=00:13.0,dma=pvdma
> + *
> + * dma can currently only be 'none' to disable iommu support.
>   

Does it actually work if you disable iommu support?

> + */
> +void add_assigned_device(const char *arg)
> +{
> +	char *cp, *cp1;
> +	char device[8];
> +	char dma[6];
> +	int r;
> +
> +	if (nr_assigned_devices >= MAX_ASSIGNED_DEVS) {
> +		fprintf(stderr, "Too many assigned devices (max %d)\n",
> +			MAX_ASSIGNED_DEVS);
> +		return;
> +	}
> +	memset(&assigned_devices[nr_assigned_devices], 0,
> +	       sizeof assigned_devices[nr_assigned_devices]);
> +	r = get_param_value(device, sizeof device, "host", arg);
> +
> +	r = get_param_value(assigned_devices[nr_assigned_devices].name,
> +			    sizeof assigned_devices[nr_assigned_devices].name,
> +			    "name", arg);
> +	if (!r)
> +		strncpy(assigned_devices[nr_assigned_devices].name, device, 8);
> +
> +#ifdef KVM_CAP_IOMMU
> +	r = get_param_value(dma, sizeof dma, "dma", arg);
> +	if (r && !strncmp(dma, "none", 4))
> +		disable_iommu = 1;
> +#endif
> +	cp = device;
> +	assigned_devices[nr_assigned_devices].bus = strtoul(cp, &cp1, 16);
> +	if (*cp1 != ':')
> +		goto bad;
> +	cp = cp1 + 1;
> +
> +	assigned_devices[nr_assigned_devices].dev = strtoul(cp, &cp1, 16);
> +	if (*cp1 != '.')
> +		goto bad;
> +	cp = cp1 + 1;
> +
> +	assigned_devices[nr_assigned_devices].func = strtoul(cp, &cp1, 16);
> +
> +	nr_assigned_devices++;
> +	return;
> +bad:
> +	fprintf(stderr, "pcidevice argument parse error; "
> +		"please check the help text for usage\n");
> +}
> diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
> new file mode 100644
> index 0000000..b77e484
> --- /dev/null
> +++ b/qemu/hw/device-assignment.h
> @@ -0,0 +1,93 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + * Copyright (c) 2007, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + *  Data structures for storing PCI state
> + *
> + *  Adapted to kvm by Qumranet
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + */
> +
> +#ifndef __DEVICE_ASSIGNMENT_H__
> +#define __DEVICE_ASSIGNMENT_H__
> +
> +#include <sys/mman.h>
>   

Don't think this is needed here.

> +#include "qemu-common.h"
> +#include "pci.h"
> +#include <linux/types.h>
>   

Nor this.

> +
> +#define DEVICE_ASSIGNMENT_DEBUG_PIO	(0x01)
> +#define DEVICE_ASSIGNMENT_DEBUG_MMIO	(0x02)
> +
> +/* From include/linux/pci.h in the kernel sources */
> +#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
> +
> +typedef uint32_t pciaddr_t;
> +
> +#define MAX_IO_REGIONS			(6)
> +
> +typedef struct pci_region_s {
>   
typedef struct PCIRegion

> +	int type;	/* Memory or port I/O */
> +	int valid;
> +	pciaddr_t base_addr;
> +	pciaddr_t size;		/* size of the region */
>   

ram_addr_t.

> +	int resource_fd;
> +} PCIRegion;
> +
> +typedef struct pci_dev_s {
>   

typedef struct PCIDevRegions

> +	uint8_t bus, dev, func;	/* Bus inside domain, device and function */
> +	int irq;		/* IRQ number */
> +	uint16_t region_number;	/* number of active regions */
> +
> +	/* Port I/O or MMIO Regions */
> +	PCIRegion regions[MAX_IO_REGIONS];
> +	int config_fd;
> +} PCIDevRegions;
> +
> +typedef struct assigned_dev_region_s {
> +	target_phys_addr_t e_physbase;
> +	uint32_t memory_index;
> +	void *r_virtbase;	/* mmapped access address */
> +	int num;		/* our index within v_addrs[] */
> +	uint32_t e_size;        /* emulated size of region in bytes */
> +	uint32_t r_size;        /* real size of region in bytes */
> +	uint32_t debug;
> +} AssignedDevRegion;
> +
> +typedef struct assigned_dev_s {
> +	PCIDevice dev;
> +	int intpin;
> +	uint8_t debug_flags;
> +	AssignedDevRegion v_addrs[PCI_NUM_REGIONS];
> +	PCIDevRegions real_device;
> +	int run;
> +	int girq;
> +	unsigned char h_busnr;
> +	unsigned int h_devfn;
> +	int bound;
> +} AssignedDevice;
> +
> +/* Initialization functions */
> +PCIDevice *init_assigned_device(PCIBus *bus, int *index);
> +void add_assigned_device(const char *arg);
> +void assigned_dev_set_vector(int irq, int vector);
> +void assigned_dev_ack_mirq(int vector);
> +
> +#endif				/* __DEVICE_ASSIGNMENT_H__ */
> diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
> index 6053103..4a611cc 100644
> --- a/qemu/hw/pc.c
> +++ b/qemu/hw/pc.c
> @@ -32,6 +32,7 @@
>  #include "smbus.h"
>  #include "boards.h"
>  #include "console.h"
> +#include "device-assignment.h"
>  
>  #include "qemu-kvm.h"
>  
> @@ -1006,6 +1007,14 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
>          }
>      }
>  
> +    /* Initialize assigned devices */
> +    if (pci_enabled) {
> +        int r = -1;
> +        do {
> +            init_assigned_device(pci_bus, &r);
>   

Why pass r by reference instead of just returning it?  At any rate, you 
should detect when this fails and gracefully terminate QEMU.

> +	} while (r >= 0);
> +    }
> +
>      rtc_state = rtc_init(0x70, i8259[8]);
>  
>      qemu_register_boot_set(pc_boot_set, rtc_state);
> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> index 61ff0f6..e4e8386 100644
> --- a/qemu/hw/pci.c
> +++ b/qemu/hw/pci.c
> @@ -50,6 +50,7 @@ struct PCIBus {
>  
>  static void pci_update_mappings(PCIDevice *d);
>  static void pci_set_irq(void *opaque, int irq_num, int level);
> +void assigned_dev_update_irq(PCIDevice *d);
>  
>  target_phys_addr_t pci_mem_base;
>  static int pci_irq_index;
> @@ -453,6 +454,12 @@ void pci_default_write_config(PCIDevice *d,
>          val >>= 8;
>      }
>  
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +    if (kvm_enabled() && qemu_kvm_irqchip_in_kernel() &&
> +	address >= 0x60 && address <= 0x63)
> +	assigned_dev_update_irq(d);
> +#endif
> +
>      end = address + len;
>      if (end > PCI_COMMAND && address < (PCI_COMMAND + 2)) {
>          /* if the command register is modified, we must modify the mappings */
> diff --git a/qemu/vl.c b/qemu/vl.c
> index 2fb8552..83f28c5 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -37,6 +37,7 @@
>  #include "qemu-char.h"
>  #include "block.h"
>  #include "audio/audio.h"
> +#include "hw/device-assignment.h"
>  #include "migration.h"
>  #include "balloon.h"
>  #include "qemu-kvm.h"
> @@ -8469,6 +8470,12 @@ static void help(int exitcode)
>  #endif
>  	   "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
>  	   "-no-kvm-pit	    disable KVM kernel mode PIT\n"
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +	   "-pcidevice host=bus:dev.func[,dma=none][,name=\"string\"]\n"
> +	   "                expose a PCI device to the guest OS.\n"
> +	   "                dma=none: don't perform any dma translations (default is to use an iommu)\n"
> +	   "                'string' is used in log output.\n"
> +#endif
>  #endif
>  #ifdef TARGET_I386
>             "-std-vga        simulate a standard VGA card with VESA Bochs Extensions\n"
> @@ -8592,6 +8599,9 @@ enum {
>      QEMU_OPTION_no_kvm,
>      QEMU_OPTION_no_kvm_irqchip,
>      QEMU_OPTION_no_kvm_pit,
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +    QEMU_OPTION_pcidevice,
> +#endif
>      QEMU_OPTION_no_reboot,
>      QEMU_OPTION_no_shutdown,
>      QEMU_OPTION_show_cursor,
> @@ -8680,6 +8690,9 @@ const QEMUOption qemu_options[] = {
>  #endif
>      { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
>      { "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +    { "pcidevice", HAS_ARG, QEMU_OPTION_pcidevice },
> +#endif
>  #endif
>  #if defined(TARGET_PPC) || defined(TARGET_SPARC)
>      { "g", 1, QEMU_OPTION_g },
> @@ -9586,6 +9599,11 @@ int main(int argc, char **argv)
>  		kvm_pit = 0;
>  		break;
>  	    }
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +	    case QEMU_OPTION_pcidevice:
> +		add_assigned_device(optarg);
>   

You should copy into an array, then in pc.c, iterate through the array 
and call into add_assigned_device.

Regards,

Anthony Liguori

> +		break;
> +#endif
>  #endif
>              case QEMU_OPTION_usb:
>                  usb_enabled = 1;
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support
  2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
  2008-09-23 14:54             ` [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices Amit Shah
@ 2008-09-23 16:31             ` Anthony Liguori
  2008-09-24  4:25               ` Amit Shah
  1 sibling, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:31 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  kernel/x86/Kbuild |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
> index 8dc0483..a4cd00c 100644
> --- a/kernel/x86/Kbuild
> +++ b/kernel/x86/Kbuild
> @@ -5,6 +5,9 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o ../anon_inodes.o irq.o i8259.o
>  ifeq ($(CONFIG_KVM_TRACE),y)
>  kvm-objs += kvm_trace.o
>  endif
> +ifeq ($(CONFIG_DMAR),y)
> +kvm-objs += vtd.o
> +endif
>  kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
>  kvm-amd-objs := svm.o ../external-module-compat.o
>  
>   

Where's the file come from?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices
  2008-09-23 14:54             ` [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices Amit Shah
@ 2008-09-23 16:32               ` Anthony Liguori
  2008-09-24  4:24                 ` Amit Shah
  0 siblings, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 16:32 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> This patch adds support for hot-plugging host PCI devices into
> guests
>   

Instead of using assigned, it should probably be host.

Regards,

Anthony Liguori

> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/hw/device-hotplug.c |   19 +++++++++++++++++++
>  qemu/monitor.c           |    2 +-
>  2 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/qemu/hw/device-hotplug.c b/qemu/hw/device-hotplug.c
> index 8e2bc35..6d2ab8e 100644
> --- a/qemu/hw/device-hotplug.c
> +++ b/qemu/hw/device-hotplug.c
> @@ -6,6 +6,7 @@
>  #include "pc.h"
>  #include "console.h"
>  #include "block_int.h"
> +#include "device-assignment.h"
>  
>  #define PCI_BASE_CLASS_STORAGE          0x01
>  #define PCI_BASE_CLASS_NETWORK          0x02
> @@ -27,6 +28,22 @@ static PCIDevice *qemu_system_hot_add_nic(const char *opts, int bus_nr)
>      return pci_nic_init (pci_bus, &nd_table[ret], -1);
>  }
>  
> +static PCIDevice *qemu_system_hot_assign_device(const char *opts, int bus_nr)
> +{
> +    int index;
> +    PCIBus *pci_bus;
> +
> +    pci_bus = pci_find_bus(bus_nr);
> +    if (!pci_bus) {
> +        term_printf ("Can't find pci_bus %d\n", bus_nr);
> +        return NULL;
> +    }
> +    add_assigned_device(opts);
> +
> +    index = -1;
> +    return init_assigned_device(pci_bus, &index);
> +}
> +
>  static int add_init_drive(const char *opts)
>  {
>      int drive_opt_idx, drive_idx;
> @@ -143,6 +160,8 @@ void device_hot_add(int pcibus, const char *type, const char *opts)
>          dev = qemu_system_hot_add_nic(opts, pcibus);
>      else if (strcmp(type, "storage") == 0)
>          dev = qemu_system_hot_add_storage(opts, pcibus);
> +    else if (strcmp(type, "assigned") == 0)
> +        dev = qemu_system_hot_assign_device(opts, pcibus);
>      else
>          term_printf("invalid type: %s\n", type);
>  
> diff --git a/qemu/monitor.c b/qemu/monitor.c
> index 2619fdd..6cf5e8a 100644
> --- a/qemu/monitor.c
> +++ b/qemu/monitor.c
> @@ -1516,7 +1516,7 @@ static term_cmd_t term_cmds[] = {
>                                          "[,cyls=c,heads=h,secs=s[,trans=t]]\n"
>                                          "[snapshot=on|off][,cache=on|off]",
>                                          "add drive to PCI storage controller" },
> -    { "pci_add", "iss", device_hot_add, "bus nic|storage [[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]...", "hot-add PCI device" },
> +    { "pci_add", "iss", device_hot_add, "bus nic|storage|assigned [[vlan=n][,macaddr=addr][,model=type]] [file=file][,if=type][,bus=nr]... [host=02:00.0[,name=string][,dma=none]" "hot-add PCI device" },
>      { "pci_del", "ii", device_hot_remove, "bus slot-number", "hot remove PCI device" },
>  #endif
>      { "balloon", "i", do_balloon,
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 16:30           ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Anthony Liguori
@ 2008-09-23 18:32             ` Muli Ben-Yehuda
  2008-09-23 19:18               ` Anthony Liguori
  2008-09-24  4:58             ` Amit Shah
  1 sibling, 1 reply; 31+ messages in thread
From: Muli Ben-Yehuda @ 2008-09-23 18:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Amit Shah, avi, kvm, Ben-Ami Yassour1, weidong.han, allen.m.kay

On Tue, Sep 23, 2008 at 11:30:32AM -0500, Anthony Liguori wrote:
>
>> +		+ (addr - r_access->e_physbase);
>> +
>> +	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
>> +		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
>> +			" r_virtbase=%08lx value=%08x\n",
>> +			__func__, r_pio, (int)r_access->e_physbase,
>> +			(unsigned long)r_access->r_virtbase, value);
>> +	}
>> +	iopl(3);
>> +	outb(value, r_pio);
>>   
>
> The formatting is wrong for this entire file.  Also, you shouldn't
> have device specific debug.  Should probably error check iopl(3).
> It's not necessary to call it every time you do an outb, just once
> when initialized.

We tried that at first, but ran into cases where even after iopl()
ran, pio's from qemu still failed. Does qemu do anything to drop
iopl() privileges? In any case calling iopl() unconditionally on every
pio fixed it, but is obviously not the right long-term solution.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
                      xxx
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 18:32             ` Muli Ben-Yehuda
@ 2008-09-23 19:18               ` Anthony Liguori
  2008-09-23 19:43                 ` Muli Ben-Yehuda
  0 siblings, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 19:18 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Amit Shah, avi, kvm, Ben-Ami Yassour1, weidong.han, allen.m.kay

Muli Ben-Yehuda wrote:
> On Tue, Sep 23, 2008 at 11:30:32AM -0500, Anthony Liguori wrote:
>   
>>> +		+ (addr - r_access->e_physbase);
>>> +
>>> +	if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
>>> +		fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
>>> +			" r_virtbase=%08lx value=%08x\n",
>>> +			__func__, r_pio, (int)r_access->e_physbase,
>>> +			(unsigned long)r_access->r_virtbase, value);
>>> +	}
>>> +	iopl(3);
>>> +	outb(value, r_pio);
>>>   
>>>       
>> The formatting is wrong for this entire file.  Also, you shouldn't
>> have device specific debug.  Should probably error check iopl(3).
>> It's not necessary to call it every time you do an outb, just once
>> when initialized.
>>     
>
> We tried that at first, but ran into cases where even after iopl()
> ran, pio's from qemu still failed. Does qemu do anything to drop
> iopl() privileges? In any case calling iopl() unconditionally on every
> pio fixed it, but is obviously not the right long-term solution.
>   

Make sure you issue iopl() before any of the VCPU threads are spawned.  
Otherwise, you may be running into issues when something other than 
VCPU-0 is doing PIO/MMIO and you haven't iopl()'d for that thread.

Regards,

Anthony Liguori

> Cheers,
> Muli
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 19:18               ` Anthony Liguori
@ 2008-09-23 19:43                 ` Muli Ben-Yehuda
  2008-09-23 19:58                   ` Anthony Liguori
  0 siblings, 1 reply; 31+ messages in thread
From: Muli Ben-Yehuda @ 2008-09-23 19:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Amit Shah, avi, kvm, Ben-Ami Yassour1, weidong.han, allen.m.kay

On Tue, Sep 23, 2008 at 02:18:51PM -0500, Anthony Liguori wrote:

> Make sure you issue iopl() before any of the VCPU threads are
> spawned.  Otherwise, you may be running into issues when something
> other than VCPU-0 is doing PIO/MMIO and you haven't iopl()'d for
> that thread.

Yeah, we thought of that, but as far as I can recall this happened
with a single VCPU. In any case, we'll look into it again.

Cheers,
Muli
-- 
The First Workshop on I/O Virtualization (WIOV '08)
Dec 2008, San Diego, CA, http://www.usenix.org/wiov08/
                      xxx
SYSTOR 2009---The Israeli Experimental Systems Conference
http://www.haifa.il.ibm.com/conferences/systor2009/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 19:43                 ` Muli Ben-Yehuda
@ 2008-09-23 19:58                   ` Anthony Liguori
  0 siblings, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-09-23 19:58 UTC (permalink / raw)
  To: Muli Ben-Yehuda
  Cc: Amit Shah, avi, kvm, Ben-Ami Yassour1, weidong.han, allen.m.kay

Muli Ben-Yehuda wrote:
> On Tue, Sep 23, 2008 at 02:18:51PM -0500, Anthony Liguori wrote:
>
>   
>> Make sure you issue iopl() before any of the VCPU threads are
>> spawned.  Otherwise, you may be running into issues when something
>> other than VCPU-0 is doing PIO/MMIO and you haven't iopl()'d for
>> that thread.
>>     
>
> Yeah, we thought of that, but as far as I can recall this happened
> with a single VCPU. In any case, we'll look into it again.
>   

The io thread runs in a separate thread than the VCPU.  So if you were 
doing iopl(3) in the io thread (for instance, when called from 
machine_init), then the VCPU thread wouldn't necessarily have inherited 
the iopl().

You could also do it explicitly on vcpu thread construction.

Regards,

Anthony Liguori

> Cheers,
> Muli
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices
  2008-09-23 16:32               ` Anthony Liguori
@ 2008-09-24  4:24                 ` Amit Shah
  0 siblings, 0 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-24  4:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Tuesday 23 Sep 2008 22:02:04 Anthony Liguori wrote:
> Amit Shah wrote:
> > This patch adds support for hot-plugging host PCI devices into
> > guests
>
> Instead of using assigned, it should probably be host.

Yes, that's a good name.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support
  2008-09-23 16:31             ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Anthony Liguori
@ 2008-09-24  4:25               ` Amit Shah
  2008-09-24 15:08                 ` Anthony Liguori
  0 siblings, 1 reply; 31+ messages in thread
From: Amit Shah @ 2008-09-24  4:25 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Tuesday 23 Sep 2008 22:01:10 Anthony Liguori wrote:
> Amit Shah wrote:
> > Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  kernel/x86/Kbuild |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
> > index 8dc0483..a4cd00c 100644
> > --- a/kernel/x86/Kbuild
> > +++ b/kernel/x86/Kbuild
> > @@ -5,6 +5,9 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o
> > ../anon_inodes.o irq.o i8259.o ifeq ($(CONFIG_KVM_TRACE),y)
> >  kvm-objs += kvm_trace.o
> >  endif
> > +ifeq ($(CONFIG_DMAR),y)
> > +kvm-objs += vtd.o
> > +endif
> >  kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
> >  kvm-amd-objs := svm.o ../external-module-compat.o
>
> Where's the file come from?

Already in the kernel tree -- arch/x86/kvm/vtd.c

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues
  2008-09-23 16:13         ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Anthony Liguori
@ 2008-09-24  4:27           ` Amit Shah
  2008-09-24 11:35             ` Amit Shah
  2008-09-24 14:59             ` Anthony Liguori
  0 siblings, 2 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-24  4:27 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Tuesday 23 Sep 2008 21:43:44 Anthony Liguori wrote:
> Amit Shah wrote:
> > Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  qemu/hw/isa.h |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/qemu/hw/isa.h b/qemu/hw/isa.h
> > index 222e4f3..e4a1326 100644
> > --- a/qemu/hw/isa.h
> > +++ b/qemu/hw/isa.h
> > @@ -2,6 +2,8 @@
> >  #define HW_ISA_H
> >  /* ISA bus */
> >
> > +#include "hw.h"
> > +
> >  extern target_phys_addr_t isa_mem_base;
> >
> >  int register_ioport_read(int start, int length, int size,
>
> What compile issues?

register_ioport_read* and register_ioport_write* functions cause a lot of 
this.

gcc -I. -I.. -I/home/amit/src/kvm-userspace/qemu/target-i386 -I/home/amit/src/kvm-userspace/qemu -MMD -MT 
device-assignment.o -MP -DNEED_CPU_H -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D__user= -I/home/amit/src/kvm-userspace/qemu/tcg -I/home/amit/src/kvm-userspace/qemu/tcg/x86_64 -I/home/amit/src/kvm-userspace/qemu/fpu  -DHAS_AUDIO -DHAS_AUDIO_CHOICE -I/home/amit/src/kvm-userspace/qemu/slirp -I /home/amit/src/kvm-userspace/qemu/../libkvm  -DCONFIG_X86 -O2 -g -fno-strict-aliasing -Wall -Wundef -Wendif-labels -Wwrite-strings  -m64 -I /home/amit/src/kvm-userspace/kernel/include -c -o 
device-assignment.o /home/amit/src/kvm-userspace/qemu/hw/device-assignment.c
In file included from /home/amit/src/kvm-userspace/qemu/hw/pci.h:6,
                 
from /home/amit/src/kvm-userspace/qemu/hw/device-assignment.h:34,
                 
from /home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:32:
/home/amit/src/kvm-userspace/qemu/hw/isa.h:8: error: expected declaration 
specifiers or â...â before âIOPortReadFuncâ
/home/amit/src/kvm-userspace/qemu/hw/isa.h:10: error: expected declaration 
specifiers or â...â before âIOPortWriteFuncâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c: In function 
âassigned_dev_ioport_mapâ:
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:193: error: too many 
arguments to function âregister_ioport_readâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:195: error: too many 
arguments to function âregister_ioport_readâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:197: error: too many 
arguments to function âregister_ioport_readâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:199: error: too many 
arguments to function âregister_ioport_writeâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:201: error: too many 
arguments to function âregister_ioport_writeâ
/home/amit/src/kvm-userspace/qemu/hw/device-assignment.c:203: error: too many 
arguments to function âregister_ioport_writeâ
make[2]: *** [device-assignment.o] Error 1
make[2]: Leaving directory `/home/amit/src/kvm-userspace/qemu/x86_64-softmmu'
make[1]: *** [subdir-x86_64-softmmu] Error 2
make[1]: Leaving directory `/home/amit/src/kvm-userspace/qemu'
make: *** [qemu] Error 2

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa
  2008-09-23 16:13       ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Anthony Liguori
@ 2008-09-24  4:28         ` Amit Shah
  0 siblings, 0 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-24  4:28 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Tuesday 23 Sep 2008 21:43:26 Anthony Liguori wrote:
> Amit Shah wrote:
> > Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  qemu/hw/piix_pci.c |   19 +++++++++++++++++++
> >  1 files changed, 19 insertions(+), 0 deletions(-)
> >
> > diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
> > index 6fbf47b..dc12c8a 100644
> > --- a/qemu/hw/piix_pci.c
> > +++ b/qemu/hw/piix_pci.c
> > @@ -243,6 +243,25 @@ static void piix3_set_irq(qemu_irq *pic, int
> > irq_num, int level) }
> >  }
> >
> > +int piix3_get_pin(int pic_irq)
> > +{
> > +    int i;
> > +    for (i = 0; i < 4; i++)
> > +        if (piix3_dev->config[0x60+i] == pic_irq)
> > +            return i;
> > +    return -1;
> > +}
> > +
> > +int piix_get_irq(int pin)
> > +{
> > +    if (piix3_dev)
> > +        return piix3_dev->config[0x60+pin];
> > +    if (piix4_dev)
> > +        return piix4_dev->config[0x60+pin];
> > +
> > +    return 0;
> > +}
> > +
>
> If these are being exported, don't they need to be declared in a header?

I'll put then in a header.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 16:30           ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Anthony Liguori
  2008-09-23 18:32             ` Muli Ben-Yehuda
@ 2008-09-24  4:58             ` Amit Shah
  2008-09-24 15:07               ` Anthony Liguori
  1 sibling, 1 reply; 31+ messages in thread
From: Amit Shah @ 2008-09-24  4:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Tuesday 23 Sep 2008 22:00:32 Anthony Liguori wrote:
> Amit Shah wrote:

> > diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> > index 72f3db8..40eb273 100644
> > --- a/qemu/Makefile.target
> > +++ b/qemu/Makefile.target
> > @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
> >  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
> >  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
> >  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> > +OBJS+= device-assignment.o
>
> This needs to be conditional on at least linux hosts, but probably also
> kvm support.

I didn't see any other file that's doing it. So I added this conditional in 
vl.c by having a #if defined(__linux__). That's how usb-linux.c does it as 
well. Is there a better way?

Not the whole functionality needs kvm support. This should be able to work 
even without kvm (for example, when the guest is 1:1 mapped in the host 
address space).

> > +	/* FIXME: Add support for emulated MMIO for non-kvm guests */
> > +	if (kvm_enabled()) {
>
> This doesn't work at all if kvm isn't enabled right?  You should
> probably bail out in the init if kvm isn't enabled.  If this whole file
> is included conditionally based on KVM support, then you don't have to
> worry about using kvm_enabled() guards to conditionally compile out code.

Non-kvm support is currently broken and should be fixed, but that can happen 
after we get this merged.

I can temporarily add a check for kvm_enabled and bail out.

> > +	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
> > +		r_bus, r_dev, r_func);
>
> snprintf()

It's guarded by the %02x modifiers; so this doesn't depend on user input.

> > +	strcpy(name, dir);
> > +	strcat(name, "config");
>
> snprintf()

... and as a result, all these don't depend on user input.

> > +#define	MAX_ASSIGNED_DEVS 4
> > +struct {
> > +	char name[15];
> > +	int bus;
> > +	int dev;
> > +	int func;
> > +	AssignedDevice *assigned_dev;
> > +} assigned_devices[MAX_ASSIGNED_DEVS];
>
> Any reason not to just use a list here?  sys-queue.h makes that very easy.

> > +	fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
> > +		"(\"%s\") as guest device %02x:%02x.%1x\n",
> > +		r_bus, r_dev, r_func, e_dev_name,
> > +		pci_bus_num(e_bus), e_device, r_func);
>
> Please don't fprintf() unconditionally.

OK; however, a vmdk file open does that so I though it was alright to do it.

> A lot more checks are needed here to see if things can succeed.  We
> definitely should bail out if they can't.

Bailing out is done in the out: label below. What else do  you think can fail? 
I've taken care of all the cases that do fail IMO.

> > +	return pci_dev;
> > +out:
> > +	pci_unregister_device(&pci_dev->dev);
> > +	return NULL;
> > +}

> > +/*
> > + * Syntax to assign device:
> > + *
> > + * -pcidevice dev=bus:dev.func,dma=dma
> > + *
> > + * Example:
> > + * -pcidevice host=00:13.0,dma=pvdma
> > + *
> > + * dma can currently only be 'none' to disable iommu support.
>
> Does it actually work if you disable iommu support?

If the guest is 1:1 mapped.


> > diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
> > new file mode 100644
> > index 0000000..b77e484
> > --- /dev/null
> > +++ b/qemu/hw/device-assignment.h

> > +#include <sys/mman.h>
>
> Don't think this is needed here.

We use mmap(), so this is needed.

> > +#include "qemu-common.h"
> > +#include "pci.h"
> > +#include <linux/types.h>
>
> Nor this.

This isn't.

> > diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
> > index 6053103..4a611cc 100644
> > --- a/qemu/hw/pc.c
> > +++ b/qemu/hw/pc.c

> > +    /* Initialize assigned devices */
> > +    if (pci_enabled) {
> > +        int r = -1;
> > +        do {
> > +            init_assigned_device(pci_bus, &r);
>
> Why pass r by reference instead of just returning it?  At any rate, you
> should detect when this fails and gracefully terminate QEMU.

'r' is the count of the number of assigned devices -- mostly needed because we 
have the data stored in an array. If we migrate to a list, this can be 
relaxed.

ATM, I start the guest without assigning the device. I haven't figured out a 
way to gracefully terminate qemu yet.


> > --- a/qemu/vl.c
> > +++ b/qemu/vl.c

> > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> > +	    case QEMU_OPTION_pcidevice:
> > +		add_assigned_device(optarg);
>
> You should copy into an array, then in pc.c, iterate through the array
> and call into add_assigned_device.

Is there any benefit in doing this? We're moving the iterate out of vl.c to 
pc.c and both will happen at the same time.

>
> Regards,
>
> Anthony Liguori

Thanks!

Amit.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues
  2008-09-24  4:27           ` Amit Shah
@ 2008-09-24 11:35             ` Amit Shah
  2008-09-24 14:59             ` Anthony Liguori
  1 sibling, 0 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-24 11:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Wednesday 24 Sep 2008 09:57:39 Amit Shah wrote:
> * On Tuesday 23 Sep 2008 21:43:44 Anthony Liguori wrote:
> > Amit Shah wrote:

> > What compile issues?
>
> register_ioport_read* and register_ioport_write* functions cause a lot of
> this.

I just noticed a lot of .h files need hw.h included to function and they all 
just put hw.h first in the .c where it's needed and then put the .h; so I'll 
include hw.h in device-assignment.c, but this is really screwed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues
  2008-09-24  4:27           ` Amit Shah
  2008-09-24 11:35             ` Amit Shah
@ 2008-09-24 14:59             ` Anthony Liguori
  1 sibling, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-09-24 14:59 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> * On Tuesday 23 Sep 2008 21:43:44 Anthony Liguori wrote:
>   
>> Amit Shah wrote:
>>     
>>> Signed-off-by: Amit Shah <amit.shah@redhat.com>
>>> ---
>>>  qemu/hw/isa.h |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/qemu/hw/isa.h b/qemu/hw/isa.h
>>> index 222e4f3..e4a1326 100644
>>> --- a/qemu/hw/isa.h
>>> +++ b/qemu/hw/isa.h
>>> @@ -2,6 +2,8 @@
>>>  #define HW_ISA_H
>>>  /* ISA bus */
>>>
>>> +#include "hw.h"
>>> +
>>>  extern target_phys_addr_t isa_mem_base;
>>>
>>>  int register_ioport_read(int start, int length, int size,
>>>       
>> What compile issues?
>>     
>
> register_ioport_read* and register_ioport_write* functions cause a lot of 
> this.
>   

You could also address this by including hw.h before including isa.h.  
Basically, everything should include "qemu-common.h" and anything that's 
implemented emulated hardware should include "hw.h" before including 
anything else.  It's not perfect, but it's how things are right now.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-24  4:58             ` Amit Shah
@ 2008-09-24 15:07               ` Anthony Liguori
  2008-09-24 17:10                 ` Amit Shah
  0 siblings, 1 reply; 31+ messages in thread
From: Anthony Liguori @ 2008-09-24 15:07 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> * On Tuesday 23 Sep 2008 22:00:32 Anthony Liguori wrote:
>   
>> Amit Shah wrote:
>>     
>
>   
>>> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
>>> index 72f3db8..40eb273 100644
>>> --- a/qemu/Makefile.target
>>> +++ b/qemu/Makefile.target
>>> @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
>>>  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
>>>  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
>>>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
>>> +OBJS+= device-assignment.o
>>>       
>> This needs to be conditional on at least linux hosts, but probably also
>> kvm support.
>>     
>
> I didn't see any other file that's doing it. So I added this conditional in 
> vl.c by having a #if defined(__linux__). That's how usb-linux.c does it as 
> well. Is there a better way?
>   

aio and compatfd currently do it this way.  block-raw-win32 and 
block-raw-posix are this way.  We're slowly moving things away from 
#ifdef #else #endif to conditional compilation.

> Not the whole functionality needs kvm support. This should be able to work 
> even without kvm (for example, when the guest is 1:1 mapped in the host 
> address space).
>   

KVM is needed for interrupt remapping though.  That's something I don't 
see happening for normal userspace any time soon.

>>> +	/* FIXME: Add support for emulated MMIO for non-kvm guests */
>>> +	if (kvm_enabled()) {
>>>       
>> This doesn't work at all if kvm isn't enabled right?  You should
>> probably bail out in the init if kvm isn't enabled.  If this whole file
>> is included conditionally based on KVM support, then you don't have to
>> worry about using kvm_enabled() guards to conditionally compile out code.
>>     
>
> Non-kvm support is currently broken and should be fixed, but that can happen 
> after we get this merged.
>   

But it would take bouncing interrupts to userspace?  I don't think that 
will ever happen upstream personally.  At any rate, there's no point in 
even trying to support something like that until progress is made 
upstream on this front.

> I can temporarily add a check for kvm_enabled and bail out.
>
>   
>>> +	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
>>> +		r_bus, r_dev, r_func);
>>>       
>> snprintf()
>>     
>
> It's guarded by the %02x modifiers; so this doesn't depend on user input.
>   

strcpy or sprintf should never be used.  It doesn't matter if it's safe 
in a particular instance.  There are safer functions to use (like snprintf).

All it takes is for someone to come along and change the /sys/bus path 
to be larger without adjusting the buffer size and everything goes to 
hell.  It's inherently brittle.

>>> +	fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
>>> +		"(\"%s\") as guest device %02x:%02x.%1x\n",
>>> +		r_bus, r_dev, r_func, e_dev_name,
>>> +		pci_bus_num(e_bus), e_device, r_func);
>>>       
>> Please don't fprintf() unconditionally.
>>     
>
> OK; however, a vmdk file open does that so I though it was alright to do it.
>   

I obviously don't use vmdk or else I would have removed that by now :-)

>> A lot more checks are needed here to see if things can succeed.  We
>> definitely should bail out if they can't.
>>     
>
> Bailing out is done in the out: label below. What else do  you think can fail? 
> I've taken care of all the cases that do fail IMO.
>
>   
>>> +	return pci_dev;
>>> +out:
>>> +	pci_unregister_device(&pci_dev->dev);
>>> +	return NULL;
>>> +}
>>>       
>
>   
>>> +/*
>>> + * Syntax to assign device:
>>> + *
>>> + * -pcidevice dev=bus:dev.func,dma=dma
>>> + *
>>> + * Example:
>>> + * -pcidevice host=00:13.0,dma=pvdma
>>> + *
>>> + * dma can currently only be 'none' to disable iommu support.
>>>       
>> Does it actually work if you disable iommu support?
>>     
>
> If the guest is 1:1 mapped.
>   

You mean with Andrea's reserved ram patches?

>>> +#include <sys/mman.h>
>>>       
>> Don't think this is needed here.
>>     
>
> We use mmap(), so this is needed.
>   

Ah.

>>> +    /* Initialize assigned devices */
>>> +    if (pci_enabled) {
>>> +        int r = -1;
>>> +        do {
>>> +            init_assigned_device(pci_bus, &r);
>>>       
>> Why pass r by reference instead of just returning it?  At any rate, you
>> should detect when this fails and gracefully terminate QEMU.
>>     
>
> 'r' is the count of the number of assigned devices -- mostly needed because we 
> have the data stored in an array. If we migrate to a list, this can be 
> relaxed.
>
> ATM, I start the guest without assigning the device. I haven't figured out a 
> way to gracefully terminate qemu yet.
>   

In the case of hot plug, you fail the hot plug.  If you start with 
device assignment, just doing an "exit" would be sufficient.

>>> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
>>> +	    case QEMU_OPTION_pcidevice:
>>> +		add_assigned_device(optarg);
>>>       
>> You should copy into an array, then in pc.c, iterate through the array
>> and call into add_assigned_device.
>>     
>
> Is there any benefit in doing this? We're moving the iterate out of vl.c to 
> pc.c and both will happen at the same time.
>   

It's how everything else works.

Regards,

Anthony Liguori

>> Regards,
>>
>> Anthony Liguori
>>     
>
> Thanks!
>
> Amit.
>   


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support
  2008-09-24  4:25               ` Amit Shah
@ 2008-09-24 15:08                 ` Anthony Liguori
  0 siblings, 0 replies; 31+ messages in thread
From: Anthony Liguori @ 2008-09-24 15:08 UTC (permalink / raw)
  To: Amit Shah; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

Amit Shah wrote:
> * On Tuesday 23 Sep 2008 22:01:10 Anthony Liguori wrote:
>   
>> Amit Shah wrote:
>>     
>>> Signed-off-by: Amit Shah <amit.shah@redhat.com>
>>> ---
>>>  kernel/x86/Kbuild |    3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/kernel/x86/Kbuild b/kernel/x86/Kbuild
>>> index 8dc0483..a4cd00c 100644
>>> --- a/kernel/x86/Kbuild
>>> +++ b/kernel/x86/Kbuild
>>> @@ -5,6 +5,9 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o
>>> ../anon_inodes.o irq.o i8259.o ifeq ($(CONFIG_KVM_TRACE),y)
>>>  kvm-objs += kvm_trace.o
>>>  endif
>>> +ifeq ($(CONFIG_DMAR),y)
>>> +kvm-objs += vtd.o
>>> +endif
>>>  kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
>>>  kvm-amd-objs := svm.o ../external-module-compat.o
>>>       
>> Where's the file come from?
>>     
>
> Already in the kernel tree -- arch/x86/kvm/vtd.c
>   

So this is independent of the rest of the series?  Any reason not to 
commit this Avi?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-24 15:07               ` Anthony Liguori
@ 2008-09-24 17:10                 ` Amit Shah
  0 siblings, 0 replies; 31+ messages in thread
From: Amit Shah @ 2008-09-24 17:10 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: avi, kvm, muli, benami, weidong.han, allen.m.kay

* On Wednesday 24 Sep 2008 20:37:24 Anthony Liguori wrote:
> Amit Shah wrote:
> > * On Tuesday 23 Sep 2008 22:00:32 Anthony Liguori wrote:
> >> Amit Shah wrote:
> >>> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> >>> index 72f3db8..40eb273 100644
> >>> --- a/qemu/Makefile.target
> >>> +++ b/qemu/Makefile.target
> >>> @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
> >>>  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
> >>>  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
> >>>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> >>> +OBJS+= device-assignment.o
> >>
> >> This needs to be conditional on at least linux hosts, but probably also
> >> kvm support.
> >
> > I didn't see any other file that's doing it. So I added this conditional
> > in vl.c by having a #if defined(__linux__). That's how usb-linux.c does
> > it as well. Is there a better way?
>
> aio and compatfd currently do it this way.  block-raw-win32 and
> block-raw-posix are this way.  We're slowly moving things away from
> #ifdef #else #endif to conditional compilation.

So is

ifdef(CONFIG_LINUX)
OBJS+=..
endif

supposed to work?
(or is it CONFIG_LINUX_USER?)

> > Not the whole functionality needs kvm support. This should be able to
> > work even without kvm (for example, when the guest is 1:1 mapped in the
> > host address space).
>
> KVM is needed for interrupt remapping though.  That's something I don't
> see happening for normal userspace any time soon.

We already have an implementation that I've not submitted. I remember at least 
one user asking to support that functionality (because he wanted to try this 
on powerpc).

> >>> +	/* FIXME: Add support for emulated MMIO for non-kvm guests */
> >>> +	if (kvm_enabled()) {
> >>
> >> This doesn't work at all if kvm isn't enabled right?  You should
> >> probably bail out in the init if kvm isn't enabled.  If this whole file
> >> is included conditionally based on KVM support, then you don't have to
> >> worry about using kvm_enabled() guards to conditionally compile out
> >> code.
> >
> > Non-kvm support is currently broken and should be fixed, but that can
> > happen after we get this merged.
>
> But it would take bouncing interrupts to userspace?  I don't think that
> will ever happen upstream personally.  At any rate, there's no point in
> even trying to support something like that until progress is made
> upstream on this front.

With 1:1 mapping (Andrea's patches) and --no-kvm-irqchip (or indeed --no-kvm) 
and the userspace interrupt bouncing patches, we can support this.

> > I can temporarily add a check for kvm_enabled and bail out.
> >
> >>> +	sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
> >>> +		r_bus, r_dev, r_func);
> >>
> >> snprintf()
> >
> > It's guarded by the %02x modifiers; so this doesn't depend on user input.
>
> strcpy or sprintf should never be used.  It doesn't matter if it's safe
> in a particular instance.  There are safer functions to use (like
> snprintf).

Hmm, qemu is littered with such strcpy()s though. I'll change this.

> All it takes is for someone to come along and change the /sys/bus path
> to be larger without adjusting the buffer size and everything goes to
> hell.  It's inherently brittle.
>
> >>> +	fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
> >>> +		"(\"%s\") as guest device %02x:%02x.%1x\n",
> >>> +		r_bus, r_dev, r_func, e_dev_name,
> >>> +		pci_bus_num(e_bus), e_device, r_func);
> >>
> >> Please don't fprintf() unconditionally.
> >
> > OK; however, a vmdk file open does that so I though it was alright to do
> > it.
>
> I obviously don't use vmdk or else I would have removed that by now :-)
>
> >> A lot more checks are needed here to see if things can succeed.  We
> >> definitely should bail out if they can't.
> >
> > Bailing out is done in the out: label below. What else do  you think can
> > fail? I've taken care of all the cases that do fail IMO.
> >
> >>> +	return pci_dev;
> >>> +out:
> >>> +	pci_unregister_device(&pci_dev->dev);
> >>> +	return NULL;
> >>> +}
> >>>
> >>>
> >>>
> >>> +/*
> >>> + * Syntax to assign device:
> >>> + *
> >>> + * -pcidevice dev=bus:dev.func,dma=dma
> >>> + *
> >>> + * Example:
> >>> + * -pcidevice host=00:13.0,dma=pvdma
> >>> + *
> >>> + * dma can currently only be 'none' to disable iommu support.
> >>
> >> Does it actually work if you disable iommu support?
> >
> > If the guest is 1:1 mapped.
>
> You mean with Andrea's reserved ram patches?

Yes.

> >>> +#include <sys/mman.h>
> >>
> >> Don't think this is needed here.
> >
> > We use mmap(), so this is needed.
>
> Ah.
>
> >>> +    /* Initialize assigned devices */
> >>> +    if (pci_enabled) {
> >>> +        int r = -1;
> >>> +        do {
> >>> +            init_assigned_device(pci_bus, &r);
> >>
> >> Why pass r by reference instead of just returning it?  At any rate, you
> >> should detect when this fails and gracefully terminate QEMU.
> >
> > 'r' is the count of the number of assigned devices -- mostly needed
> > because we have the data stored in an array. If we migrate to a list,
> > this can be relaxed.
> >
> > ATM, I start the guest without assigning the device. I haven't figured
> > out a way to gracefully terminate qemu yet.
>
> In the case of hot plug, you fail the hot plug.  If you start with
> device assignment, just doing an "exit" would be sufficient.

What about allocated memory? I'm sure there'll be lots of leaks in case of 
just exit().

> >>> +#if defined(TARGET_I386) || defined(TARGET_X86_64) ||
> >>> defined(__linux__) +	    case QEMU_OPTION_pcidevice:
> >>> +		add_assigned_device(optarg);
> >>
> >> You should copy into an array, then in pc.c, iterate through the array
> >> and call into add_assigned_device.
> >
> > Is there any benefit in doing this? We're moving the iterate out of vl.c
> > to pc.c and both will happen at the same time.
>
> It's how everything else works.

But there's no particular benefit to it. On the down side, we'll need some 
array (the size of which will not be known since we don't know how many 
devices will be assigned). This current scheme is simple; can't this stay 
this way?

Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
  2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
  2008-09-23 16:30           ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Anthony Liguori
@ 2008-09-25  4:54           ` Yang, Sheng
  2008-09-25  5:20             ` Yang, Sheng
  2008-09-26  1:34           ` Yang, Sheng
  3 siblings, 1 reply; 31+ messages in thread
From: Yang, Sheng @ 2008-09-25  4:54 UTC (permalink / raw)
  To: kvm
  Cc: Amit Shah, avi@redhat.com, muli@il.ibm.com, anthony@codemonkey.ws,
	benami@il.ibm.com, Han, Weidong, Kay, Allen M

On Tuesday 23 September 2008 22:54:53 Amit Shah wrote:
> From: Or Sagi <ors@tutis.com>
> From: Nir Peleg <nir@tutis.com>
> From: Amit Shah <amit.shah@redhat.com>
> From: Ben-Ami Yassour <benami@il.ibm.com>
> From: Weidong Han <weidong.han@intel.com>
> From: Glauber de Oliveira Costa <gcosta@redhat.com>
>
> With this patch, we can assign a device on the host machine to a
> guest.
>
> A new command-line option, -pcidevice is added.
> For example, to invoke it for a device sitting at PCI bus:dev.fn
> 04:08.0, use this:
>
>         -pcidevice host=04:08.0
>
> * The host driver for the device, if any, is to be removed before
> assigning the device (else device assignment will fail).
>
> * A device that shares IRQ with another host device cannot currently
> be assigned.
>
> This works only with the in-kernel irqchip method; to use the
> userspace irqchip, a kernel module (irqhook) and some extra changes
> are needed.
>

Hi Amit

I am afraid I got this when try to enable VT-d.

create_userspace_phys_mem: Invalid argument
assigned_dev_iomem_map: Error: create new mapping failed

Can you have a look at it? (and the patch you sent to Weidong don't got this 
problem.)

Thanks.
--
regards
Yang, Sheng

> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  qemu/Makefile.target        |    1 +
>  qemu/hw/device-assignment.c |  665
> +++++++++++++++++++++++++++++++++++++++++++ qemu/hw/device-assignment.h |  
> 93 ++++++
>  qemu/hw/pc.c                |    9 +
>  qemu/hw/pci.c               |    7 +
>  qemu/vl.c                   |   18 ++
>  6 files changed, 793 insertions(+), 0 deletions(-)
>  create mode 100644 qemu/hw/device-assignment.c
>  create mode 100644 qemu/hw/device-assignment.h
>
> diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> index 72f3db8..40eb273 100644
> --- a/qemu/Makefile.target
> +++ b/qemu/Makefile.target
> @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
>  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
>  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
>  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> +OBJS+= device-assignment.o
>  ifeq ($(USE_KVM_PIT), 1)
>  OBJS+= i8254-kvm.o
>  endif
> diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
> new file mode 100644
> index 0000000..e70daf2
> --- /dev/null
> +++ b/qemu/hw/device-assignment.c
> @@ -0,0 +1,665 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> with + * this program; if not, write to the Free Software Foundation, Inc.,
> 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + *
> + *  Assign a PCI device from the host to a guest VM.
> + *
> + *  Adapted for KVM by Qumranet.
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + */
> +#include <stdio.h>
> +#include <sys/io.h>
> +#include "qemu-kvm.h"
> +#include <linux/kvm_para.h>
> +#include "device-assignment.h"
> +
> +/* From linux/ioport.h */
> +#define IORESOURCE_IO          0x00000100      /* Resource type */
> +#define IORESOURCE_MEM         0x00000200
> +#define IORESOURCE_IRQ         0x00000400
> +#define IORESOURCE_DMA         0x00000800
> +#define IORESOURCE_PREFETCH    0x00001000      /* No side effects */
> +
> +/* #define DEVICE_ASSIGNMENT_DEBUG */
> +
> +#ifdef DEVICE_ASSIGNMENT_DEBUG
> +#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ##
> args) +#else
> +#define DEBUG(fmt, args...)
> +#endif
> +
> +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
> +                                      uint32_t value)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> +               + (addr - r_access->e_physbase);
> +
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> +                       " r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       iopl(3);
> +       outb(value, r_pio);
> +}
> +
> +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
> +                                      uint32_t value)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> +               + (addr - r_access->e_physbase);
> +
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> +                       " r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       iopl(3);
> +       outw(value, r_pio);
> +}
> +
> +static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
> +                                      uint32_t value)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> +               + (addr - r_access->e_physbase);
> +
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> +                       " r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       iopl(3);
> +       outl(value, r_pio);
> +}
> +
> +static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (addr - r_access->e_physbase)
> +               + (unsigned long)r_access->r_virtbase;
> +       uint32_t value;
> +
> +       iopl(3);
> +       value = inb(r_pio);
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> +                       "r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       return value;
> +}
> +
> +static uint32_t assigned_dev_ioport_readw(void *opaque, uint32_t addr)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (addr - r_access->e_physbase)
> +               + (unsigned long)r_access->r_virtbase;
> +       uint32_t value;
> +
> +       iopl(3);
> +       value = inw(r_pio);
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> +                       "r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       return value;
> +}
> +
> +static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
> +{
> +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> +       uint32_t r_pio = (addr - r_access->e_physbase)
> +               + (unsigned long)r_access->r_virtbase;
> +       uint32_t value;
> +
> +       iopl(3);
> +       value = inl(r_pio);
> +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> +                       "r_virtbase=%08lx value=%08x\n",
> +                       __func__, r_pio, (int)r_access->e_physbase,
> +                       (unsigned long)r_access->r_virtbase, value);
> +       }
> +       return value;
> +}
> +
> +static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
> +                        uint32_t e_phys, uint32_t e_size, int type)
> +{
> +       AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
> +       AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> +       int first_map = (region->e_size == 0);
> +       int ret = 0;
> +
> +       DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
> +             e_phys, r_dev->v_addrs[region_num].r_virtbase, type, e_size,
> +             region_num);
> +
> +       region->e_physbase = e_phys;
> +       region->e_size = e_size;
> +
> +       /* FIXME: Add support for emulated MMIO for non-kvm guests */
> +       if (kvm_enabled()) {
> +               if (!first_map)
> +                       kvm_destroy_phys_mem(kvm_context, e_phys, e_size);
> +               if (e_size > 0)
> +                       ret = kvm_register_phys_mem(kvm_context, e_phys,
> +                                                   region->r_virtbase,
> +                                                   e_size, 0);
> +               if (ret != 0)
> +                       fprintf(stderr,
> +                               "%s: Error: create new mapping failed\n",
> +                               __func__);
> +       }
> +}
> +
> +static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
> +                                   uint32_t addr, uint32_t size, int type)
> +{
> +       AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
> +
> +       r_dev->v_addrs[region_num].e_physbase = addr;
> +       DEBUG("%s: address=0x%x type=0x%x len=%d region_num=%d \n",
> +             __func__, addr, type, size, region_num);
> +
> +       register_ioport_read(addr, size, 1, assigned_dev_ioport_readb,
> +                            (void *) (r_dev->v_addrs + region_num));
> +       register_ioport_read(addr, size, 2, assigned_dev_ioport_readw,
> +                            (void *) (r_dev->v_addrs + region_num));
> +       register_ioport_read(addr, size, 4, assigned_dev_ioport_readl,
> +                            (void *) (r_dev->v_addrs + region_num));
> +       register_ioport_write(addr, size, 1, assigned_dev_ioport_writeb,
> +                             (void *) (r_dev->v_addrs + region_num));
> +       register_ioport_write(addr, size, 2, assigned_dev_ioport_writew,
> +                             (void *) (r_dev->v_addrs + region_num));
> +       register_ioport_write(addr, size, 4, assigned_dev_ioport_writel,
> +                             (void *) (r_dev->v_addrs + region_num));
> +}
> +
> +static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address,
> +                                         uint32_t val, int len)
> +{
> +       int fd, r;
> +
> +       DEBUG("%s: (%x.%x): address=%04x val=0x%08x len=%d\n",
> +             __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> +             (uint16_t) address, val, len);
> +
> +       if (address == 0x4) {
> +               pci_default_write_config(d, address, val, len);
> +               /* Continue to program the card */
> +       }
> +
> +       if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +           address == 0x3c || address == 0x3d) {
> +               /* used for update-mappings (BAR emulation) */
> +               pci_default_write_config(d, address, val, len);
> +               return;
> +       }
> +       DEBUG("%s: NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> +             __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> +             (uint16_t) address, val, len);
> +       fd = ((AssignedDevice *)d)->real_device.config_fd;
> +       r = lseek(fd, address, SEEK_SET);
> +       if (r < 0) {
> +               fprintf(stderr, "%s: bad seek, errno = %d\n",
> +                       __func__, errno);
> +               return;
> +       }
> +again:
> +       r = write(fd, &val, len);
> +       if (r < 0) {
> +               if (errno == EINTR || errno == EAGAIN)
> +                       goto again;
> +               fprintf(stderr, "%s: write failed, errno = %d\n",
> +                       __func__, errno);
> +       }
> +}
> +
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t
> address, +                                            int len)
> +{
> +       uint32_t val = 0;
> +       int fd, r;
> +
> +       if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +           address == 0x3c || address == 0x3d) {
> +               val = pci_default_read_config(d, address, len);
> +               DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +                     (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address,
> val, +                     len);
> +               return val;
> +       }
> +
> +       /* vga specific, remove later */
> +       if (address == 0xFC)
> +               goto do_log;
> +
> +       fd = ((AssignedDevice *)d)->real_device.config_fd;
> +       r = lseek(fd, address, SEEK_SET);
> +       if (r < 0) {
> +               fprintf(stderr, "%s: bad seek, errno = %d\n",
> +                       __func__, errno);
> +               return val;
> +       }
> +again:
> +       r = read(fd, &val, len);
> +       if (r < 0) {
> +               if (errno == EINTR || errno == EAGAIN)
> +                       goto again;
> +               fprintf(stderr, "%s: read failed, errno = %d\n",
> +                       __func__, errno);
> +       }
> +do_log:
> +       DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +             (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len);
> +
> +       /* kill the special capabilities */
> +       if (address == 4 && len == 4)
> +               val &= ~0x100000;
> +       else if (address == 6)
> +               val &= ~0x10;
> +
> +       return val;
> +}
> +
> +static int assigned_dev_register_regions(PCIRegion *io_regions,
> +                                        unsigned long regions_num,
> +                                        AssignedDevice *pci_dev)
> +{
> +       uint32_t i;
> +       PCIRegion *cur_region = io_regions;
> +
> +       for (i = 0; i < regions_num; i++, cur_region++) {
> +               if (!cur_region->valid)
> +                       continue;
> +#ifdef DEVICE_ASSIGNMENT_DEBUG
> +               pci_dev->v_addrs[i].debug |= DEVICE_ASSIGNMENT_DEBUG_MMIO
> +                                            | DEVICE_ASSIGNMENT_DEBUG_PIO;
> +#endif
> +               pci_dev->v_addrs[i].num = i;
> +
> +               /* handle memory io regions */
> +               if (cur_region->type & IORESOURCE_MEM) {
> +                       int t = cur_region->type & IORESOURCE_PREFETCH
> +                               ? PCI_ADDRESS_SPACE_MEM_PREFETCH
> +                               : PCI_ADDRESS_SPACE_MEM;
> +
> +                       /* map physical memory */
> +                       pci_dev->v_addrs[i].e_physbase =
> cur_region->base_addr; +                      
> pci_dev->v_addrs[i].r_virtbase =
> +                               mmap(NULL,
> +                                    (cur_region->size + 0xFFF) &
> 0xFFFFF000, +                                    PROT_WRITE | PROT_READ,
> MAP_SHARED, +                                    cur_region->resource_fd,
> (off_t) 0); +
> +                       if ((void *) -1 == pci_dev->v_addrs[i].r_virtbase)
> { +                               fprintf(stderr, "%s: Error: Couldn't mmap
> 0x%x!" +                                       "\n", __func__,
> +                                       (uint32_t)
> (cur_region->base_addr)); +                               return -1;
> +                       }
> +                       pci_dev->v_addrs[i].r_size = cur_region->size;
> +                       pci_dev->v_addrs[i].e_size = 0;
> +
> +                       /* add offset */
> +                       pci_dev->v_addrs[i].r_virtbase +=
> +                               (cur_region->base_addr & 0xFFF);
> +
> +                       pci_register_io_region((PCIDevice *) pci_dev, i,
> +                                              cur_region->size, t,
> +                                              assigned_dev_iomem_map);
> +                       continue;
> +               }
> +               /* handle port io regions */
> +               pci_register_io_region((PCIDevice *) pci_dev, i,
> +                                      cur_region->size,
> PCI_ADDRESS_SPACE_IO, +                                     
> assigned_dev_ioport_map);
> +
> +               pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
> +               pci_dev->v_addrs[i].r_virtbase =
> +                       (void *)(long)cur_region->base_addr;
> +               /* not relevant for port io */
> +               pci_dev->v_addrs[i].memory_index = 0;
> +       }
> +
> +       /* success */
> +       return 0;
> +}
> +
> +static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
> +                          uint8_t r_dev, uint8_t r_func)
> +{
> +       char dir[128], name[128], comp[16];
> +       int fd, r = 0;
> +       FILE *f;
> +       unsigned long long start, end, size, flags;
> +       PCIRegion *rp;
> +       PCIDevRegions *dev = &pci_dev->real_device;
> +
> +       dev->region_number = 0;
> +
> +       sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
> +               r_bus, r_dev, r_func);
> +       strcpy(name, dir);
> +       strcat(name, "config");
> +       fd = open(name, O_RDWR);
> +       if (fd == -1) {
> +               fprintf(stderr, "%s: %s: %m\n", __func__, name);
> +               return 1;
> +       }
> +       dev->config_fd = fd;
> +again:
> +       r = read(fd, pci_dev->dev.config, sizeof pci_dev->dev.config);
> +       if (r < 0) {
> +               if (errno == EINTR || errno == EAGAIN)
> +                       goto again;
> +               fprintf(stderr, "%s: read failed, errno = %d\n",
> +                       __func__, errno);
> +       }
> +       strcpy(name, dir);
> +       strcat(name, "resource");
> +
> +       f = fopen(name, "r");
> +       if (f == NULL) {
> +               fprintf(stderr, "%s: %s: %m\n", __func__, name);
> +               return 1;
> +       }
> +       for (r = 0; fscanf(f, "%lli %lli %lli\n", &start, &end, &flags) ==
> 3; +            r++) {
> +               rp = dev->regions + r;
> +               rp->valid = 0;
> +               size = end - start + 1;
> +               flags &= IORESOURCE_IO | IORESOURCE_MEM |
> IORESOURCE_PREFETCH; +               if (size == 0 || (flags &
> ~IORESOURCE_PREFETCH) == 0) +                       continue;
> +               if (flags & IORESOURCE_MEM) {
> +                       flags &= ~IORESOURCE_IO;
> +                       sprintf(comp, "resource%d", r);
> +                       strcpy(name, dir);
> +                       strcat(name, comp);
> +                       fd = open(name, O_RDWR);
> +                       if (fd == -1)
> +                               continue;               /* probably ROM */
> +                       rp->resource_fd = fd;
> +               } else
> +                       flags &= ~IORESOURCE_PREFETCH;
> +
> +               rp->type = flags;
> +               rp->valid = 1;
> +               rp->base_addr = start;
> +               rp->size = size;
> +               DEBUG("%s: region %d size %d start 0x%x type %d "
> +                     "resource_fd %d\n", __func__, r, rp->size, start,
> +                     rp->type, rp->resource_fd);
> +       }
> +       fclose(f);
> +
> +       dev->region_number = r;
> +       return 0;
> +}
> +
> +#define        MAX_ASSIGNED_DEVS 4
> +struct {
> +       char name[15];
> +       int bus;
> +       int dev;
> +       int func;
> +       AssignedDevice *assigned_dev;
> +} assigned_devices[MAX_ASSIGNED_DEVS];
> +
> +int nr_assigned_devices;
> +static int disable_iommu;
> +
> +static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
> +{
> +       return (uint32_t)bus << 8 | (uint32_t)devfn;
> +}
> +
> +static AssignedDevice *register_real_device(PCIBus *e_bus,
> +                                           const char *e_dev_name,
> +                                           int e_devfn, uint8_t r_bus,
> +                                           uint8_t r_dev, uint8_t r_func)
> +{
> +       int r;
> +       AssignedDevice *pci_dev;
> +       uint8_t e_device, e_intx;
> +
> +       DEBUG("%s: Registering real physical device %s (devfn=0x%x)\n",
> +             __func__, e_dev_name, e_devfn);
> +
> +       pci_dev = (AssignedDevice *)
> +               pci_register_device(e_bus, e_dev_name,
> sizeof(AssignedDevice), +                                   e_devfn,
> assigned_dev_pci_read_config, +                                  
> assigned_dev_pci_write_config); +       if (NULL == pci_dev) {
> +               fprintf(stderr, "%s: Error: Couldn't register real device
> %s\n", +                       __func__, e_dev_name);
> +               return NULL;
> +       }
> +       if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
> +               fprintf(stderr, "%s: Error: Couldn't get real device
> (%s)!\n", +                       __func__, e_dev_name);
> +               goto out;
> +       }
> +
> +       /* handle real device's MMIO/PIO BARs */
> +       if (assigned_dev_register_regions(pci_dev->real_device.regions,
> +                                        
> pci_dev->real_device.region_number, +                                      
>   pci_dev))
> +               goto out;
> +
> +       /* handle interrupt routing */
> +       e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
> +       e_intx = pci_dev->dev.config[0x3d] - 1;
> +       pci_dev->intpin = e_intx;
> +       pci_dev->run = 0;
> +       pci_dev->girq = 0;
> +       pci_dev->h_busnr = r_bus;
> +       pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
> +
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +       if (kvm_enabled()) {
> +               struct kvm_assigned_pci_dev assigned_dev_data;
> +
> +               memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
> +               assigned_dev_data.assigned_dev_id  =
> +                       calc_assigned_dev_id(pci_dev->h_busnr,
> +                                            (uint32_t)pci_dev->h_devfn);
> +               assigned_dev_data.busnr = pci_dev->h_busnr;
> +               assigned_dev_data.devfn = pci_dev->h_devfn;
> +
> +#ifdef KVM_CAP_IOMMU
> +               /* We always enable the IOMMU if present
> +                * (or when not disabled on the command line)
> +                */
> +               r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
> +               if (r && !disable_iommu)
> +                       assigned_dev_data.flags |=
> KVM_DEV_ASSIGN_ENABLE_IOMMU; +#endif
> +               r = kvm_assign_pci_device(kvm_context, &assigned_dev_data);
> +               if (r < 0) {
> +                       fprintf(stderr, "Could not notify kernel about "
> +                               "assigned device \"%s\"\n", e_dev_name);
> +                       perror("pt-ioctl");
> +                       goto out;
> +               }
> +       }
> +#endif
> +       fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
> +               "(\"%s\") as guest device %02x:%02x.%1x\n",
> +               r_bus, r_dev, r_func, e_dev_name,
> +               pci_bus_num(e_bus), e_device, r_func);
> +
> +       return pci_dev;
> +out:
> +       pci_unregister_device(&pci_dev->dev);
> +       return NULL;
> +}
> +
> +extern int get_param_value(char *buf, int buf_size,
> +                          const char *tag, const char *str);
> +extern int piix_get_irq(int);
> +
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +/* The pci config space got updated. Check if irq numbers have changed
> + * for our devices
> + */
> +void assigned_dev_update_irq(PCIDevice *d)
> +{
> +       int i, irq, r;
> +       AssignedDevice *assigned_dev;
> +
> +       for (i = 0; i < nr_assigned_devices; i++) {
> +               assigned_dev = assigned_devices[i].assigned_dev;
> +               if (assigned_dev == NULL)
> +                       continue;
> +
> +               irq = pci_map_irq(&assigned_dev->dev,
> assigned_dev->intpin); +               irq = piix_get_irq(irq);
> +
> +               if (irq != assigned_dev->girq) {
> +                       struct kvm_assigned_irq assigned_irq_data;
> +
> +                       memset(&assigned_irq_data, 0, sizeof
> assigned_irq_data); +                      
> assigned_irq_data.assigned_dev_id  =
> +                               calc_assigned_dev_id(assigned_dev->h_busnr,
> +                                                    (uint8_t)
> +                                                   
> assigned_dev->h_devfn); +                       assigned_irq_data.guest_irq
> = irq;
> +                       assigned_irq_data.host_irq =
> +                               assigned_dev->real_device.irq;
> +                       r = kvm_assign_irq(kvm_context,
> &assigned_irq_data); +                       if (r < 0) {
> +                               perror("assigned_dev_update_irq");
> +                               fprintf(stderr, "Are you assigning a device
> " +                                       "that shares IRQ with some other
> " +                                       "device?\n");
> +                               pci_unregister_device(&assigned_dev->dev);
> +                               continue;
> +                       }
> +                       assigned_dev->girq = irq;
> +               }
> +       }
> +}
> +#endif
> +
> +static int init_device_assignment(void)
> +{
> +       /* Do we have any devices to be assigned? */
> +       if (nr_assigned_devices == 0)
> +               return -1;
> +       iopl(3);
> +       return 0;
> +}
> +
> +struct PCIDevice *init_assigned_device(PCIBus *bus, int *index)
> +{
> +       AssignedDevice *dev = NULL;
> +       int i;
> +
> +       if (*index == -1) {
> +               if (init_device_assignment() < 0)
> +                       return NULL;
> +
> +               *index = nr_assigned_devices - 1;
> +       }
> +       i = *index;
> +       dev = register_real_device(bus, assigned_devices[i].name, -1,
> +                                  assigned_devices[i].bus,
> +                                  assigned_devices[i].dev,
> +                                  assigned_devices[i].func);
> +       if (dev == NULL) {
> +               fprintf(stderr, "Error: Couldn't register device \"%s\"\n",
> +                       assigned_devices[i].name);
> +       }
> +       assigned_devices[i].assigned_dev = dev;
> +
> +       --*index;
> +       return &dev->dev;
> +}
> +
> +/*
> + * Syntax to assign device:
> + *
> + * -pcidevice dev=bus:dev.func,dma=dma
> + *
> + * Example:
> + * -pcidevice host=00:13.0,dma=pvdma
> + *
> + * dma can currently only be 'none' to disable iommu support.
> + */
> +void add_assigned_device(const char *arg)
> +{
> +       char *cp, *cp1;
> +       char device[8];
> +       char dma[6];
> +       int r;
> +
> +       if (nr_assigned_devices >= MAX_ASSIGNED_DEVS) {
> +               fprintf(stderr, "Too many assigned devices (max %d)\n",
> +                       MAX_ASSIGNED_DEVS);
> +               return;
> +       }
> +       memset(&assigned_devices[nr_assigned_devices], 0,
> +              sizeof assigned_devices[nr_assigned_devices]);
> +
> +       r = get_param_value(device, sizeof device, "host", arg);
> +
> +       r = get_param_value(assigned_devices[nr_assigned_devices].name,
> +                           sizeof
> assigned_devices[nr_assigned_devices].name, +                          
> "name", arg);
> +       if (!r)
> +               strncpy(assigned_devices[nr_assigned_devices].name, device,
> 8); +
> +#ifdef KVM_CAP_IOMMU
> +       r = get_param_value(dma, sizeof dma, "dma", arg);
> +       if (r && !strncmp(dma, "none", 4))
> +               disable_iommu = 1;
> +#endif
> +       cp = device;
> +       assigned_devices[nr_assigned_devices].bus = strtoul(cp, &cp1, 16);
> +       if (*cp1 != ':')
> +               goto bad;
> +       cp = cp1 + 1;
> +
> +       assigned_devices[nr_assigned_devices].dev = strtoul(cp, &cp1, 16);
> +       if (*cp1 != '.')
> +               goto bad;
> +       cp = cp1 + 1;
> +
> +       assigned_devices[nr_assigned_devices].func = strtoul(cp, &cp1, 16);
> +
> +       nr_assigned_devices++;
> +       return;
> +bad:
> +       fprintf(stderr, "pcidevice argument parse error; "
> +               "please check the help text for usage\n");
> +}
> diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
> new file mode 100644
> index 0000000..b77e484
> --- /dev/null
> +++ b/qemu/hw/device-assignment.h
> @@ -0,0 +1,93 @@
> +/*
> + * Copyright (c) 2007, Neocleus Corporation.
> + * Copyright (c) 2007, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> with + * this program; if not, write to the Free Software Foundation, Inc.,
> 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + *  Data structures for storing PCI state
> + *
> + *  Adapted to kvm by Qumranet
> + *
> + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> + */
> +
> +#ifndef __DEVICE_ASSIGNMENT_H__
> +#define __DEVICE_ASSIGNMENT_H__
> +
> +#include <sys/mman.h>
> +#include "qemu-common.h"
> +#include "pci.h"
> +#include <linux/types.h>
> +
> +#define DEVICE_ASSIGNMENT_DEBUG_PIO    (0x01)
> +#define DEVICE_ASSIGNMENT_DEBUG_MMIO   (0x02)
> +
> +/* From include/linux/pci.h in the kernel sources */
> +#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
> +
> +typedef uint32_t pciaddr_t;
> +
> +#define MAX_IO_REGIONS                 (6)
> +
> +typedef struct pci_region_s {
> +       int type;       /* Memory or port I/O */
> +       int valid;
> +       pciaddr_t base_addr;
> +       pciaddr_t size;         /* size of the region */
> +       int resource_fd;
> +} PCIRegion;
> +
> +typedef struct pci_dev_s {
> +       uint8_t bus, dev, func; /* Bus inside domain, device and function
> */ +       int irq;                /* IRQ number */
> +       uint16_t region_number; /* number of active regions */
> +
> +       /* Port I/O or MMIO Regions */
> +       PCIRegion regions[MAX_IO_REGIONS];
> +       int config_fd;
> +} PCIDevRegions;
> +
> +typedef struct assigned_dev_region_s {
> +       target_phys_addr_t e_physbase;
> +       uint32_t memory_index;
> +       void *r_virtbase;       /* mmapped access address */
> +       int num;                /* our index within v_addrs[] */
> +       uint32_t e_size;        /* emulated size of region in bytes */
> +       uint32_t r_size;        /* real size of region in bytes */
> +       uint32_t debug;
> +} AssignedDevRegion;
> +
> +typedef struct assigned_dev_s {
> +       PCIDevice dev;
> +       int intpin;
> +       uint8_t debug_flags;
> +       AssignedDevRegion v_addrs[PCI_NUM_REGIONS];
> +       PCIDevRegions real_device;
> +       int run;
> +       int girq;
> +       unsigned char h_busnr;
> +       unsigned int h_devfn;
> +       int bound;
> +} AssignedDevice;
> +
> +/* Initialization functions */
> +PCIDevice *init_assigned_device(PCIBus *bus, int *index);
> +void add_assigned_device(const char *arg);
> +void assigned_dev_set_vector(int irq, int vector);
> +void assigned_dev_ack_mirq(int vector);
> +
> +#endif                         /* __DEVICE_ASSIGNMENT_H__ */
> diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
> index 6053103..4a611cc 100644
> --- a/qemu/hw/pc.c
> +++ b/qemu/hw/pc.c
> @@ -32,6 +32,7 @@
>  #include "smbus.h"
>  #include "boards.h"
>  #include "console.h"
> +#include "device-assignment.h"
>
>  #include "qemu-kvm.h"
>
> @@ -1006,6 +1007,14 @@ static void pc_init1(ram_addr_t ram_size, int
> vga_ram_size, }
>      }
>
> +    /* Initialize assigned devices */
> +    if (pci_enabled) {
> +        int r = -1;
> +        do {
> +            init_assigned_device(pci_bus, &r);
> +       } while (r >= 0);
> +    }
> +
>      rtc_state = rtc_init(0x70, i8259[8]);
>
>      qemu_register_boot_set(pc_boot_set, rtc_state);
> diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> index 61ff0f6..e4e8386 100644
> --- a/qemu/hw/pci.c
> +++ b/qemu/hw/pci.c
> @@ -50,6 +50,7 @@ struct PCIBus {
>
>  static void pci_update_mappings(PCIDevice *d);
>  static void pci_set_irq(void *opaque, int irq_num, int level);
> +void assigned_dev_update_irq(PCIDevice *d);
>
>  target_phys_addr_t pci_mem_base;
>  static int pci_irq_index;
> @@ -453,6 +454,12 @@ void pci_default_write_config(PCIDevice *d,
>          val >>= 8;
>      }
>
> +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> +    if (kvm_enabled() && qemu_kvm_irqchip_in_kernel() &&
> +       address >= 0x60 && address <= 0x63)
> +       assigned_dev_update_irq(d);
> +#endif
> +
>      end = address + len;
>      if (end > PCI_COMMAND && address < (PCI_COMMAND + 2)) {
>          /* if the command register is modified, we must modify the
> mappings */ diff --git a/qemu/vl.c b/qemu/vl.c
> index 2fb8552..83f28c5 100644
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -37,6 +37,7 @@
>  #include "qemu-char.h"
>  #include "block.h"
>  #include "audio/audio.h"
> +#include "hw/device-assignment.h"
>  #include "migration.h"
>  #include "balloon.h"
>  #include "qemu-kvm.h"
> @@ -8469,6 +8470,12 @@ static void help(int exitcode)
>  #endif
>            "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
>            "-no-kvm-pit     disable KVM kernel mode PIT\n"
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +          "-pcidevice host=bus:dev.func[,dma=none][,name=\"string\"]\n"
> +          "                expose a PCI device to the guest OS.\n"
> +          "                dma=none: don't perform any dma translations
> (default is to use an iommu)\n" +          "                'string' is
> used in log output.\n"
> +#endif
>  #endif
>  #ifdef TARGET_I386
>             "-std-vga        simulate a standard VGA card with VESA Bochs
> Extensions\n" @@ -8592,6 +8599,9 @@ enum {
>      QEMU_OPTION_no_kvm,
>      QEMU_OPTION_no_kvm_irqchip,
>      QEMU_OPTION_no_kvm_pit,
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +    QEMU_OPTION_pcidevice,
> +#endif
>      QEMU_OPTION_no_reboot,
>      QEMU_OPTION_no_shutdown,
>      QEMU_OPTION_show_cursor,
> @@ -8680,6 +8690,9 @@ const QEMUOption qemu_options[] = {
>  #endif
>      { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
>      { "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +    { "pcidevice", HAS_ARG, QEMU_OPTION_pcidevice },
> +#endif
>  #endif
>  #if defined(TARGET_PPC) || defined(TARGET_SPARC)
>      { "g", 1, QEMU_OPTION_g },
> @@ -9586,6 +9599,11 @@ int main(int argc, char **argv)
>                 kvm_pit = 0;
>                 break;
>             }
> +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> +           case QEMU_OPTION_pcidevice:
> +               add_assigned_device(optarg);
> +               break;
> +#endif
>  #endif
>              case QEMU_OPTION_usb:
>                  usb_enabled = 1;
> --
> 1.5.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-25  4:54           ` Yang, Sheng
@ 2008-09-25  5:20             ` Yang, Sheng
  0 siblings, 0 replies; 31+ messages in thread
From: Yang, Sheng @ 2008-09-25  5:20 UTC (permalink / raw)
  To: kvm
  Cc: Amit Shah, avi@redhat.com, muli@il.ibm.com, anthony@codemonkey.ws,
	benami@il.ibm.com, Han, Weidong, Kay, Allen M

On Thursday 25 September 2008 12:54:46 Yang, Sheng wrote:
> On Tuesday 23 September 2008 22:54:53 Amit Shah wrote:
> > From: Or Sagi <ors@tutis.com>
> > From: Nir Peleg <nir@tutis.com>
> > From: Amit Shah <amit.shah@redhat.com>
> > From: Ben-Ami Yassour <benami@il.ibm.com>
> > From: Weidong Han <weidong.han@intel.com>
> > From: Glauber de Oliveira Costa <gcosta@redhat.com>
> >
> > With this patch, we can assign a device on the host machine to a
> > guest.
> >
> > A new command-line option, -pcidevice is added.
> > For example, to invoke it for a device sitting at PCI bus:dev.fn
> > 04:08.0, use this:
> >
> >         -pcidevice host=04:08.0
> >
> > * The host driver for the device, if any, is to be removed before
> > assigning the device (else device assignment will fail).
> >
> > * A device that shares IRQ with another host device cannot currently
> > be assigned.
> >
> > This works only with the in-kernel irqchip method; to use the
> > userspace irqchip, a kernel module (irqhook) and some extra changes
> > are needed.
>
> Hi Amit
>
> I am afraid I got this when try to enable VT-d.
>
> create_userspace_phys_mem: Invalid argument
> assigned_dev_iomem_map: Error: create new mapping failed
>
> Can you have a look at it? (and the patch you sent to Weidong don't got
> this problem.)

Oh, Weidong's patch "[PATCH] VT-d: Fix iommu map page for mmio pages" fix it. 
--
regards
Yang, Sheng
>
> Thanks.
> --
> regards
> Yang, Sheng
>
> > Signed-off-by: Amit Shah <amit.shah@redhat.com>
> > ---
> >  qemu/Makefile.target        |    1 +
> >  qemu/hw/device-assignment.c |  665
> > +++++++++++++++++++++++++++++++++++++++++++ qemu/hw/device-assignment.h |
> > 93 ++++++
> >  qemu/hw/pc.c                |    9 +
> >  qemu/hw/pci.c               |    7 +
> >  qemu/vl.c                   |   18 ++
> >  6 files changed, 793 insertions(+), 0 deletions(-)
> >  create mode 100644 qemu/hw/device-assignment.c
> >  create mode 100644 qemu/hw/device-assignment.h
> >
> > diff --git a/qemu/Makefile.target b/qemu/Makefile.target
> > index 72f3db8..40eb273 100644
> > --- a/qemu/Makefile.target
> > +++ b/qemu/Makefile.target
> > @@ -616,6 +616,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o
> >  OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o
> >  OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
> >  OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o
> > +OBJS+= device-assignment.o
> >  ifeq ($(USE_KVM_PIT), 1)
> >  OBJS+= i8254-kvm.o
> >  endif
> > diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
> > new file mode 100644
> > index 0000000..e70daf2
> > --- /dev/null
> > +++ b/qemu/hw/device-assignment.c
> > @@ -0,0 +1,665 @@
> > +/*
> > + * Copyright (c) 2007, Neocleus Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > it + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but
> > WITHOUT + * ANY WARRANTY; without even the implied warranty of
> > MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > General Public License for + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > along with + * this program; if not, write to the Free Software
> > Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307
> > USA.
> > + *
> > + *
> > + *  Assign a PCI device from the host to a guest VM.
> > + *
> > + *  Adapted for KVM by Qumranet.
> > + *
> > + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> > + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> > + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> > + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> > + */
> > +#include <stdio.h>
> > +#include <sys/io.h>
> > +#include "qemu-kvm.h"
> > +#include <linux/kvm_para.h>
> > +#include "device-assignment.h"
> > +
> > +/* From linux/ioport.h */
> > +#define IORESOURCE_IO          0x00000100      /* Resource type */
> > +#define IORESOURCE_MEM         0x00000200
> > +#define IORESOURCE_IRQ         0x00000400
> > +#define IORESOURCE_DMA         0x00000800
> > +#define IORESOURCE_PREFETCH    0x00001000      /* No side effects */
> > +
> > +/* #define DEVICE_ASSIGNMENT_DEBUG */
> > +
> > +#ifdef DEVICE_ASSIGNMENT_DEBUG
> > +#define DEBUG(fmt, args...) fprintf(stderr, "%s: " fmt, __func__ , ##
> > args) +#else
> > +#define DEBUG(fmt, args...)
> > +#endif
> > +
> > +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr,
> > +                                      uint32_t value)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> > +               + (addr - r_access->e_physbase);
> > +
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> > +                       " r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       iopl(3);
> > +       outb(value, r_pio);
> > +}
> > +
> > +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr,
> > +                                      uint32_t value)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> > +               + (addr - r_access->e_physbase);
> > +
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> > +                       " r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       iopl(3);
> > +       outw(value, r_pio);
> > +}
> > +
> > +static void assigned_dev_ioport_writel(void *opaque, uint32_t addr,
> > +                                      uint32_t value)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (unsigned long)r_access->r_virtbase
> > +               + (addr - r_access->e_physbase);
> > +
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x"
> > +                       " r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       iopl(3);
> > +       outl(value, r_pio);
> > +}
> > +
> > +static uint32_t assigned_dev_ioport_readb(void *opaque, uint32_t addr)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (addr - r_access->e_physbase)
> > +               + (unsigned long)r_access->r_virtbase;
> > +       uint32_t value;
> > +
> > +       iopl(3);
> > +       value = inb(r_pio);
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> > +                       "r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       return value;
> > +}
> > +
> > +static uint32_t assigned_dev_ioport_readw(void *opaque, uint32_t addr)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (addr - r_access->e_physbase)
> > +               + (unsigned long)r_access->r_virtbase;
> > +       uint32_t value;
> > +
> > +       iopl(3);
> > +       value = inw(r_pio);
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> > +                       "r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       return value;
> > +}
> > +
> > +static uint32_t assigned_dev_ioport_readl(void *opaque, uint32_t addr)
> > +{
> > +       AssignedDevRegion *r_access = (AssignedDevRegion *)opaque;
> > +       uint32_t r_pio = (addr - r_access->e_physbase)
> > +               + (unsigned long)r_access->r_virtbase;
> > +       uint32_t value;
> > +
> > +       iopl(3);
> > +       value = inl(r_pio);
> > +       if (r_access->debug & DEVICE_ASSIGNMENT_DEBUG_PIO) {
> > +               fprintf(stderr, "%s: r_pio=%08x e_physbase=%08x "
> > +                       "r_virtbase=%08lx value=%08x\n",
> > +                       __func__, r_pio, (int)r_access->e_physbase,
> > +                       (unsigned long)r_access->r_virtbase, value);
> > +       }
> > +       return value;
> > +}
> > +
> > +static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
> > +                        uint32_t e_phys, uint32_t e_size, int type)
> > +{
> > +       AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
> > +       AssignedDevRegion *region = &r_dev->v_addrs[region_num];
> > +       int first_map = (region->e_size == 0);
> > +       int ret = 0;
> > +
> > +       DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
> > +             e_phys, r_dev->v_addrs[region_num].r_virtbase, type,
> > e_size, +             region_num);
> > +
> > +       region->e_physbase = e_phys;
> > +       region->e_size = e_size;
> > +
> > +       /* FIXME: Add support for emulated MMIO for non-kvm guests */
> > +       if (kvm_enabled()) {
> > +               if (!first_map)
> > +                       kvm_destroy_phys_mem(kvm_context, e_phys,
> > e_size); +               if (e_size > 0)
> > +                       ret = kvm_register_phys_mem(kvm_context, e_phys,
> > +                                                   region->r_virtbase,
> > +                                                   e_size, 0);
> > +               if (ret != 0)
> > +                       fprintf(stderr,
> > +                               "%s: Error: create new mapping failed\n",
> > +                               __func__);
> > +       }
> > +}
> > +
> > +static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num,
> > +                                   uint32_t addr, uint32_t size, int
> > type) +{
> > +       AssignedDevice *r_dev = (AssignedDevice *) pci_dev;
> > +
> > +       r_dev->v_addrs[region_num].e_physbase = addr;
> > +       DEBUG("%s: address=0x%x type=0x%x len=%d region_num=%d \n",
> > +             __func__, addr, type, size, region_num);
> > +
> > +       register_ioport_read(addr, size, 1, assigned_dev_ioport_readb,
> > +                            (void *) (r_dev->v_addrs + region_num));
> > +       register_ioport_read(addr, size, 2, assigned_dev_ioport_readw,
> > +                            (void *) (r_dev->v_addrs + region_num));
> > +       register_ioport_read(addr, size, 4, assigned_dev_ioport_readl,
> > +                            (void *) (r_dev->v_addrs + region_num));
> > +       register_ioport_write(addr, size, 1, assigned_dev_ioport_writeb,
> > +                             (void *) (r_dev->v_addrs + region_num));
> > +       register_ioport_write(addr, size, 2, assigned_dev_ioport_writew,
> > +                             (void *) (r_dev->v_addrs + region_num));
> > +       register_ioport_write(addr, size, 4, assigned_dev_ioport_writel,
> > +                             (void *) (r_dev->v_addrs + region_num));
> > +}
> > +
> > +static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t
> > address, +                                         uint32_t val, int len)
> > +{
> > +       int fd, r;
> > +
> > +       DEBUG("%s: (%x.%x): address=%04x val=0x%08x len=%d\n",
> > +             __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> > +             (uint16_t) address, val, len);
> > +
> > +       if (address == 0x4) {
> > +               pci_default_write_config(d, address, val, len);
> > +               /* Continue to program the card */
> > +       }
> > +
> > +       if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> > +           address == 0x3c || address == 0x3d) {
> > +               /* used for update-mappings (BAR emulation) */
> > +               pci_default_write_config(d, address, val, len);
> > +               return;
> > +       }
> > +       DEBUG("%s: NON BAR (%x.%x): address=%04x val=0x%08x len=%d\n",
> > +             __func__, ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7),
> > +             (uint16_t) address, val, len);
> > +       fd = ((AssignedDevice *)d)->real_device.config_fd;
> > +       r = lseek(fd, address, SEEK_SET);
> > +       if (r < 0) {
> > +               fprintf(stderr, "%s: bad seek, errno = %d\n",
> > +                       __func__, errno);
> > +               return;
> > +       }
> > +again:
> > +       r = write(fd, &val, len);
> > +       if (r < 0) {
> > +               if (errno == EINTR || errno == EAGAIN)
> > +                       goto again;
> > +               fprintf(stderr, "%s: write failed, errno = %d\n",
> > +                       __func__, errno);
> > +       }
> > +}
> > +
> > +static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t
> > address, +                                            int len)
> > +{
> > +       uint32_t val = 0;
> > +       int fd, r;
> > +
> > +       if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> > +           address == 0x3c || address == 0x3d) {
> > +               val = pci_default_read_config(d, address, len);
> > +               DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> > +                     (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address,
> > val, +                     len);
> > +               return val;
> > +       }
> > +
> > +       /* vga specific, remove later */
> > +       if (address == 0xFC)
> > +               goto do_log;
> > +
> > +       fd = ((AssignedDevice *)d)->real_device.config_fd;
> > +       r = lseek(fd, address, SEEK_SET);
> > +       if (r < 0) {
> > +               fprintf(stderr, "%s: bad seek, errno = %d\n",
> > +                       __func__, errno);
> > +               return val;
> > +       }
> > +again:
> > +       r = read(fd, &val, len);
> > +       if (r < 0) {
> > +               if (errno == EINTR || errno == EAGAIN)
> > +                       goto again;
> > +               fprintf(stderr, "%s: read failed, errno = %d\n",
> > +                       __func__, errno);
> > +       }
> > +do_log:
> > +       DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> > +             (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val,
> > len); +
> > +       /* kill the special capabilities */
> > +       if (address == 4 && len == 4)
> > +               val &= ~0x100000;
> > +       else if (address == 6)
> > +               val &= ~0x10;
> > +
> > +       return val;
> > +}
> > +
> > +static int assigned_dev_register_regions(PCIRegion *io_regions,
> > +                                        unsigned long regions_num,
> > +                                        AssignedDevice *pci_dev)
> > +{
> > +       uint32_t i;
> > +       PCIRegion *cur_region = io_regions;
> > +
> > +       for (i = 0; i < regions_num; i++, cur_region++) {
> > +               if (!cur_region->valid)
> > +                       continue;
> > +#ifdef DEVICE_ASSIGNMENT_DEBUG
> > +               pci_dev->v_addrs[i].debug |= DEVICE_ASSIGNMENT_DEBUG_MMIO
> > +                                            |
> > DEVICE_ASSIGNMENT_DEBUG_PIO; +#endif
> > +               pci_dev->v_addrs[i].num = i;
> > +
> > +               /* handle memory io regions */
> > +               if (cur_region->type & IORESOURCE_MEM) {
> > +                       int t = cur_region->type & IORESOURCE_PREFETCH
> > +                               ? PCI_ADDRESS_SPACE_MEM_PREFETCH
> > +                               : PCI_ADDRESS_SPACE_MEM;
> > +
> > +                       /* map physical memory */
> > +                       pci_dev->v_addrs[i].e_physbase =
> > cur_region->base_addr; +
> > pci_dev->v_addrs[i].r_virtbase =
> > +                               mmap(NULL,
> > +                                    (cur_region->size + 0xFFF) &
> > 0xFFFFF000, +                                    PROT_WRITE | PROT_READ,
> > MAP_SHARED, +                                    cur_region->resource_fd,
> > (off_t) 0); +
> > +                       if ((void *) -1 ==
> > pci_dev->v_addrs[i].r_virtbase) { +                              
> > fprintf(stderr, "%s: Error: Couldn't mmap 0x%x!" +                       
> >                "\n", __func__,
> > +                                       (uint32_t)
> > (cur_region->base_addr)); +                               return -1;
> > +                       }
> > +                       pci_dev->v_addrs[i].r_size = cur_region->size;
> > +                       pci_dev->v_addrs[i].e_size = 0;
> > +
> > +                       /* add offset */
> > +                       pci_dev->v_addrs[i].r_virtbase +=
> > +                               (cur_region->base_addr & 0xFFF);
> > +
> > +                       pci_register_io_region((PCIDevice *) pci_dev, i,
> > +                                              cur_region->size, t,
> > +                                              assigned_dev_iomem_map);
> > +                       continue;
> > +               }
> > +               /* handle port io regions */
> > +               pci_register_io_region((PCIDevice *) pci_dev, i,
> > +                                      cur_region->size,
> > PCI_ADDRESS_SPACE_IO, +
> > assigned_dev_ioport_map);
> > +
> > +               pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
> > +               pci_dev->v_addrs[i].r_virtbase =
> > +                       (void *)(long)cur_region->base_addr;
> > +               /* not relevant for port io */
> > +               pci_dev->v_addrs[i].memory_index = 0;
> > +       }
> > +
> > +       /* success */
> > +       return 0;
> > +}
> > +
> > +static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
> > +                          uint8_t r_dev, uint8_t r_func)
> > +{
> > +       char dir[128], name[128], comp[16];
> > +       int fd, r = 0;
> > +       FILE *f;
> > +       unsigned long long start, end, size, flags;
> > +       PCIRegion *rp;
> > +       PCIDevRegions *dev = &pci_dev->real_device;
> > +
> > +       dev->region_number = 0;
> > +
> > +       sprintf(dir, "/sys/bus/pci/devices/0000:%02x:%02x.%x/",
> > +               r_bus, r_dev, r_func);
> > +       strcpy(name, dir);
> > +       strcat(name, "config");
> > +       fd = open(name, O_RDWR);
> > +       if (fd == -1) {
> > +               fprintf(stderr, "%s: %s: %m\n", __func__, name);
> > +               return 1;
> > +       }
> > +       dev->config_fd = fd;
> > +again:
> > +       r = read(fd, pci_dev->dev.config, sizeof pci_dev->dev.config);
> > +       if (r < 0) {
> > +               if (errno == EINTR || errno == EAGAIN)
> > +                       goto again;
> > +               fprintf(stderr, "%s: read failed, errno = %d\n",
> > +                       __func__, errno);
> > +       }
> > +       strcpy(name, dir);
> > +       strcat(name, "resource");
> > +
> > +       f = fopen(name, "r");
> > +       if (f == NULL) {
> > +               fprintf(stderr, "%s: %s: %m\n", __func__, name);
> > +               return 1;
> > +       }
> > +       for (r = 0; fscanf(f, "%lli %lli %lli\n", &start, &end, &flags)
> > == 3; +            r++) {
> > +               rp = dev->regions + r;
> > +               rp->valid = 0;
> > +               size = end - start + 1;
> > +               flags &= IORESOURCE_IO | IORESOURCE_MEM |
> > IORESOURCE_PREFETCH; +               if (size == 0 || (flags &
> > ~IORESOURCE_PREFETCH) == 0) +                       continue;
> > +               if (flags & IORESOURCE_MEM) {
> > +                       flags &= ~IORESOURCE_IO;
> > +                       sprintf(comp, "resource%d", r);
> > +                       strcpy(name, dir);
> > +                       strcat(name, comp);
> > +                       fd = open(name, O_RDWR);
> > +                       if (fd == -1)
> > +                               continue;               /* probably ROM
> > */ +                       rp->resource_fd = fd;
> > +               } else
> > +                       flags &= ~IORESOURCE_PREFETCH;
> > +
> > +               rp->type = flags;
> > +               rp->valid = 1;
> > +               rp->base_addr = start;
> > +               rp->size = size;
> > +               DEBUG("%s: region %d size %d start 0x%x type %d "
> > +                     "resource_fd %d\n", __func__, r, rp->size, start,
> > +                     rp->type, rp->resource_fd);
> > +       }
> > +       fclose(f);
> > +
> > +       dev->region_number = r;
> > +       return 0;
> > +}
> > +
> > +#define        MAX_ASSIGNED_DEVS 4
> > +struct {
> > +       char name[15];
> > +       int bus;
> > +       int dev;
> > +       int func;
> > +       AssignedDevice *assigned_dev;
> > +} assigned_devices[MAX_ASSIGNED_DEVS];
> > +
> > +int nr_assigned_devices;
> > +static int disable_iommu;
> > +
> > +static uint32_t calc_assigned_dev_id(uint8_t bus, uint8_t devfn)
> > +{
> > +       return (uint32_t)bus << 8 | (uint32_t)devfn;
> > +}
> > +
> > +static AssignedDevice *register_real_device(PCIBus *e_bus,
> > +                                           const char *e_dev_name,
> > +                                           int e_devfn, uint8_t r_bus,
> > +                                           uint8_t r_dev, uint8_t
> > r_func) +{
> > +       int r;
> > +       AssignedDevice *pci_dev;
> > +       uint8_t e_device, e_intx;
> > +
> > +       DEBUG("%s: Registering real physical device %s (devfn=0x%x)\n",
> > +             __func__, e_dev_name, e_devfn);
> > +
> > +       pci_dev = (AssignedDevice *)
> > +               pci_register_device(e_bus, e_dev_name,
> > sizeof(AssignedDevice), +                                   e_devfn,
> > assigned_dev_pci_read_config, +
> > assigned_dev_pci_write_config); +       if (NULL == pci_dev) {
> > +               fprintf(stderr, "%s: Error: Couldn't register real device
> > %s\n", +                       __func__, e_dev_name);
> > +               return NULL;
> > +       }
> > +       if (get_real_device(pci_dev, r_bus, r_dev, r_func)) {
> > +               fprintf(stderr, "%s: Error: Couldn't get real device
> > (%s)!\n", +                       __func__, e_dev_name);
> > +               goto out;
> > +       }
> > +
> > +       /* handle real device's MMIO/PIO BARs */
> > +       if (assigned_dev_register_regions(pci_dev->real_device.regions,
> > +
> > pci_dev->real_device.region_number, +
> >   pci_dev))
> > +               goto out;
> > +
> > +       /* handle interrupt routing */
> > +       e_device = (pci_dev->dev.devfn >> 3) & 0x1f;
> > +       e_intx = pci_dev->dev.config[0x3d] - 1;
> > +       pci_dev->intpin = e_intx;
> > +       pci_dev->run = 0;
> > +       pci_dev->girq = 0;
> > +       pci_dev->h_busnr = r_bus;
> > +       pci_dev->h_devfn = PCI_DEVFN(r_dev, r_func);
> > +
> > +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> > +       if (kvm_enabled()) {
> > +               struct kvm_assigned_pci_dev assigned_dev_data;
> > +
> > +               memset(&assigned_dev_data, 0, sizeof(assigned_dev_data));
> > +               assigned_dev_data.assigned_dev_id  =
> > +                       calc_assigned_dev_id(pci_dev->h_busnr,
> > +                                            (uint32_t)pci_dev->h_devfn);
> > +               assigned_dev_data.busnr = pci_dev->h_busnr;
> > +               assigned_dev_data.devfn = pci_dev->h_devfn;
> > +
> > +#ifdef KVM_CAP_IOMMU
> > +               /* We always enable the IOMMU if present
> > +                * (or when not disabled on the command line)
> > +                */
> > +               r = kvm_check_extension(kvm_context, KVM_CAP_IOMMU);
> > +               if (r && !disable_iommu)
> > +                       assigned_dev_data.flags |=
> > KVM_DEV_ASSIGN_ENABLE_IOMMU; +#endif
> > +               r = kvm_assign_pci_device(kvm_context,
> > &assigned_dev_data); +               if (r < 0) {
> > +                       fprintf(stderr, "Could not notify kernel about "
> > +                               "assigned device \"%s\"\n", e_dev_name);
> > +                       perror("pt-ioctl");
> > +                       goto out;
> > +               }
> > +       }
> > +#endif
> > +       fprintf(stderr, "Registered host PCI device %02x:%02x.%1x "
> > +               "(\"%s\") as guest device %02x:%02x.%1x\n",
> > +               r_bus, r_dev, r_func, e_dev_name,
> > +               pci_bus_num(e_bus), e_device, r_func);
> > +
> > +       return pci_dev;
> > +out:
> > +       pci_unregister_device(&pci_dev->dev);
> > +       return NULL;
> > +}
> > +
> > +extern int get_param_value(char *buf, int buf_size,
> > +                          const char *tag, const char *str);
> > +extern int piix_get_irq(int);
> > +
> > +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> > +/* The pci config space got updated. Check if irq numbers have changed
> > + * for our devices
> > + */
> > +void assigned_dev_update_irq(PCIDevice *d)
> > +{
> > +       int i, irq, r;
> > +       AssignedDevice *assigned_dev;
> > +
> > +       for (i = 0; i < nr_assigned_devices; i++) {
> > +               assigned_dev = assigned_devices[i].assigned_dev;
> > +               if (assigned_dev == NULL)
> > +                       continue;
> > +
> > +               irq = pci_map_irq(&assigned_dev->dev,
> > assigned_dev->intpin); +               irq = piix_get_irq(irq);
> > +
> > +               if (irq != assigned_dev->girq) {
> > +                       struct kvm_assigned_irq assigned_irq_data;
> > +
> > +                       memset(&assigned_irq_data, 0, sizeof
> > assigned_irq_data); +
> > assigned_irq_data.assigned_dev_id  =
> > +                              
> > calc_assigned_dev_id(assigned_dev->h_busnr, +                            
> >                        (uint8_t)
> > +
> > assigned_dev->h_devfn); +                      
> > assigned_irq_data.guest_irq = irq;
> > +                       assigned_irq_data.host_irq =
> > +                               assigned_dev->real_device.irq;
> > +                       r = kvm_assign_irq(kvm_context,
> > &assigned_irq_data); +                       if (r < 0) {
> > +                               perror("assigned_dev_update_irq");
> > +                               fprintf(stderr, "Are you assigning a
> > device " +                                       "that shares IRQ with
> > some other " +                                       "device?\n");
> > +                              
> > pci_unregister_device(&assigned_dev->dev); +                             
> >  continue;
> > +                       }
> > +                       assigned_dev->girq = irq;
> > +               }
> > +       }
> > +}
> > +#endif
> > +
> > +static int init_device_assignment(void)
> > +{
> > +       /* Do we have any devices to be assigned? */
> > +       if (nr_assigned_devices == 0)
> > +               return -1;
> > +       iopl(3);
> > +       return 0;
> > +}
> > +
> > +struct PCIDevice *init_assigned_device(PCIBus *bus, int *index)
> > +{
> > +       AssignedDevice *dev = NULL;
> > +       int i;
> > +
> > +       if (*index == -1) {
> > +               if (init_device_assignment() < 0)
> > +                       return NULL;
> > +
> > +               *index = nr_assigned_devices - 1;
> > +       }
> > +       i = *index;
> > +       dev = register_real_device(bus, assigned_devices[i].name, -1,
> > +                                  assigned_devices[i].bus,
> > +                                  assigned_devices[i].dev,
> > +                                  assigned_devices[i].func);
> > +       if (dev == NULL) {
> > +               fprintf(stderr, "Error: Couldn't register device
> > \"%s\"\n", +                       assigned_devices[i].name);
> > +       }
> > +       assigned_devices[i].assigned_dev = dev;
> > +
> > +       --*index;
> > +       return &dev->dev;
> > +}
> > +
> > +/*
> > + * Syntax to assign device:
> > + *
> > + * -pcidevice dev=bus:dev.func,dma=dma
> > + *
> > + * Example:
> > + * -pcidevice host=00:13.0,dma=pvdma
> > + *
> > + * dma can currently only be 'none' to disable iommu support.
> > + */
> > +void add_assigned_device(const char *arg)
> > +{
> > +       char *cp, *cp1;
> > +       char device[8];
> > +       char dma[6];
> > +       int r;
> > +
> > +       if (nr_assigned_devices >= MAX_ASSIGNED_DEVS) {
> > +               fprintf(stderr, "Too many assigned devices (max %d)\n",
> > +                       MAX_ASSIGNED_DEVS);
> > +               return;
> > +       }
> > +       memset(&assigned_devices[nr_assigned_devices], 0,
> > +              sizeof assigned_devices[nr_assigned_devices]);
> > +
> > +       r = get_param_value(device, sizeof device, "host", arg);
> > +
> > +       r = get_param_value(assigned_devices[nr_assigned_devices].name,
> > +                           sizeof
> > assigned_devices[nr_assigned_devices].name, +
> > "name", arg);
> > +       if (!r)
> > +               strncpy(assigned_devices[nr_assigned_devices].name,
> > device, 8); +
> > +#ifdef KVM_CAP_IOMMU
> > +       r = get_param_value(dma, sizeof dma, "dma", arg);
> > +       if (r && !strncmp(dma, "none", 4))
> > +               disable_iommu = 1;
> > +#endif
> > +       cp = device;
> > +       assigned_devices[nr_assigned_devices].bus = strtoul(cp, &cp1,
> > 16); +       if (*cp1 != ':')
> > +               goto bad;
> > +       cp = cp1 + 1;
> > +
> > +       assigned_devices[nr_assigned_devices].dev = strtoul(cp, &cp1,
> > 16); +       if (*cp1 != '.')
> > +               goto bad;
> > +       cp = cp1 + 1;
> > +
> > +       assigned_devices[nr_assigned_devices].func = strtoul(cp, &cp1,
> > 16); +
> > +       nr_assigned_devices++;
> > +       return;
> > +bad:
> > +       fprintf(stderr, "pcidevice argument parse error; "
> > +               "please check the help text for usage\n");
> > +}
> > diff --git a/qemu/hw/device-assignment.h b/qemu/hw/device-assignment.h
> > new file mode 100644
> > index 0000000..b77e484
> > --- /dev/null
> > +++ b/qemu/hw/device-assignment.h
> > @@ -0,0 +1,93 @@
> > +/*
> > + * Copyright (c) 2007, Neocleus Corporation.
> > + * Copyright (c) 2007, Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > it + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but
> > WITHOUT + * ANY WARRANTY; without even the implied warranty of
> > MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > General Public License for + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > along with + * this program; if not, write to the Free Software
> > Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307
> > USA.
> > + *
> > + *  Data structures for storing PCI state
> > + *
> > + *  Adapted to kvm by Qumranet
> > + *
> > + *  Copyright (c) 2007, Neocleus, Alex Novik (alex@neocleus.com)
> > + *  Copyright (c) 2007, Neocleus, Guy Zana (guy@neocleus.com)
> > + *  Copyright (C) 2008, Qumranet, Amit Shah (amit.shah@qumranet.com)
> > + *  Copyright (C) 2008, Red Hat, Amit Shah (amit.shah@redhat.com)
> > + */
> > +
> > +#ifndef __DEVICE_ASSIGNMENT_H__
> > +#define __DEVICE_ASSIGNMENT_H__
> > +
> > +#include <sys/mman.h>
> > +#include "qemu-common.h"
> > +#include "pci.h"
> > +#include <linux/types.h>
> > +
> > +#define DEVICE_ASSIGNMENT_DEBUG_PIO    (0x01)
> > +#define DEVICE_ASSIGNMENT_DEBUG_MMIO   (0x02)
> > +
> > +/* From include/linux/pci.h in the kernel sources */
> > +#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) &
> > 0x07)) +
> > +typedef uint32_t pciaddr_t;
> > +
> > +#define MAX_IO_REGIONS                 (6)
> > +
> > +typedef struct pci_region_s {
> > +       int type;       /* Memory or port I/O */
> > +       int valid;
> > +       pciaddr_t base_addr;
> > +       pciaddr_t size;         /* size of the region */
> > +       int resource_fd;
> > +} PCIRegion;
> > +
> > +typedef struct pci_dev_s {
> > +       uint8_t bus, dev, func; /* Bus inside domain, device and function
> > */ +       int irq;                /* IRQ number */
> > +       uint16_t region_number; /* number of active regions */
> > +
> > +       /* Port I/O or MMIO Regions */
> > +       PCIRegion regions[MAX_IO_REGIONS];
> > +       int config_fd;
> > +} PCIDevRegions;
> > +
> > +typedef struct assigned_dev_region_s {
> > +       target_phys_addr_t e_physbase;
> > +       uint32_t memory_index;
> > +       void *r_virtbase;       /* mmapped access address */
> > +       int num;                /* our index within v_addrs[] */
> > +       uint32_t e_size;        /* emulated size of region in bytes */
> > +       uint32_t r_size;        /* real size of region in bytes */
> > +       uint32_t debug;
> > +} AssignedDevRegion;
> > +
> > +typedef struct assigned_dev_s {
> > +       PCIDevice dev;
> > +       int intpin;
> > +       uint8_t debug_flags;
> > +       AssignedDevRegion v_addrs[PCI_NUM_REGIONS];
> > +       PCIDevRegions real_device;
> > +       int run;
> > +       int girq;
> > +       unsigned char h_busnr;
> > +       unsigned int h_devfn;
> > +       int bound;
> > +} AssignedDevice;
> > +
> > +/* Initialization functions */
> > +PCIDevice *init_assigned_device(PCIBus *bus, int *index);
> > +void add_assigned_device(const char *arg);
> > +void assigned_dev_set_vector(int irq, int vector);
> > +void assigned_dev_ack_mirq(int vector);
> > +
> > +#endif                         /* __DEVICE_ASSIGNMENT_H__ */
> > diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
> > index 6053103..4a611cc 100644
> > --- a/qemu/hw/pc.c
> > +++ b/qemu/hw/pc.c
> > @@ -32,6 +32,7 @@
> >  #include "smbus.h"
> >  #include "boards.h"
> >  #include "console.h"
> > +#include "device-assignment.h"
> >
> >  #include "qemu-kvm.h"
> >
> > @@ -1006,6 +1007,14 @@ static void pc_init1(ram_addr_t ram_size, int
> > vga_ram_size, }
> >      }
> >
> > +    /* Initialize assigned devices */
> > +    if (pci_enabled) {
> > +        int r = -1;
> > +        do {
> > +            init_assigned_device(pci_bus, &r);
> > +       } while (r >= 0);
> > +    }
> > +
> >      rtc_state = rtc_init(0x70, i8259[8]);
> >
> >      qemu_register_boot_set(pc_boot_set, rtc_state);
> > diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
> > index 61ff0f6..e4e8386 100644
> > --- a/qemu/hw/pci.c
> > +++ b/qemu/hw/pci.c
> > @@ -50,6 +50,7 @@ struct PCIBus {
> >
> >  static void pci_update_mappings(PCIDevice *d);
> >  static void pci_set_irq(void *opaque, int irq_num, int level);
> > +void assigned_dev_update_irq(PCIDevice *d);
> >
> >  target_phys_addr_t pci_mem_base;
> >  static int pci_irq_index;
> > @@ -453,6 +454,12 @@ void pci_default_write_config(PCIDevice *d,
> >          val >>= 8;
> >      }
> >
> > +#ifdef KVM_CAP_DEVICE_ASSIGNMENT
> > +    if (kvm_enabled() && qemu_kvm_irqchip_in_kernel() &&
> > +       address >= 0x60 && address <= 0x63)
> > +       assigned_dev_update_irq(d);
> > +#endif
> > +
> >      end = address + len;
> >      if (end > PCI_COMMAND && address < (PCI_COMMAND + 2)) {
> >          /* if the command register is modified, we must modify the
> > mappings */ diff --git a/qemu/vl.c b/qemu/vl.c
> > index 2fb8552..83f28c5 100644
> > --- a/qemu/vl.c
> > +++ b/qemu/vl.c
> > @@ -37,6 +37,7 @@
> >  #include "qemu-char.h"
> >  #include "block.h"
> >  #include "audio/audio.h"
> > +#include "hw/device-assignment.h"
> >  #include "migration.h"
> >  #include "balloon.h"
> >  #include "qemu-kvm.h"
> > @@ -8469,6 +8470,12 @@ static void help(int exitcode)
> >  #endif
> >            "-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC\n"
> >            "-no-kvm-pit     disable KVM kernel mode PIT\n"
> > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> > +          "-pcidevice host=bus:dev.func[,dma=none][,name=\"string\"]\n"
> > +          "                expose a PCI device to the guest OS.\n"
> > +          "                dma=none: don't perform any dma translations
> > (default is to use an iommu)\n" +          "                'string' is
> > used in log output.\n"
> > +#endif
> >  #endif
> >  #ifdef TARGET_I386
> >             "-std-vga        simulate a standard VGA card with VESA Bochs
> > Extensions\n" @@ -8592,6 +8599,9 @@ enum {
> >      QEMU_OPTION_no_kvm,
> >      QEMU_OPTION_no_kvm_irqchip,
> >      QEMU_OPTION_no_kvm_pit,
> > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> > +    QEMU_OPTION_pcidevice,
> > +#endif
> >      QEMU_OPTION_no_reboot,
> >      QEMU_OPTION_no_shutdown,
> >      QEMU_OPTION_show_cursor,
> > @@ -8680,6 +8690,9 @@ const QEMUOption qemu_options[] = {
> >  #endif
> >      { "no-kvm-irqchip", 0, QEMU_OPTION_no_kvm_irqchip },
> >      { "no-kvm-pit", 0, QEMU_OPTION_no_kvm_pit },
> > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> > +    { "pcidevice", HAS_ARG, QEMU_OPTION_pcidevice },
> > +#endif
> >  #endif
> >  #if defined(TARGET_PPC) || defined(TARGET_SPARC)
> >      { "g", 1, QEMU_OPTION_g },
> > @@ -9586,6 +9599,11 @@ int main(int argc, char **argv)
> >                 kvm_pit = 0;
> >                 break;
> >             }
> > +#if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(__linux__)
> > +           case QEMU_OPTION_pcidevice:
> > +               add_assigned_device(optarg);
> > +               break;
> > +#endif
> >  #endif
> >              case QEMU_OPTION_usb:
> >                  usb_enabled = 1;
> > --
> > 1.5.4.3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
  2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
                             ` (2 preceding siblings ...)
  2008-09-25  4:54           ` Yang, Sheng
@ 2008-09-26  1:34           ` Yang, Sheng
  3 siblings, 0 replies; 31+ messages in thread
From: Yang, Sheng @ 2008-09-26  1:34 UTC (permalink / raw)
  To: kvm
  Cc: Amit Shah, avi@redhat.com, muli@il.ibm.com, anthony@codemonkey.ws,
	benami@il.ibm.com, Han, Weidong, Kay, Allen M

On Tuesday 23 September 2008 22:54:53 Amit Shah wrote:
> +static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t
> address, +                                            int len)
> +{
> +       uint32_t val = 0;
> +       int fd, r;
> +
> +       if ((address >= 0x10 && address <= 0x24) || address == 0x34 ||
> +           address == 0x3c || address == 0x3d) {
> +               val = pci_default_read_config(d, address, len);
> +               DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n",
> +                     (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address,
> val, +                     len);
> +               return val;
> +       }
> +
> +       /* vga specific, remove later */
> +       if (address == 0xFC)
> +               goto do_log;
> +
> +       fd = ((AssignedDevice *)d)->real_device.config_fd;
> +       r = lseek(fd, address, SEEK_SET);
> +       if (r < 0) {
> +               fprintf(stderr, "%s: bad seek, errno = %d\n",
> +                       __func__, errno);
> +               return val;
> +       }

This read from configuration space method got a little trouble: vender id and 
device id read from configuration space directly rather than "vender" 
and "device" file in the sysfs. That's cause trouble with some device that 
configuration space inconsistent with "vender" and "device" file, e.g. some 
fix up by host PCI subsystem in kernel. 

Maybe it can be delay a little for a following patch, but we should address 
this issue... Maybe we can use libpci? There are more fields than vender and 
device got this problem, like "irq".

--
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2008-09-26  1:33 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-23 14:54 [V6] Userspace patches for PCI device assignment Amit Shah
2008-09-23 14:54 ` [PATCH 1/7] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices Amit Shah
2008-09-23 14:54   ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Amit Shah
2008-09-23 14:54     ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Amit Shah
2008-09-23 14:54       ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Amit Shah
2008-09-23 14:54         ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Amit Shah
2008-09-23 14:54           ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Amit Shah
2008-09-23 14:54             ` [PATCH 7/7] KVM/userspace: Device Assignment: Support for hot plugging PCI devices Amit Shah
2008-09-23 16:32               ` Anthony Liguori
2008-09-24  4:24                 ` Amit Shah
2008-09-23 16:31             ` [PATCH 6/7] KVM/userspace: Build vtd.c for Intel IOMMU support Anthony Liguori
2008-09-24  4:25               ` Amit Shah
2008-09-24 15:08                 ` Anthony Liguori
2008-09-23 16:30           ` [PATCH 5/7] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests Anthony Liguori
2008-09-23 18:32             ` Muli Ben-Yehuda
2008-09-23 19:18               ` Anthony Liguori
2008-09-23 19:43                 ` Muli Ben-Yehuda
2008-09-23 19:58                   ` Anthony Liguori
2008-09-24  4:58             ` Amit Shah
2008-09-24 15:07               ` Anthony Liguori
2008-09-24 17:10                 ` Amit Shah
2008-09-25  4:54           ` Yang, Sheng
2008-09-25  5:20             ` Yang, Sheng
2008-09-26  1:34           ` Yang, Sheng
2008-09-23 16:13         ` [PATCH 4/7] qemu: Include hw.h in qemu/hw/isa.h to fix compile issues Anthony Liguori
2008-09-24  4:27           ` Amit Shah
2008-09-24 11:35             ` Amit Shah
2008-09-24 14:59             ` Anthony Liguori
2008-09-23 16:13       ` [PATCH 3/7] qemu: piix: Introduce functions to get pin number from irq and vice versa Anthony Liguori
2008-09-24  4:28         ` Amit Shah
2008-09-23 16:12     ` [PATCH 2/7] qemu: Introduce pci_map_irq to get irq nr from pin number for a PCI device Anthony Liguori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).