* [Qemu-devel] [PATCH v2 1/5] vfio: Introduce documentation for VFIO driver
From: Alex Williamson @ 2012-01-23 17:20 UTC (permalink / raw)
To: chrisw, aik, david, joerg.roedel, agraf, benve, aafabbri, B08248,
B07421, avi, konrad.wilk, kvm, qemu-devel, iommu, linux-pci,
linux-kernel
Including rationale for design, example usage and API description.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
Documentation/vfio.txt | 359 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 359 insertions(+), 0 deletions(-)
create mode 100644 Documentation/vfio.txt
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
new file mode 100644
index 0000000..4dfccf6
--- /dev/null
+++ b/Documentation/vfio.txt
@@ -0,0 +1,359 @@
+VFIO - "Virtual Function I/O"[1]
+-------------------------------------------------------------------------------
+Many modern systems now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted. This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems with the Freescale PAMU. The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment. In other words, this allows
+safe[2], non-privileged, userspace drivers.
+
+Why do we want that? Virtual machines often make use of direct device
+access ("device assignment") when configured for the highest possible
+I/O performance. From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers[3].
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace. Examples include network adapters (often non-TCP/IP based)
+and compute accelerators. Prior to VFIO, these drivers had to either
+go through the full development cycle to become a proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing the KVM
+PCI-specific device assignment code and providing a more secure, more
+featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+-------------------------------------------------------------------------------
+
+Userspace drivers are primarily concerned with manipulating individual
+devices and setting up mappings in the IOMMU for those devices.
+Unfortunately, the IOMMU doesn't always have the granularity to track
+mappings for an individual device. Sometimes this is a topology
+barrier, such as a PCIe-to-PCI bridge interposed between the device
+and the IOMMU; other times this is an IOMMU limitation. In any case,
+the reality is that devices are not always independent with respect to
+the IOMMU. Translations set up for one device can be used by another
+device in these scenarios.
+
+The IOMMU API exposes these relationships by identifying an "IOMMU
+group" for these dependent devices. Devices on the same bus with the
+same IOMMU group (or just "group" for this document) are not isolated
+from each other with respect to DMA mappings. For userspace usage,
+this logically means that instead of being able to grant ownership of
+an individual device, we must grant ownership of a group, which may
+contain one or more devices.
+
+These groups therefore become a fundamental component of VFIO and the
+working unit we use for exposing devices and granting permissions to
+userspace. In addition, VFIO makes efforts to ensure the integrity of
+the group for user access. This includes ensuring that all devices
+within the group are controlled by VFIO (vs native host drivers)
+before allowing a user to access any member of the group or the IOMMU
+mappings, as well as maintaining the group viability as devices are
+dynamically added or removed from the system.
+
+To access a device through VFIO, a user must open a character device
+for the group that the device belongs to and then issue an ioctl to
+retrieve a file descriptor for the individual device. This ensures
+that the user has permissions to the group (file based access to the
+/dev entry) and allows a check point at which VFIO can deny access to
+the device if the group is not viable (all devices within the group
+controlled by VFIO). A file descriptor for the IOMMU is obtained in the
+same fashion.
+
+VFIO defines a standard set of APIs for access to devices and a
+modular interface for adding new, bus-specific VFIO device drivers.
+We call these "VFIO bus drivers". The vfio-pci module is an example
+of a bus driver for exposing PCI devices. When the bus driver module
+is loaded, it enumerates all of the devices for its bus, registering
+each device with the vfio core along with a set of callbacks. For
+buses that support hotplug, the bus driver also adds itself to the
+notification chain for such events. The callbacks registered with
+each device implement the VFIO device access API for that bus.
+
+The VFIO device API includes ioctls for describing the device, the I/O
+regions and their read/write/mmap offsets on the device descriptor, as
+well as mechanisms for describing and registering interrupt
+notifications.
+
+The VFIO IOMMU object is accessed in a similar way; an ioctl on the
+group provides a file descriptor for programming the IOMMU. Like
+devices, the IOMMU file descriptor is only accessible when a group is
+viable. The API for the IOMMU is effectively a userspace extension of
+the kernel IOMMU API. The IOMMU provides an ioctl to describe the
+IOMMU domain as well as to set up and tear down DMA mappings. As the
+IOMMU API is extended to support more esoteric IOMMU implementations,
+it's expected that the VFIO interface will also evolve.
+
+To facilitate this evolution, all of the VFIO interfaces are designed
+for extension. In particular, for all structures passed via ioctl, we
+include a structure size and flags field. We also define the ioctl
+request to be independent of passed structure size. This allows us to
+later add structure fields and define flags as necessary. It's
+expected that each additional field will have an associated flag to
+indicate whether the data is valid. Additionally, we provide an
+"info" ioctl for each file descriptor, which allows us to flag new
+features as they're added (ex. an IOMMU domain configuration ioctl).
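+
+As a minimal sketch of this pattern from the user's point of view (the
+feature flag and field below are hypothetical and not part of this
+API), the caller always announces the size of the structure it passed
+and only trusts new fields that the kernel flags as valid:
+
+	struct vfio_iommu_info info = { .argsz = sizeof(info) };
+
+	ioctl(iommu, VFIO_IOMMU_GET_INFO, &info);
+
+	/* Hypothetical extension; not a real flag or field */
+	if (info.flags & VFIO_IOMMU_INFO_FOO)
+		do_something_with(info.foo);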
+
+The final aspect of VFIO is the notion of merging groups. In both the
+assignment of devices to virtual machines and the pure userspace
+driver model, it's expected that a single user instance is likely to
+have multiple groups in use simultaneously. For a virtual machine,
+this can happen simply by assigning multiple devices to a guest that
+belong to different groups. If these groups are all using the same
+set of IOMMU mappings, the overhead of userspace setting up and
+tearing down the mappings, as well as the internal IOMMU driver
+overhead of managing those mappings can be non-trivial. On x86, the
+IOMMU will often map the full guest memory, allowing for transparent
+device assignment. Therefore any device assigned to a given guest
+will make use of identical IOMMU mappings. Some IOMMU implementations
+are able to easily reduce the overhead this generates by simply using
+the same set of page tables across multiple groups. VFIO allows users
+to take advantage of this option by merging groups together,
+effectively creating a super group (NB IOMMU groups only define the
+minimum granularity).
+
+A user can attempt to merge groups together by calling the merge ioctl
+on one group (the "merger") and passing the file descriptor for the
+group to be merged in (the "mergee"). Note that existing DMA mappings
+cannot be atomically merged between groups; it's therefore a
+requirement that the mergee group is not in use. This is enforced by
+not allowing open device or iommu file descriptors on the mergee group
+at the time of merging. The merger group can be actively in use at
+the time of merging. Likewise, to unmerge a group, none of the device
+file descriptors for the group being removed can be in use. The
+remaining merged group can be actively in use.
+
+If the groups cannot be merged, the ioctl will fail and the user will
+need to manage the groups independently. Users should have no
+expectation that group merging will be successful. Some platforms may
+not support it at all, others may only enable merging of sufficiently
+similar groups. If the ioctl succeeds, then the group file
+descriptors are effectively fungible between the groups. That is,
+instead of their actions being isolated to the individual group, each
+of them is a gateway into the combined, merged group. For instance,
+retrieving an IOMMU file descriptor from any group returns a reference
+to the same object, mappings to that IOMMU descriptor are visible to
+all devices in the merged group, and device descriptors can be
+retrieved for any device in the merged group from any one of the group
+file descriptors. In effect, a user can manage devices and the IOMMU
+of a merged group using a single file descriptor (saving the other
+merged group file descriptors away only for later unmerging) without
+the permission complications of creating a separate "super group"
+character device.
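+
+As a minimal sketch of the merge flow (the second group name below is
+hypothetical; error handling omitted):
+
+	int merger, mergee;
+
+	merger = open("/dev/vfio/pci:240", O_RDWR);
+	mergee = open("/dev/vfio/pci:241", O_RDWR);
+
+	/* Fails unless the mergee has no open iommu or device fds */
+	ioctl(merger, VFIO_GROUP_MERGE, &mergee);
+
+	/* ... use either fd to manage the merged group ... */
+
+	/* Later, remove the mergee; it must have no open device fds */
+	ioctl(mergee, VFIO_GROUP_UNMERGE);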
+
+VFIO Usage Example
+-------------------------------------------------------------------------------
+
+Assume a user wants to access PCI device 0000:06:0d.0
+
+$ cat /sys/bus/pci/devices/0000:06:0d.0/iommu_group
+240
+
+Since this device is on the "pci" bus, the user can then find the
+character device for interacting with the VFIO group as:
+
+$ ls -l /dev/vfio/pci:240
+crw-rw---- 1 root root 252, 27 Dec 15 15:13 /dev/vfio/pci:240
+
+We can also examine other members of the group through sysfs:
+
+$ ls -l /sys/devices/virtual/vfio/pci:240/devices/
+total 0
+lrwxrwxrwx 1 root root 0 Dec 20 12:01 0000:06:0d.0 -> \
+ ../../../../pci0000:00/0000:00:1e.0/0000:06:0d.0
+lrwxrwxrwx 1 root root 0 Dec 20 12:01 0000:06:0d.1 -> \
+ ../../../../pci0000:00/0000:00:1e.0/0000:06:0d.1
+
+This group therefore contains two devices[4]. VFIO will prevent
+device or iommu manipulation unless all group members are attached to
+the vfio bus driver, so we simply unbind the devices from their
+current driver and rebind them to vfio:
+
+#!/bin/sh
+for i in /sys/devices/virtual/vfio/pci:240/devices/*; do
+ dir=$(readlink -f $i)
+ if [ -L $dir/driver ]; then
+ echo $(basename $i) > $dir/driver/unbind
+ fi
+ vendor=$(cat $dir/vendor)
+ device=$(cat $dir/device)
+ echo $vendor $device > /sys/bus/pci/drivers/vfio/new_id
+ echo $(basename $i) > /sys/bus/pci/drivers/vfio/bind
+done
+
+# chown user:user /dev/vfio/pci:240
+
+The user now has full access to all the devices and the iommu for this
+group and can access them as follows:
+
+ int group, iommu, device, i;
+ struct vfio_group_info group_info = { .argsz = sizeof(group_info) };
+ struct vfio_iommu_info iommu_info = { .argsz = sizeof(iommu_info) };
+ struct vfio_dma_map dma_map = { .argsz = sizeof(dma_map) };
+ struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+
+ /* Open the group */
+ group = open("/dev/vfio/pci:240", O_RDWR);
+
+ /* Test the group is viable and available */
+ ioctl(group, VFIO_GROUP_GET_INFO, &group_info);
+
+ if (!(group_info.flags & VFIO_GROUP_FLAGS_VIABLE))
+ /* Group is not viable */
+
+ if ((group_info.flags & VFIO_GROUP_FLAGS_MM_LOCKED))
+ /* Already in use by someone else */
+
+ /* Get a file descriptor for the IOMMU */
+ iommu = ioctl(group, VFIO_GROUP_GET_IOMMU_FD);
+
+ /* Test the IOMMU is what we expect */
+ ioctl(iommu, VFIO_IOMMU_GET_INFO, &iommu_info);
+
+ /* Allocate some space and setup a DMA mapping */
+ dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ dma_map.size = 1024 * 1024;
+ dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
+ dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+ ioctl(iommu, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+ /* Get a file descriptor for the device */
+ device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
+
+ /* Test and setup the device */
+ ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
+
+ for (i = 0; i < device_info.num_regions; i++) {
+ struct vfio_region_info reg = { .argsz = sizeof(reg) };
+
+ reg.index = i;
+
+		ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+ /* Setup mappings... read/write offsets, mmaps
+ * For PCI devices, config space is a region */
+ }
+
+ for (i = 0; i < device_info.num_irqs; i++) {
+ struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+
+ irq.index = i;
+
+		ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+
+ /* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */
+ }
+
+ /* Gratuitous device reset and go... */
+ ioctl(device, VFIO_DEVICE_RESET);
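+
+Once a region of interest has been found, its contents are accessed
+through the device file descriptor at the offset reported for that
+region. A brief sketch, assuming 'reg' still holds the vfio_region_info
+of interest (which region index corresponds to PCI config space is
+defined by the bus driver and not shown here):
+
+	char buf[4];
+
+	pread(device, buf, sizeof(buf), reg.offset);
+
+	/* Regions flagged VFIO_REGION_INFO_FLAG_MMAP may instead be
+	 * mmap'd at reg.offset through the device file descriptor */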
+
+VFIO User API
+-------------------------------------------------------------------------------
+
+Please see include/linux/vfio.h for complete API documentation.
+
+VFIO bus driver API
+-------------------------------------------------------------------------------
+
+Bus drivers, such as PCI, have three jobs:
+ 1) Add/remove devices from vfio
+ 2) Provide vfio_device_ops for device access
+ 3) Device binding and unbinding
+
+When initialized, the bus driver should enumerate the devices on its
+bus and call vfio_group_add_dev() for each device. If the bus
+supports hotplug, notifiers should be enabled to track devices being
+added and removed. vfio_group_del_dev() removes a previously added
+device from vfio.
+
+extern int vfio_group_add_dev(struct device *dev,
+ const struct vfio_device_ops *ops);
+extern void vfio_group_del_dev(struct device *dev);
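+
+As a rough sketch of how a bus driver might use these at module load
+(the vfio_pci_* names and the notifier block are placeholders, not part
+of this API):
+
+	static int vfio_pci_add_one(struct device *dev, void *data)
+	{
+		return vfio_group_add_dev(dev, &vfio_pci_ops);
+	}
+
+	static int __init vfio_pci_init(void)
+	{
+		/* Make existing devices known to the VFIO core... */
+		bus_for_each_dev(&pci_bus_type, NULL, NULL,
+				 vfio_pci_add_one);
+
+		/* ...and watch for hotplugged devices */
+		bus_register_notifier(&pci_bus_type, &vfio_pci_nb);
+
+		/* The driver itself is registered separately (not shown) */
+		return 0;
+	}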
+
+Adding a device registers a vfio_device_ops function pointer structure
+for the device:
+
+struct vfio_device_ops {
+	bool	(*match)(struct device *dev, const char *buf);
+ int (*claim)(struct device *dev);
+ int (*open)(void *device_data);
+ void (*release)(void *device_data);
+ ssize_t (*read)(void *device_data, char __user *buf,
+ size_t count, loff_t *ppos);
+ ssize_t (*write)(void *device_data, const char __user *buf,
+			size_t count, loff_t *ppos);
+ long (*ioctl)(void *device_data, unsigned int cmd,
+ unsigned long arg);
+ int (*mmap)(void *device_data, struct vm_area_struct *vma);
+};
+
+For buses supporting hotplug, all functions are required to be
+implemented. Non-hotplug buses do not need to implement claim().
+
+match() provides a device specific method for associating a struct
+device with a user provided string. Many drivers may simply strcmp the
+buffer to dev_name().
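+
+For example, a minimal match() along those lines might be (sketch only;
+vfio_pci_match is a placeholder name):
+
+	static bool vfio_pci_match(struct device *dev, const char *buf)
+	{
+		/* "0000:06:0d.0" style strings match dev_name() for PCI */
+		return !strcmp(dev_name(dev), buf);
+	}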
+
+claim() is used when a device is hot-added to a group that is already
+in use. This is how VFIO requests that a bus driver manually take
+ownership of a device. The expected call path for this is triggered
+from the bus add notifier. The bus driver calls vfio_group_add_dev for
+the newly added device; vfio-core determines this group is already in
+use and calls claim on the bus driver. This triggers the bus driver
+to call its own probe function, including calling vfio_bind_dev to
+mark the device as controlled by vfio. The device is then available
+for use by the group.
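+
+A claim() callback can therefore be little more than a re-entry into
+the driver's normal probe path, for instance (sketch; vfio_pci_probe is
+a placeholder for the bus driver's own probe routine):
+
+	static int vfio_pci_claim(struct device *dev)
+	{
+		/* Probe ends by calling vfio_bind_dev() on success */
+		return vfio_pci_probe(to_pci_dev(dev), NULL);
+	}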
+
+The remaining vfio_device_ops are similar to a simplified struct
+file_operations except a device_data pointer is provided rather than a
+file pointer. The device_data is an opaque structure registered by
+the bus driver when a device is bound to the vfio bus driver:
+
+extern int vfio_bind_dev(struct device *dev, void *device_data);
+extern void *vfio_unbind_dev(struct device *dev);
+
+When the device is unbound from the driver, the bus driver will call
+vfio_unbind_dev() which will return the device_data for any bus driver
+specific cleanup and freeing of the structure. The vfio_unbind_dev
+call may block if the group is currently in use.
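+
+Putting these together, a bus driver's probe and remove paths might
+look roughly like the following (sketch; struct vfio_pci_device and the
+vfio_pci_* names are placeholders):
+
+	static int vfio_pci_probe(struct pci_dev *pdev,
+				  const struct pci_device_id *id)
+	{
+		struct vfio_pci_device *vdev;
+		int ret;
+
+		vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+		if (!vdev)
+			return -ENOMEM;
+
+		vdev->pdev = pdev;
+
+		/* Device now counts toward group viability */
+		ret = vfio_bind_dev(&pdev->dev, vdev);
+		if (ret)
+			kfree(vdev);
+
+		return ret;
+	}
+
+	static void vfio_pci_remove(struct pci_dev *pdev)
+	{
+		/* May block until the group is no longer in use */
+		struct vfio_pci_device *vdev = vfio_unbind_dev(&pdev->dev);
+
+		kfree(vdev);
+	}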
+
+-------------------------------------------------------------------------------
+
+[1] VFIO was originally an acronym for "Virtual Function I/O" in its
+initial implementation by Tom Lyon while at Cisco. We've since
+outgrown the acronym, but it's catchy.
+
+[2] "safe" also depends upon a device being "well behaved". It's
+possible for multi-function devices to have backdoors between
+functions and even for single function devices to have alternative
+access to things like PCI config space through MMIO registers. To
+guard against the former we can include additional precautions in the
+IOMMU driver to group multi-function PCI devices together
+(iommu=group_mf). The latter we can't prevent, but the IOMMU should
+still provide isolation. For PCI, SR-IOV Virtual Functions are the
+best indicator of "well behaved", as these are designed for
+virtualization usage models.
+
+[3] As always there are trade-offs to virtual machine device
+assignment that are beyond the scope of VFIO. It's expected that
+future IOMMU technologies will reduce some, but maybe not all, of
+these trade-offs.
+
+[4] In this case the device is below a PCI bridge, so transactions
+from either function of the device are indistinguishable to the iommu:
+
+-[0000:00]-+-1e.0-[06]--+-0d.0
+ \-0d.1
+
+00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
* [Qemu-devel] [PATCH v2 2/5] vfio: VFIO core header
From: Alex Williamson @ 2012-01-23 17:20 UTC (permalink / raw)
To: chrisw, aik, david, joerg.roedel, agraf, benve, aafabbri, B08248,
B07421, avi, konrad.wilk, kvm, qemu-devel, iommu, linux-pci,
linux-kernel
This defines both the user and bus driver APIs.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
Documentation/ioctl/ioctl-number.txt | 1
include/linux/vfio.h | 395 ++++++++++++++++++++++++++++++++++
2 files changed, 396 insertions(+), 0 deletions(-)
create mode 100644 include/linux/vfio.h
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 2550754..79c5ef8 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -88,6 +88,7 @@ Code Seq#(hex) Include File Comments
and kernel/power/user.c
'8' all SNP8023 advanced NIC card
<mailto:mcr@solidum.com>
+';' 64-83 linux/vfio.h
'@' 00-0F linux/radeonfb.h conflict!
'@' 00-0F drivers/video/aty/aty128fb.c conflict!
'A' 00-1F linux/apm_bios.h conflict!
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
new file mode 100644
index 0000000..797dbe4
--- /dev/null
+++ b/include/linux/vfio.h
@@ -0,0 +1,395 @@
+/*
+ * VFIO API definition
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef VFIO_H
+#define VFIO_H
+
+#include <linux/types.h>
+
+#ifdef __KERNEL__ /* Internal VFIO-core/bus driver API */
+
+/**
+ * struct vfio_device_ops - VFIO bus driver device callbacks
+ *
+ * @match: Return true if buf describes the device
+ * @claim: Force driver to attach to device
+ * @open: Called when userspace receives file descriptor for device
+ * @release: Called when userspace releases file descriptor for device
+ * @read: Perform read(2) on device file descriptor
+ * @write: Perform write(2) on device file descriptor
+ * @ioctl: Perform ioctl(2) on device file descriptor, supporting VFIO_DEVICE_*
+ * operations documented below
+ * @mmap: Perform mmap(2) on a region of the device file descriptor
+ */
+struct vfio_device_ops {
+ bool (*match)(struct device *dev, const char *buf);
+ int (*claim)(struct device *dev);
+ int (*open)(void *device_data);
+ void (*release)(void *device_data);
+ ssize_t (*read)(void *device_data, char __user *buf,
+ size_t count, loff_t *ppos);
+ ssize_t (*write)(void *device_data, const char __user *buf,
+			size_t count, loff_t *ppos);
+ long (*ioctl)(void *device_data, unsigned int cmd,
+ unsigned long arg);
+ int (*mmap)(void *device_data, struct vm_area_struct *vma);
+};
+
+/**
+ * vfio_group_add_dev() - Add a device to the vfio-core
+ *
+ * @dev: Device to add
+ * @ops: VFIO bus driver callbacks for device
+ *
+ * This registration makes the VFIO core aware of the device, creates
+ * group objects as required and exposes chardevs under /dev/vfio.
+ *
+ * Return 0 on success, errno on failure.
+ */
+extern int vfio_group_add_dev(struct device *dev,
+ const struct vfio_device_ops *ops);
+
+/**
+ * vfio_group_del_dev() - Remove a device from the vfio-core
+ *
+ * @dev: Device to remove
+ *
+ * Remove a device previously added to the VFIO core, removing groups
+ * and chardevs as necessary.
+ */
+extern void vfio_group_del_dev(struct device *dev);
+
+/**
+ * vfio_bind_dev() - Indicate device is bound to the VFIO bus driver and
+ * register private data structure for ops callbacks.
+ *
+ * @dev: Device being bound
+ * @device_data: VFIO bus driver private data
+ *
+ * This registration indicates that a device previously registered with
+ * vfio_group_add_dev() is now available for use by the VFIO core. When
+ * all devices within a group are available, the group is viable and may
+ * be used by userspace drivers. Typically called from VFIO bus driver
+ * probe function.
+ *
+ * Return 0 on success, errno on failure
+ */
+extern int vfio_bind_dev(struct device *dev, void *device_data);
+
+/**
+ * vfio_unbind_dev() - Indicate device is unbinding from VFIO bus driver
+ *
+ * @dev: Device being unbound
+ *
+ * De-registration of the device previously registered with vfio_bind_dev()
+ * from VFIO. Upon completion, the device is no longer available for use by
+ * the VFIO core. Typically called from the VFIO bus driver remove function.
+ * The VFIO core will attempt to release the device from users and may take
+ * measures to free the device and/or block as necessary.
+ *
+ * Returns pointer to private device_data structure registered with
+ * vfio_bind_dev().
+ */
+extern void *vfio_unbind_dev(struct device *dev);
+
+
+/**
+ * offsetofend(TYPE, MEMBER)
+ *
+ * @TYPE: The type of the structure
+ * @MEMBER: The member within the structure to get the end offset of
+ *
+ * Simple helper macro for dealing with variable sized structures passed
+ * from user space. This allows us to easily determine if the provided
+ * structure is sized to include various fields.
+ */
+#define offsetofend(TYPE, MEMBER) ({ \
+ TYPE tmp; \
+	offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); })
+
+#endif /* __KERNEL__ */
+
+/* Kernel & User level defines for VFIO IOCTLs. */
+
+/*
+ * The IOCTL interface is designed for extensibility by embedding the
+ * structure length (argsz) and flags into structures passed between
+ * kernel and userspace. We therefore use the _IO() macro for these
+ * defines to avoid implicitly embedding a size into the ioctl request.
+ * As structure fields are added, argsz will increase to match and flag
+ * bits will be defined to indicate additional fields with valid data.
+ * It's *always* the caller's responsibility to indicate the size of
+ * the structure passed by setting argsz appropriately.
+ */
+
+#define VFIO_TYPE (';')
+#define VFIO_BASE 100
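+
+/*
+ * For reference, the kernel side of the argsz contract follows a
+ * pattern along these lines (sketch; struct vfio_foo_info and
+ * last_required_field are placeholders):
+ *
+ *	minsz = offsetofend(struct vfio_foo_info, last_required_field);
+ *
+ *	if (copy_from_user(&info, (void __user *)arg, minsz))
+ *		return -EFAULT;
+ *
+ *	if (info.argsz < minsz)
+ *		return -EINVAL;
+ */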
+
+/* --------------- IOCTLs for GROUP file descriptors --------------- */
+
+/**
+ * VFIO_GROUP_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 0, struct vfio_group_info)
+ *
+ * Retrieve information about the group. Fills in provided
+ * struct vfio_group_info. Caller sets argsz.
+ */
+struct vfio_group_info {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_GROUP_FLAGS_VIABLE (1 << 0)
+#define VFIO_GROUP_FLAGS_MM_LOCKED (1 << 1)
+};
+
+#define VFIO_GROUP_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 0)
+
+/**
+ * VFIO_GROUP_MERGE - _IOW(VFIO_TYPE, VFIO_BASE + 1, __s32)
+ *
+ * Merge group indicated by passed file descriptor into current group.
+ * The current group may be in use; the group indicated by the file
+ * descriptor cannot be in use (no open iommu or devices).
+ */
+#define VFIO_GROUP_MERGE _IOW(VFIO_TYPE, VFIO_BASE + 1, __s32)
+
+/**
+ * VFIO_GROUP_UNMERGE - _IO(VFIO_TYPE, VFIO_BASE + 2)
+ *
+ * Remove the current group from a merged set. The current group cannot
+ * have any open devices.
+ */
+#define VFIO_GROUP_UNMERGE _IO(VFIO_TYPE, VFIO_BASE + 2)
+
+/**
+ * VFIO_GROUP_GET_IOMMU_FD - _IO(VFIO_TYPE, VFIO_BASE + 3)
+ *
+ * Return a new file descriptor for the IOMMU object. The IOMMU object
+ * is shared among members of a merged group.
+ */
+#define VFIO_GROUP_GET_IOMMU_FD _IO(VFIO_TYPE, VFIO_BASE + 3)
+
+/**
+ * VFIO_GROUP_GET_DEVICE_FD - _IOW(VFIO_TYPE, VFIO_BASE + 4, char)
+ *
+ * Return a new file descriptor for the device object described by
+ * the provided char array.
+ */
+#define VFIO_GROUP_GET_DEVICE_FD _IOW(VFIO_TYPE, VFIO_BASE + 4, char)
+
+
+/* --------------- IOCTLs for IOMMU file descriptors --------------- */
+
+/**
+ * VFIO_IOMMU_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 5, struct vfio_iommu_info)
+ *
+ * Retrieve information about the IOMMU object. Fills in provided
+ * struct vfio_iommu_info. Caller sets argsz.
+ */
+struct vfio_iommu_info {
+ __u32 argsz;
+ __u32 flags;
+ __u64 iova_start; /* IOVA base address */
+ __u64 iova_size; /* IOVA window size */
+ __u64 iova_entries; /* Number mapping entries available */
+ __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+};
+
+#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 5)
+
+/**
+ * VFIO_IOMMU_MAP_DMA - _IOW(VFIO_TYPE, VFIO_BASE + 6, struct vfio_dma_map)
+ *
+ * Map process virtual addresses to IO virtual addresses using the
+ * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ */
+struct vfio_dma_map {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
+#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
+ __u64 vaddr; /* Process virtual address */
+ __u64 iova; /* IO virtual address */
+ __u64 size; /* Size of mapping (bytes) */
+};
+
+#define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 6)
+
+/**
+ * VFIO_IOMMU_UNMAP_DMA - _IOW(VFIO_TYPE, VFIO_BASE + 7, struct vfio_dma_unmap)
+ *
+ * Unmap IO virtual addresses using the provided struct vfio_dma_unmap.
+ * Caller sets argsz.
+ */
+struct vfio_dma_unmap {
+ __u32 argsz;
+ __u32 flags;
+ __u64 iova; /* IO virtual address */
+ __u64 size; /* Size of mapping (bytes) */
+};
+
+#define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 7)
+
+
+/* --------------- IOCTLs for DEVICE file descriptors --------------- */
+
+/**
+ * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 8,
+ * struct vfio_device_info)
+ *
+ * Retrieve information about the device. Fills in provided
+ * struct vfio_device_info. Caller sets argsz.
+ */
+struct vfio_device_info {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_DEVICE_FLAGS_RESET (1 << 0) /* Device supports reset */
+ __u32 num_regions; /* Max region index + 1 */
+ __u32 num_irqs; /* Max IRQ index + 1 */
+};
+
+#define VFIO_DEVICE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 8)
+
+/**
+ * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
+ * struct vfio_region_info)
+ *
+ * Retrieve information about a device region. Caller provides
+ * struct vfio_region_info with index value set. Caller sets argsz.
+ * Implementation of region mapping is bus driver specific. This is
+ * intended to describe MMIO, I/O port, as well as bus specific
+ * regions (ex. PCI config space). Zero sized regions may be used
+ * to describe unimplemented regions (ex. unimplemented PCI BARs).
+ */
+struct vfio_region_info {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_REGION_INFO_FLAG_MMAP (1 << 0) /* Region supports mmap */
+#define VFIO_REGION_INFO_FLAG_RO (1 << 1) /* Region is read-only */
+ __u32 index; /* Region index */
+ __u32 resv; /* Reserved for alignment */
+ __u64 size; /* Region size (bytes) */
+ __u64 offset; /* Region offset from start of device fd */
+};
+
+#define VFIO_DEVICE_GET_REGION_INFO _IO(VFIO_TYPE, VFIO_BASE + 9)
+
+/**
+ * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 10,
+ * struct vfio_irq_info)
+ *
+ * Retrieve information about a device IRQ. Caller provides
+ * struct vfio_irq_info with index value set. Caller sets argsz.
+ * Implementation of IRQ mapping is bus driver specific. Indexes
+ * using multiple IRQs are primarily intended to support MSI-like
+ * interrupt blocks. Zero count irq blocks may be used to describe
+ * unimplemented interrupt types.
+ *
+ * The EVENTFD flag indicates the interrupt index supports eventfd based
+ * signaling.
+ *
+ * The MASKABLE flag indicates the index supports MASK and UNMASK
+ * actions described below.
+ *
+ * AUTOMASKED indicates that after signaling, the interrupt line is
+ * automatically masked by VFIO and the user needs to unmask the line
+ * to receive new interrupts. This is primarily intended to distinguish
+ * level triggered interrupts.
+ *
+ * The NORESIZE flag indicates that the interrupt lines within the index
+ * are set up as a set and new subindexes cannot be enabled without first
+ * disabling the entire index. This is used for interrupts like PCI MSI
+ * and MSI-X where the driver may only use a subset of the available
+ * indexes, but VFIO needs to enable a specific number of vectors
+ * upfront. In the case of MSI-X, where the user can enable MSI-X and
+ * then add and unmask vectors, it's up to userspace to decide whether
+ * to allocate the maximum supported number of vectors up front or to
+ * tear down the setup and incrementally increase the vectors as each
+ * is enabled.
+ */
+struct vfio_irq_info {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_IRQ_INFO_EVENTFD (1 << 0)
+#define VFIO_IRQ_INFO_MASKABLE (1 << 1)
+#define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
+#define VFIO_IRQ_INFO_NORESIZE (1 << 3)
+ __u32 index; /* IRQ index */
+ __s32 count; /* Number of IRQs within this index */
+};
+
+#define VFIO_DEVICE_GET_IRQ_INFO _IO(VFIO_TYPE, VFIO_BASE + 10)
+
+/**
+ * VFIO_DEVICE_SET_IRQS - _IOW(VFIO_TYPE, VFIO_BASE + 11, struct vfio_irq_set)
+ *
+ * Set signaling, masking, and unmasking of interrupts. Caller provides
+ * struct vfio_irq_set with all fields set. 'start' and 'count' indicate
+ * the range of subindexes being specified.
+ *
+ * The DATA flags specify the type of data provided. If DATA_NONE, the
+ * operation performs the specified action immediately on the specified
+ * interrupt(s). For example, to unmask AUTOMASKED interrupt [0,0]:
+ * flags = (DATA_NONE|ACTION_UNMASK), index = 0, start = 0, count = 1.
+ *
+ * DATA_BOOL allows sparse support for the same on arrays of interrupts.
+ * For example, to mask interrupts [0,1] and [0,3] (but not [0,2]):
+ * flags = (DATA_BOOL|ACTION_MASK), index = 0, start = 1, count = 3,
+ * data = {1,0,1}
+ *
+ * DATA_EVENTFD binds the specified ACTION to the provided __s32 eventfd.
+ * A value of -1 can be used to either de-assign interrupts if already
+ * assigned or skip un-assigned interrupts. For example, to set an eventfd
+ * to trigger for interrupts [0,0] and [0,2]:
+ * flags = (DATA_EVENTFD|ACTION_TRIGGER), index = 0, start = 0, count = 3,
+ * data = {fd1, -1, fd2}
+ * If index [0,1] is previously set, two count = 1 ioctl calls would be
+ * required to set [0,0] and [0,2] without changing [0,1].
+ *
+ * Once a signaling mechanism is set, DATA_BOOL or DATA_NONE can be used
+ * with ACTION_TRIGGER to perform kernel level interrupt loopback testing
+ * from userspace (i.e. simulate hardware triggering).
+ *
+ * Setting of an event triggering mechanism to userspace for ACTION_TRIGGER
+ * enables the interrupt index for the device. Individual subindex interrupts
+ * can be disabled using the -1 value for DATA_EVENTFD or the index can be
+ * disabled as a whole with: flags = (DATA_NONE|ACTION_TRIGGER), count = 0.
+ *
+ * Note that ACTION_[UN]MASK specify user->kernel signaling (irqfds) while
+ * ACTION_TRIGGER specifies kernel->user signaling.
+ */
+struct vfio_irq_set {
+ __u32 argsz;
+ __u32 flags;
+#define VFIO_IRQ_SET_DATA_NONE (1 << 0) /* Data not present */
+#define VFIO_IRQ_SET_DATA_BOOL (1 << 1) /* Data is bool (u8) */
+#define VFIO_IRQ_SET_DATA_EVENTFD (1 << 2) /* Data is eventfd (s32) */
+#define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */
+#define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */
+#define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */
+ __u32 index;
+ __s32 start;
+ __s32 count;
+ __u8 data[];
+};
+
+#define VFIO_DEVICE_SET_IRQS _IO(VFIO_TYPE, VFIO_BASE + 11)
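+
+/*
+ * As a sketch of the DATA_EVENTFD case above from userspace (error
+ * handling omitted; 'device' is an open VFIO device file descriptor),
+ * signaling for interrupt [0,0] might be set up as:
+ *
+ *	struct vfio_irq_set *irq_set;
+ *	size_t argsz = sizeof(*irq_set) + sizeof(__s32);
+ *	int fd = eventfd(0, 0);
+ *
+ *	irq_set = malloc(argsz);
+ *	irq_set->argsz = argsz;
+ *	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ *			 VFIO_IRQ_SET_ACTION_TRIGGER;
+ *	irq_set->index = 0;
+ *	irq_set->start = 0;
+ *	irq_set->count = 1;
+ *	memcpy(irq_set->data, &fd, sizeof(fd));
+ *
+ *	ioctl(device, VFIO_DEVICE_SET_IRQS, irq_set);
+ */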
+
+#define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \
+ VFIO_IRQ_SET_DATA_BOOL | \
+ VFIO_IRQ_SET_DATA_EVENTFD)
+#define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \
+ VFIO_IRQ_SET_ACTION_UNMASK | \
+ VFIO_IRQ_SET_ACTION_TRIGGER)
+/**
+ * VFIO_DEVICE_RESET - _IO(VFIO_TYPE, VFIO_BASE + 12)
+ *
+ * Reset a device.
+ */
+#define VFIO_DEVICE_RESET _IO(VFIO_TYPE, VFIO_BASE + 12)
+
+#endif /* VFIO_H */
* [Qemu-devel] [PATCH v2 3/5] vfio: VFIO core group interface
From: Alex Williamson @ 2012-01-23 17:21 UTC (permalink / raw)
To: chrisw, aik, david, joerg.roedel, agraf, benve, aafabbri, B08248,
B07421, avi, konrad.wilk, kvm, qemu-devel, iommu, linux-pci,
linux-kernel
This provides the base group management with conduits to the
IOMMU driver and VFIO bus drivers.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
drivers/vfio/vfio_main.c | 1248 +++++++++++++++++++++++++++++++++++++++++++
drivers/vfio/vfio_private.h | 36 +
2 files changed, 1284 insertions(+), 0 deletions(-)
create mode 100644 drivers/vfio/vfio_main.c
create mode 100644 drivers/vfio/vfio_private.h
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
new file mode 100644
index 0000000..fcd6476
--- /dev/null
+++ b/drivers/vfio/vfio_main.c
@@ -0,0 +1,1248 @@
+/*
+ * VFIO framework
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/cdev.h>
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <linux/fs.h>
+#include <linux/idr.h>
+#include <linux/iommu.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/wait.h>
+
+#include "vfio_private.h"
+
+#define DRIVER_VERSION "0.2"
+#define DRIVER_AUTHOR "Alex Williamson <alex.williamson@redhat.com>"
+#define DRIVER_DESC "VFIO - User Level meta-driver"
+
+static struct vfio {
+ dev_t devt;
+ struct cdev cdev;
+ struct list_head group_list;
+ struct mutex lock;
+ struct kref kref;
+ struct class *class;
+ struct idr idr;
+ wait_queue_head_t release_q;
+} vfio;
+
+static const struct file_operations vfio_group_fops;
+
+struct vfio_group {
+ dev_t devt;
+ unsigned int groupid;
+ struct bus_type *bus;
+ struct vfio_iommu *iommu;
+ struct list_head device_list;
+ struct list_head iommu_next;
+ struct list_head group_next;
+ struct device *dev;
+ struct kobject *devices_kobj;
+ int refcnt;
+ bool tainted;
+};
+
+struct vfio_device {
+ struct device *dev;
+ const struct vfio_device_ops *ops;
+ struct vfio_group *group;
+ struct list_head device_next;
+ bool attached;
+ bool deleteme;
+ int refcnt;
+ void *device_data;
+};
+
+/*
+ * Helper functions called under vfio.lock
+ */
+
+/* Return true if any devices within a group are opened */
+static bool __vfio_group_devs_inuse(struct vfio_group *group)
+{
+ struct list_head *pos;
+
+ list_for_each(pos, &group->device_list) {
+ struct vfio_device *device;
+
+ device = list_entry(pos, struct vfio_device, device_next);
+ if (device->refcnt)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Return true if any of the groups attached to an iommu are opened.
+ * We can only tear apart merged groups when nothing is left open.
+ */
+static bool __vfio_iommu_groups_inuse(struct vfio_iommu *iommu)
+{
+ struct list_head *pos;
+
+ list_for_each(pos, &iommu->group_list) {
+ struct vfio_group *group;
+
+ group = list_entry(pos, struct vfio_group, iommu_next);
+ if (group->refcnt)
+ return true;
+ }
+ return false;
+}
+
+/*
+ * An iommu is "in use" if it has a file descriptor open or if any of
+ * the groups assigned to the iommu have devices open.
+ */
+static bool __vfio_iommu_inuse(struct vfio_iommu *iommu)
+{
+ struct list_head *pos;
+
+ if (iommu->refcnt)
+ return true;
+
+ list_for_each(pos, &iommu->group_list) {
+ struct vfio_group *group;
+
+ group = list_entry(pos, struct vfio_group, iommu_next);
+
+ if (__vfio_group_devs_inuse(group))
+ return true;
+ }
+ return false;
+}
+
+static void __vfio_group_set_iommu(struct vfio_group *group,
+ struct vfio_iommu *iommu)
+{
+ if (group->iommu)
+ list_del(&group->iommu_next);
+ if (iommu)
+ list_add(&group->iommu_next, &iommu->group_list);
+
+ group->iommu = iommu;
+}
+
+static void __vfio_iommu_detach_dev(struct vfio_iommu *iommu,
+ struct vfio_device *device)
+{
+ if (WARN_ON(!iommu->domain && device->attached))
+ return;
+
+ if (!device->attached)
+ return;
+
+ iommu_detach_device(iommu->domain, device->dev);
+ device->attached = false;
+}
+
+static void __vfio_iommu_detach_group(struct vfio_iommu *iommu,
+ struct vfio_group *group)
+{
+ struct list_head *pos;
+
+ list_for_each(pos, &group->device_list) {
+ struct vfio_device *device;
+
+ device = list_entry(pos, struct vfio_device, device_next);
+ __vfio_iommu_detach_dev(iommu, device);
+ }
+}
+
+static int __vfio_iommu_attach_dev(struct vfio_iommu *iommu,
+ struct vfio_device *device)
+{
+ int ret;
+
+ if (WARN_ON(device->attached || !iommu || !iommu->domain))
+ return -EINVAL;
+
+ ret = iommu_attach_device(iommu->domain, device->dev);
+ if (!ret)
+ device->attached = true;
+
+ return ret;
+}
+
+static int __vfio_iommu_attach_group(struct vfio_iommu *iommu,
+ struct vfio_group *group)
+{
+ struct list_head *pos;
+
+ list_for_each(pos, &group->device_list) {
+ struct vfio_device *device;
+ int ret;
+
+ device = list_entry(pos, struct vfio_device, device_next);
+ ret = __vfio_iommu_attach_dev(iommu, device);
+ if (ret) {
+ __vfio_iommu_detach_group(iommu, group);
+ return ret;
+ }
+ }
+ return 0;
+}
+
+/*
+ * The iommu is viable, i.e. ready to be configured, when all the devices
+ * for all the groups attached to the iommu are bound to their vfio device
+ * drivers (ex. vfio-pci). This sets the device_data private data pointer.
+ */
+static bool __vfio_iommu_viable(struct vfio_iommu *iommu)
+{
+ struct list_head *gpos, *dpos;
+
+ list_for_each(gpos, &iommu->group_list) {
+ struct vfio_group *group;
+ group = list_entry(gpos, struct vfio_group, iommu_next);
+
+ if (group->tainted)
+ return false;
+
+ list_for_each(dpos, &group->device_list) {
+ struct vfio_device *device;
+ device = list_entry(dpos,
+ struct vfio_device, device_next);
+
+ if (!device->device_data)
+ return false;
+ }
+ }
+ return true;
+}
+
+static void __vfio_iommu_close(struct vfio_iommu *iommu)
+{
+ struct list_head *pos;
+
+ if (!iommu->domain)
+ return;
+
+ list_for_each(pos, &iommu->group_list) {
+ struct vfio_group *group;
+ group = list_entry(pos, struct vfio_group, iommu_next);
+
+ __vfio_iommu_detach_group(iommu, group);
+ }
+
+ vfio_iommu_unmapall(iommu);
+
+ iommu_domain_free(iommu->domain);
+ iommu->domain = NULL;
+ iommu->mm = NULL;
+}
+
+/*
+ * Open the IOMMU. This gates all access to the iommu or device file
+ * descriptors and sets current->mm as the exclusive user.
+ */
+static int __vfio_iommu_open(struct vfio_iommu *iommu)
+{
+ struct list_head *pos;
+ int ret;
+
+ if (!__vfio_iommu_viable(iommu))
+ return -EBUSY;
+
+ if (iommu->domain)
+ return -EINVAL;
+
+ iommu->domain = iommu_domain_alloc(iommu->bus);
+ if (!iommu->domain)
+ return -ENOMEM;
+
+ list_for_each(pos, &iommu->group_list) {
+ struct vfio_group *group;
+ group = list_entry(pos, struct vfio_group, iommu_next);
+
+ ret = __vfio_iommu_attach_group(iommu, group);
+ if (ret) {
+ __vfio_iommu_close(iommu);
+ return ret;
+ }
+ }
+
+ iommu->cache = (iommu_domain_has_cap(iommu->domain,
+ IOMMU_CAP_CACHE_COHERENCY) != 0);
+ iommu->mm = current->mm;
+
+ return 0;
+}
+
+/*
+ * Actively try to tear down the iommu and merged groups. If there are no
+ * open iommu or device fds, we close the iommu. If we close the iommu and
+ * there are also no open group fds, we can further dissolve the group to
+ * iommu association and free the iommu data structure.
+ */
+static int __vfio_try_dissolve_iommu(struct vfio_iommu *iommu)
+{
+
+ if (__vfio_iommu_inuse(iommu))
+ return -EBUSY;
+
+ __vfio_iommu_close(iommu);
+
+ if (!__vfio_iommu_groups_inuse(iommu)) {
+ struct list_head *pos, *ppos;
+
+ list_for_each_safe(pos, ppos, &iommu->group_list) {
+ struct vfio_group *group;
+
+ group = list_entry(pos, struct vfio_group, iommu_next);
+ __vfio_group_set_iommu(group, NULL);
+ }
+
+ kfree(iommu);
+ }
+
+ return 0;
+}
+
+static struct vfio_device *__vfio_lookup_dev(struct device *dev)
+{
+ struct list_head *gpos;
+ unsigned int groupid;
+
+ if (iommu_device_group(dev, &groupid))
+ return NULL;
+
+ list_for_each(gpos, &vfio.group_list) {
+ struct vfio_group *group;
+ struct list_head *dpos;
+
+ group = list_entry(gpos, struct vfio_group, group_next);
+
+ if (group->groupid != groupid || group->bus != dev->bus)
+ continue;
+
+ list_for_each(dpos, &group->device_list) {
+ struct vfio_device *device;
+
+ device = list_entry(dpos,
+ struct vfio_device, device_next);
+
+ if (device->dev == dev)
+ return device;
+ }
+ }
+ return NULL;
+}
+
+static struct vfio_group *__vfio_dev_to_group(struct device *dev,
+ unsigned int groupid)
+{
+ struct list_head *pos;
+ struct vfio_group *group;
+
+ list_for_each(pos, &vfio.group_list) {
+ group = list_entry(pos, struct vfio_group, group_next);
+ if (group->groupid == groupid && group->bus == dev->bus)
+ return group;
+ }
+
+ return NULL;
+}
+
+struct vfio_device *__vfio_group_find_device(struct vfio_group *group,
+ struct device *dev)
+{
+ struct list_head *pos;
+ struct vfio_device *device;
+
+ list_for_each(pos, &group->device_list) {
+ device = list_entry(pos, struct vfio_device, device_next);
+ if (device->dev == dev)
+ return device;
+ }
+
+ return NULL;
+}
+
+static struct vfio_group *__vfio_create_group(struct device *dev,
+ unsigned int groupid)
+{
+ struct vfio_group *group;
+ int ret, minor;
+
+ group = kzalloc(sizeof(*group), GFP_KERNEL);
+
+ /*
+ * We can't recover from this. If we can't even get memory for
+ * the group, we can't track the device and we don't have a place
+ * to mark the groupid tainted. Failures below should at least
+ * return a tainted group.
+ */
+ BUG_ON(!group);
+
+ group->groupid = groupid;
+ group->bus = dev->bus;
+ INIT_LIST_HEAD(&group->device_list);
+
+ group->tainted = true;
+ list_add(&group->group_next, &vfio.group_list);
+
+again:
+ if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0))
+ goto out;
+
+ ret = idr_get_new(&vfio.idr, group, &minor);
+ if (ret == -EAGAIN)
+ goto again;
+ if (ret || minor > MINORMASK) {
+ if (minor > MINORMASK)
+ idr_remove(&vfio.idr, minor);
+ goto out;
+ }
+
+ group->devt = MKDEV(MAJOR(vfio.devt), minor);
+ group->dev = device_create(vfio.class, NULL, group->devt, group,
+ "%s:%u", dev->bus->name, groupid);
+ if (IS_ERR(group->dev))
+ goto out_device;
+
+ /* Create a place to link individual devices in sysfs */
+ group->devices_kobj = kobject_create_and_add("devices",
+ &group->dev->kobj);
+ if (!group->devices_kobj)
+ goto out_kobj;
+
+ group->tainted = false;
+
+ return group;
+
+out_kobj:
+ device_destroy(vfio.class, group->devt);
+out_device:
+ group->dev = NULL;
+ group->devt = 0;
+ idr_remove(&vfio.idr, minor);
+out:
+ printk(KERN_WARNING "vfio: Failed to complete setup on group %u, "
+ "marking as unusable\n", groupid);
+
+ return group;
+}
+
+static struct vfio_iommu *vfio_create_iommu(struct vfio_group *group)
+{
+ struct vfio_iommu *iommu;
+
+ iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
+ if (!iommu)
+ return ERR_PTR(-ENOMEM);
+
+ INIT_LIST_HEAD(&iommu->group_list);
+ INIT_LIST_HEAD(&iommu->dma_list);
+ mutex_init(&iommu->lock);
+ iommu->bus = group->bus;
+
+ return iommu;
+}
+
+/*
+ * All release paths simply decrement the refcnt, attempt to teardown
+ * the iommu and merged groups, and wakeup anything that might be
+ * waiting if we successfully dissolve anything.
+ */
+static int vfio_do_release(int *refcnt, struct vfio_iommu *iommu)
+{
+ bool wake;
+
+ mutex_lock(&vfio.lock);
+
+ (*refcnt)--;
+ wake = (__vfio_try_dissolve_iommu(iommu) == 0);
+
+ mutex_unlock(&vfio.lock);
+
+ if (wake)
+ wake_up(&vfio.release_q);
+
+ return 0;
+}
+
+/*
+ * Device fops - passthrough to vfio device driver w/ device_data
+ */
+static int vfio_device_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_device *device = filep->private_data;
+
+ vfio_do_release(&device->refcnt, device->group->iommu);
+
+ device->ops->release(device->device_data);
+
+ return 0;
+}
+
+static long vfio_device_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_device *device = filep->private_data;
+
+ return device->ops->ioctl(device->device_data, cmd, arg);
+}
+
+static ssize_t vfio_device_read(struct file *filep, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct vfio_device *device = filep->private_data;
+
+ return device->ops->read(device->device_data, buf, count, ppos);
+}
+
+static ssize_t vfio_device_write(struct file *filep, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct vfio_device *device = filep->private_data;
+
+ return device->ops->write(device->device_data, buf, count, ppos);
+}
+
+static int vfio_device_mmap(struct file *filep, struct vm_area_struct *vma)
+{
+ struct vfio_device *device = filep->private_data;
+
+ return device->ops->mmap(device->device_data, vma);
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_device_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_device_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+const struct file_operations vfio_device_fops = {
+ .owner = THIS_MODULE,
+ .release = vfio_device_release,
+ .read = vfio_device_read,
+ .write = vfio_device_write,
+ .unlocked_ioctl = vfio_device_unl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_device_compat_ioctl,
+#endif
+ .mmap = vfio_device_mmap,
+};
+
+/*
+ * Group fops
+ */
+static int vfio_group_open(struct inode *inode, struct file *filep)
+{
+ struct vfio_group *group;
+ int ret = 0;
+
+ mutex_lock(&vfio.lock);
+
+ group = idr_find(&vfio.idr, iminor(inode));
+
+ if (!group) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ filep->private_data = group;
+
+ if (!group->iommu) {
+ struct vfio_iommu *iommu;
+
+ iommu = vfio_create_iommu(group);
+ if (IS_ERR(iommu)) {
+ ret = PTR_ERR(iommu);
+ goto out;
+ }
+ __vfio_group_set_iommu(group, iommu);
+ }
+ group->refcnt++;
+
+out:
+ mutex_unlock(&vfio.lock);
+
+ return ret;
+}
+
+static int vfio_group_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_group *group = filep->private_data;
+
+ return vfio_do_release(&group->refcnt, group->iommu);
+}
+
+/*
+ * Attempt to merge the group pointed to by fd into group. The merge-ee
+ * group must not have an iommu or any devices open because we cannot
+ * maintain that context across the merge. The merge-er group can be
+ * in use.
+ */
+static int vfio_group_merge(struct vfio_group *group, int fd)
+{
+ struct vfio_group *new;
+ struct vfio_iommu *old_iommu;
+ struct file *file;
+ int ret = 0;
+ bool opened = false;
+
+ mutex_lock(&vfio.lock);
+
+ file = fget(fd);
+ if (!file) {
+ ret = -EBADF;
+ goto out_noput;
+ }
+
+ /* Sanity check, is this really our fd? */
+ if (file->f_op != &vfio_group_fops) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ new = file->private_data;
+
+ if (!new || new == group || !new->iommu ||
+ new->iommu->domain || new->bus != group->bus) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * We need to attach all the devices to each domain separately
+ * in order to validate that the capabilities match for both.
+ */
+ ret = __vfio_iommu_open(new->iommu);
+ if (ret)
+ goto out;
+
+ if (!group->iommu->domain) {
+ ret = __vfio_iommu_open(group->iommu);
+ if (ret)
+ goto out;
+ opened = true;
+ }
+
+ /*
+ * If cache coherency doesn't match we'd potentially need to
+ * remap existing iommu mappings in the merge-er domain.
+	 * It's not worth the effort to allow this currently.
+ */
+ if (iommu_domain_has_cap(group->iommu->domain,
+ IOMMU_CAP_CACHE_COHERENCY) !=
+ iommu_domain_has_cap(new->iommu->domain,
+ IOMMU_CAP_CACHE_COHERENCY)) {
+ __vfio_iommu_close(new->iommu);
+ if (opened)
+ __vfio_iommu_close(group->iommu);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * Close the iommu for the merge-ee and attach all its devices
+ * to the merge-er iommu.
+ */
+ __vfio_iommu_close(new->iommu);
+
+ ret = __vfio_iommu_attach_group(group->iommu, new);
+ if (ret)
+ goto out;
+
+ /* set_iommu unlinks new from the iommu, so save a pointer to it */
+ old_iommu = new->iommu;
+ __vfio_group_set_iommu(new, group->iommu);
+ kfree(old_iommu);
+
+out:
+ fput(file);
+out_noput:
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+
+/* Unmerge a group */
+static int vfio_group_unmerge(struct vfio_group *group)
+{
+ struct vfio_iommu *iommu;
+ int ret = 0;
+
+ /*
+ * Since the merge-out group is already opened, it needs to
+ * have an iommu struct associated with it.
+ */
+ iommu = vfio_create_iommu(group);
+ if (IS_ERR(iommu))
+ return PTR_ERR(iommu);
+
+ mutex_lock(&vfio.lock);
+
+ if (list_is_singular(&group->iommu->group_list)) {
+ ret = -EINVAL; /* Not merged group */
+ goto out;
+ }
+
+ /* We can't merge-out a group with devices still in use. */
+ if (__vfio_group_devs_inuse(group)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ __vfio_iommu_detach_group(group->iommu, group);
+ __vfio_group_set_iommu(group, iommu);
+
+out:
+ if (ret)
+ kfree(iommu);
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+
+/*
+ * Get a new iommu file descriptor. This will open the iommu, setting
+ * the current->mm ownership if it's not already set.
+ */
+static int vfio_group_get_iommu_fd(struct vfio_group *group)
+{
+ int ret = 0;
+
+ mutex_lock(&vfio.lock);
+
+ if (!group->iommu->domain) {
+ ret = __vfio_iommu_open(group->iommu);
+ if (ret)
+ goto out;
+ }
+
+ ret = anon_inode_getfd("[vfio-iommu]", &vfio_iommu_fops,
+ group->iommu, O_RDWR);
+ if (ret < 0)
+ goto out;
+
+ group->iommu->refcnt++;
+out:
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+
+/*
+ * Get a new device file descriptor. This will open the iommu, setting
+ * the current->mm ownership if it's not already set. It's difficult to
+ * specify the requirements for matching a user supplied buffer to a
+ * device, so we use a vfio driver callback to test for a match. For
+ * PCI, dev_name(dev) is unique, but other drivers may require including
+ * a parent device string.
+ */
+static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
+{
+ struct vfio_iommu *iommu = group->iommu;
+ struct list_head *gpos;
+ int ret = -ENODEV;
+
+ mutex_lock(&vfio.lock);
+
+ if (!iommu->domain) {
+ ret = __vfio_iommu_open(iommu);
+ if (ret)
+ goto out;
+ }
+
+ list_for_each(gpos, &iommu->group_list) {
+ struct list_head *dpos;
+
+ group = list_entry(gpos, struct vfio_group, iommu_next);
+
+ list_for_each(dpos, &group->device_list) {
+ struct vfio_device *device;
+ struct file *file;
+
+ device = list_entry(dpos,
+ struct vfio_device, device_next);
+
+ if (!device->ops->match(device->dev, buf))
+ continue;
+
+ ret = device->ops->open(device->device_data);
+ if (ret)
+ goto out;
+
+ /*
+ * We can't use anon_inode_getfd(), like above
+ * because we need to modify the f_mode flags
+ * directly to allow more than just ioctls
+ */
+ ret = get_unused_fd();
+ if (ret < 0) {
+ device->ops->release(device->device_data);
+ goto out;
+ }
+
+ file = anon_inode_getfile("[vfio-device]",
+ &vfio_device_fops,
+ device, O_RDWR);
+ if (IS_ERR(file)) {
+ put_unused_fd(ret);
+ ret = PTR_ERR(file);
+ device->ops->release(device->device_data);
+ goto out;
+ }
+
+ /*
+ * TODO: add an anon_inode interface to do this.
+ * Appears to be missing by lack of need rather than
+ * explicitly prevented. Now there's need.
+ */
+ file->f_mode |= (FMODE_LSEEK |
+ FMODE_PREAD |
+ FMODE_PWRITE);
+
+ fd_install(ret, file);
+
+ device->refcnt++;
+ goto out;
+ }
+ }
+out:
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+
+static long vfio_group_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_group *group = filep->private_data;
+
+ if (cmd == VFIO_GROUP_GET_INFO) {
+ struct vfio_group_info info;
+ unsigned long minsz;
+
+ minsz = offsetofend(struct vfio_group_info, flags);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info.argsz < minsz)
+ return -EINVAL;
+
+ mutex_lock(&vfio.lock);
+ if (__vfio_iommu_viable(group->iommu))
+ info.flags |= VFIO_GROUP_FLAGS_VIABLE;
+ mutex_unlock(&vfio.lock);
+
+ if (group->iommu->mm)
+ info.flags |= VFIO_GROUP_FLAGS_MM_LOCKED;
+
+ return copy_to_user((void __user *)arg, &info, minsz);
+ }
+
+ /* Below commands are restricted once the mm is set */
+ if (group->iommu->mm && group->iommu->mm != current->mm)
+ return -EPERM;
+
+ if (cmd == VFIO_GROUP_MERGE) {
+ int fd;
+
+ if (get_user(fd, (int __user *)arg))
+ return -EFAULT;
+ if (fd < 0)
+ return -EINVAL;
+
+ return vfio_group_merge(group, fd);
+
+ } else if (cmd == VFIO_GROUP_UNMERGE) {
+
+ return vfio_group_unmerge(group);
+
+ } else if (cmd == VFIO_GROUP_GET_IOMMU_FD) {
+
+ return vfio_group_get_iommu_fd(group);
+
+ } else if (cmd == VFIO_GROUP_GET_DEVICE_FD) {
+ char *buf;
+ int ret;
+
+ buf = strndup_user((const char __user *)arg, PAGE_SIZE);
+ if (IS_ERR(buf))
+ return PTR_ERR(buf);
+
+ ret = vfio_group_get_device_fd(group, buf);
+ kfree(buf);
+ return ret;
+ }
+
+ return -ENOTTY;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_group_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_group_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+static const struct file_operations vfio_group_fops = {
+ .owner = THIS_MODULE,
+ .open = vfio_group_open,
+ .release = vfio_group_release,
+ .unlocked_ioctl = vfio_group_unl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_group_compat_ioctl,
+#endif
+};
+
+/* iommu fd release hook */
+int vfio_release_iommu(struct vfio_iommu *iommu)
+{
+ return vfio_do_release(&iommu->refcnt, iommu);
+}
+
+/*
+ * VFIO driver API
+ */
+
+/*
+ * Add a new device to the vfio framework with associated vfio driver
+ * callbacks. This is the entry point for vfio drivers to register devices.
+ */
+int vfio_group_add_dev(struct device *dev, const struct vfio_device_ops *ops)
+{
+ struct vfio_group *group;
+ struct vfio_device *device;
+ unsigned int groupid;
+ int ret = 0;
+
+ if (iommu_device_group(dev, &groupid))
+ return -ENODEV;
+
+ if (WARN_ON(!ops))
+ return -EINVAL;
+
+ mutex_lock(&vfio.lock);
+
+ group = __vfio_dev_to_group(dev, groupid);
+ if (!group)
+ group = __vfio_create_group(dev, groupid); /* No fail */
+
+ device = __vfio_group_find_device(group, dev);
+ if (!device) {
+ device = kzalloc(sizeof(*device), GFP_KERNEL);
+ if (WARN_ON(!device)) {
+ /*
+ * We created the group but can't add this device;
+ * taint the group to prevent it from being used. If
+ * it's already in use, we have to BUG_ON.
+ * XXX - Kill the user process?
+ */
+ group->tainted = true;
+ BUG_ON(group->iommu && group->iommu->domain);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ list_add(&device->device_next, &group->device_list);
+ device->dev = dev;
+ device->ops = ops;
+ device->group = group;
+
+ if (!group->devices_kobj ||
+ sysfs_create_link(group->devices_kobj,
+ &dev->kobj, dev_name(dev)))
+ printk(KERN_WARNING
+ "vfio: Unable to create sysfs link to %s\n",
+ dev_name(dev));
+
+ if (group->iommu && group->iommu->domain) {
+ printk(KERN_WARNING "Adding device %s to group %s:%u "
+ "while group is already in use!!\n",
+ dev_name(dev), group->bus->name, group->groupid);
+
+ mutex_unlock(&vfio.lock);
+
+ ret = ops->claim(dev);
+
+ BUG_ON(ret);
+
+ goto out_unlocked;
+ }
+
+ }
+out:
+ mutex_unlock(&vfio.lock);
+out_unlocked:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_group_add_dev);
+
+/* Remove a device from the vfio framework */
+void vfio_group_del_dev(struct device *dev)
+{
+ struct vfio_group *group;
+ struct vfio_device *device;
+ unsigned int groupid;
+
+ if (iommu_device_group(dev, &groupid))
+ return;
+
+ mutex_lock(&vfio.lock);
+
+ group = __vfio_dev_to_group(dev, groupid);
+
+ if (WARN_ON(!group))
+ goto out;
+
+ device = __vfio_group_find_device(group, dev);
+
+ if (WARN_ON(!device))
+ goto out;
+
+ /*
+ * If the device is bound to a bus driver, we'll get a chance to
+ * unbind it first. Just mark it to be removed after unbind.
+ */
+ if (device->device_data) {
+ device->deleteme = true;
+ goto out;
+ }
+
+ if (device->attached)
+ __vfio_iommu_detach_dev(group->iommu, device);
+
+ list_del(&device->device_next);
+
+ if (group->devices_kobj)
+ sysfs_remove_link(group->devices_kobj, dev_name(dev));
+
+ kfree(device);
+
+ /*
+ * If this was the only device in the group, remove the group.
+ * Note that we intentionally unmerge empty groups here if the
+ * group fd isn't opened.
+ */
+ if (list_empty(&group->device_list) && group->refcnt == 0) {
+ struct vfio_iommu *iommu = group->iommu;
+
+ if (iommu) {
+ __vfio_group_set_iommu(group, NULL);
+ __vfio_try_dissolve_iommu(iommu);
+ }
+
+ /*
+ * Groups can be mostly placeholders if setup isn't
+ * completed; remove them carefully.
+ */
+ if (group->devices_kobj)
+ kobject_put(group->devices_kobj);
+ if (group->dev) {
+ device_destroy(vfio.class, group->devt);
+ idr_remove(&vfio.idr, MINOR(group->devt));
+ }
+ list_del(&group->group_next);
+ kfree(group);
+ }
+
+out:
+ mutex_unlock(&vfio.lock);
+}
+EXPORT_SYMBOL_GPL(vfio_group_del_dev);
+
+/*
+ * When a device is bound to a vfio device driver (e.g. vfio-pci), this
+ * entry point is used to mark the device usable (viable). The vfio
+ * device driver associates a private device_data struct with the device
+ * here, which will later be returned to the vfio_device_fops callbacks.
+ */
+int vfio_bind_dev(struct device *dev, void *device_data)
+{
+ struct vfio_device *device;
+ int ret;
+
+ if (WARN_ON(!device_data))
+ return -EINVAL;
+
+ mutex_lock(&vfio.lock);
+
+ device = __vfio_lookup_dev(dev);
+
+ if (WARN_ON(!device)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = dev_set_drvdata(dev, device);
+ if (!ret)
+ device->device_data = device_data;
+
+out:
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_bind_dev);
+
+/* A device is only removable if the iommu for the group is not in use. */
+static bool vfio_device_removeable(struct vfio_device *device)
+{
+ bool ret = true;
+
+ mutex_lock(&vfio.lock);
+
+ if (device->group->iommu && __vfio_iommu_inuse(device->group->iommu))
+ ret = false;
+
+ mutex_unlock(&vfio.lock);
+ return ret;
+}
+
+/*
+ * Notify vfio that a device is being unbound from the vfio device driver
+ * and return the device private device_data pointer. If the group is
+ * in use, we need to block or take other measures to make it safe for
+ * the device to be removed from the iommu.
+ */
+void *vfio_unbind_dev(struct device *dev)
+{
+ struct vfio_device *device = dev_get_drvdata(dev);
+ void *device_data;
+
+ if (WARN_ON(!device))
+ return NULL;
+again:
+ if (!vfio_device_removeable(device)) {
+ /*
+ * XXX signal for all devices in group to be removed or
+ * resort to killing the process holding the device fds.
+ * For now just block waiting for releases to wake us.
+ */
+ wait_event(vfio.release_q, vfio_device_removeable(device));
+ }
+
+ mutex_lock(&vfio.lock);
+
+ /* Need to re-check that the device is still removable under lock. */
+ if (device->group->iommu && __vfio_iommu_inuse(device->group->iommu)) {
+ mutex_unlock(&vfio.lock);
+ goto again;
+ }
+
+ device_data = device->device_data;
+
+ device->device_data = NULL;
+ dev_set_drvdata(dev, NULL);
+
+ mutex_unlock(&vfio.lock);
+
+ if (device->deleteme)
+ vfio_group_del_dev(dev);
+
+ return device_data;
+}
+EXPORT_SYMBOL_GPL(vfio_unbind_dev);
+
+/*
+ * Module/class support
+ */
+static void vfio_class_release(struct kref *kref)
+{
+ class_destroy(vfio.class);
+ vfio.class = NULL;
+}
+
+static char *vfio_devnode(struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "vfio/%s", dev_name(dev));
+}
+
+static int __init vfio_init(void)
+{
+ int ret;
+
+ idr_init(&vfio.idr);
+ mutex_init(&vfio.lock);
+ INIT_LIST_HEAD(&vfio.group_list);
+ init_waitqueue_head(&vfio.release_q);
+
+ kref_init(&vfio.kref);
+ vfio.class = class_create(THIS_MODULE, "vfio");
+ if (IS_ERR(vfio.class)) {
+ ret = PTR_ERR(vfio.class);
+ goto err_class;
+ }
+
+ vfio.class->devnode = vfio_devnode;
+
+ /* FIXME - how many minors to allocate... all of them! */
+ ret = alloc_chrdev_region(&vfio.devt, 0, MINORMASK, "vfio");
+ if (ret)
+ goto err_chrdev;
+
+ cdev_init(&vfio.cdev, &vfio_group_fops);
+ ret = cdev_add(&vfio.cdev, vfio.devt, MINORMASK);
+ if (ret)
+ goto err_cdev;
+
+ pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
+
+ return 0;
+
+err_cdev:
+ unregister_chrdev_region(vfio.devt, MINORMASK);
+err_chrdev:
+ kref_put(&vfio.kref, vfio_class_release);
+err_class:
+ return ret;
+}
+
+static void __exit vfio_cleanup(void)
+{
+ struct list_head *gpos, *gppos;
+
+ list_for_each_safe(gpos, gppos, &vfio.group_list) {
+ struct vfio_group *group;
+ struct list_head *dpos, *dppos;
+
+ group = list_entry(gpos, struct vfio_group, group_next);
+
+ list_for_each_safe(dpos, dppos, &group->device_list) {
+ struct vfio_device *device;
+
+ device = list_entry(dpos,
+ struct vfio_device, device_next);
+ vfio_group_del_dev(device->dev);
+ }
+ }
+
+ idr_destroy(&vfio.idr);
+ cdev_del(&vfio.cdev);
+ unregister_chrdev_region(vfio.devt, MINORMASK);
+ kref_put(&vfio.kref, vfio_class_release);
+}
+
+module_init(vfio_init);
+module_exit(vfio_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/vfio_private.h b/drivers/vfio/vfio_private.h
new file mode 100644
index 0000000..1921fb9
--- /dev/null
+++ b/drivers/vfio/vfio_private.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#ifndef VFIO_PRIVATE_H
+#define VFIO_PRIVATE_H
+
+struct vfio_iommu {
+ struct iommu_domain *domain;
+ struct bus_type *bus;
+ struct mutex lock;
+ struct list_head dma_list;
+ struct mm_struct *mm;
+ struct list_head group_list;
+ int refcnt;
+ bool cache;
+};
+
+extern const struct file_operations vfio_iommu_fops;
+
+extern int vfio_release_iommu(struct vfio_iommu *iommu);
+extern void vfio_iommu_unmapall(struct vfio_iommu *iommu);
+
+#endif /* VFIO_PRIVATE_H */
* [Qemu-devel] [PATCH v2 4/5] vfio: VFIO core IOMMU mapping support
2012-01-23 17:20 [Qemu-devel] [PATCH v2 0/5] VFIO core framework Alex Williamson
` (2 preceding siblings ...)
2012-01-23 17:21 ` [Qemu-devel] [PATCH v2 3/5] vfio: VFIO core group interface Alex Williamson
@ 2012-01-23 17:21 ` Alex Williamson
2012-01-23 17:21 ` [Qemu-devel] [PATCH v2 5/5] vfio: VFIO core Kconfig and Makefile Alex Williamson
4 siblings, 0 replies; 6+ messages in thread
From: Alex Williamson @ 2012-01-23 17:21 UTC (permalink / raw)
To: chrisw, aik, david, joerg.roedel, agraf, benve, aafabbri, B08248,
B07421, avi, konrad.wilk, kvm, qemu-devel, iommu, linux-pci,
linux-kernel
Backing for operations on the IOMMU object, including DMA
mapping and unmapping.
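As a rough userspace sketch (not part of this patch; iommu_fd, buf and
buf_size are assumed to already exist, with the fd obtained via
VFIO_GROUP_GET_IOMMU_FD), mapping a buffer for device DMA with this
interface might look like:
	#include <sys/ioctl.h>
	#include <linux/vfio.h>

	struct vfio_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (unsigned long)buf,	/* page-aligned process buffer */
		.iova  = 0,			/* address the device will use */
		.size  = buf_size,		/* page-aligned length */
	};

	if (ioctl(iommu_fd, VFIO_IOMMU_MAP_DMA, &map))
		perror("VFIO_IOMMU_MAP_DMA");
The unmap path mirrors this with struct vfio_dma_unmap and
VFIO_IOMMU_UNMAP_DMA.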
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
drivers/vfio/vfio_iommu.c | 611 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 611 insertions(+), 0 deletions(-)
create mode 100644 drivers/vfio/vfio_iommu.c
diff --git a/drivers/vfio/vfio_iommu.c b/drivers/vfio/vfio_iommu.c
new file mode 100644
index 0000000..49e6b2d
--- /dev/null
+++ b/drivers/vfio/vfio_iommu.c
@@ -0,0 +1,611 @@
+/*
+ * VFIO: IOMMU DMA mapping support
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#include <linux/compat.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/workqueue.h>
+
+#include "vfio_private.h"
+
+struct vfio_dma_map_entry {
+ struct list_head list;
+ dma_addr_t iova; /* Device address */
+ unsigned long vaddr; /* Process virtual addr */
+ long npage; /* Number of pages */
+ int prot; /* IOMMU_READ/WRITE */
+};
+
+/*
+ * This code handles mapping and unmapping of user data buffers
+ * into DMA'ble space using the IOMMU.
+ */
+
+#define NPAGE_TO_SIZE(npage) ((size_t)(npage) << PAGE_SHIFT)
+
+struct vwork {
+ struct mm_struct *mm;
+ long npage;
+ struct work_struct work;
+};
+
+/* delayed decrement/increment for locked_vm */
+static void vfio_lock_acct_bg(struct work_struct *work)
+{
+ struct vwork *vwork = container_of(work, struct vwork, work);
+ struct mm_struct *mm;
+
+ mm = vwork->mm;
+ down_write(&mm->mmap_sem);
+ mm->locked_vm += vwork->npage;
+ up_write(&mm->mmap_sem);
+ mmput(mm);
+ kfree(vwork);
+}
+
+static void vfio_lock_acct(long npage)
+{
+ struct vwork *vwork;
+ struct mm_struct *mm;
+
+ if (!current->mm)
+ return; /* process exited */
+
+ if (down_write_trylock(&current->mm->mmap_sem)) {
+ current->mm->locked_vm += npage;
+ up_write(&current->mm->mmap_sem);
+ return;
+ }
+
+ /*
+ * Couldn't get the mmap_sem lock, so we must set up to update
+ * mm->locked_vm later. If locked_vm were atomic, we
+ * wouldn't need this silliness.
+ */
+ vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
+ if (!vwork)
+ return;
+ mm = get_task_mm(current);
+ if (!mm) {
+ kfree(vwork);
+ return;
+ }
+ INIT_WORK(&vwork->work, vfio_lock_acct_bg);
+ vwork->mm = mm;
+ vwork->npage = npage;
+ schedule_work(&vwork->work);
+}
+
+/*
+ * Some mappings aren't backed by a struct page, for example an mmap'd
+ * MMIO range for our own or another device. These use a different
+ * pfn conversion and shouldn't be tracked as locked pages.
+ */
+static bool is_invalid_reserved_pfn(unsigned long pfn)
+{
+ if (pfn_valid(pfn)) {
+ bool reserved;
+ struct page *tail = pfn_to_page(pfn);
+ struct page *head = compound_trans_head(tail);
+ reserved = !!(PageReserved(head));
+ if (head != tail) {
+ /*
+ * "head" is not a dangling pointer
+ * (compound_trans_head takes care of that)
+ * but the hugepage may have been split
+ * from under us (and we may not hold a
+ * reference count on the head page so it can
+ * be reused before we run PageReferenced), so
+ * we have to check PageTail before returning
+ * what we just read.
+ */
+ smp_rmb();
+ if (PageTail(tail))
+ return reserved;
+ }
+ return PageReserved(tail);
+ }
+
+ return true;
+}
+
+static int put_pfn(unsigned long pfn, int prot)
+{
+ if (!is_invalid_reserved_pfn(pfn)) {
+ struct page *page = pfn_to_page(pfn);
+ if (prot & IOMMU_WRITE)
+ SetPageDirty(page);
+ put_page(page);
+ return 1;
+ }
+ return 0;
+}
+
+/* Unmap DMA region */
+static long __vfio_dma_do_unmap(struct vfio_iommu *iommu, dma_addr_t iova,
+ long npage, int prot)
+{
+ long i, unlocked = 0;
+
+ for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
+ unsigned long pfn;
+
+ pfn = iommu_iova_to_phys(iommu->domain, iova) >> PAGE_SHIFT;
+ if (pfn) {
+ iommu_unmap(iommu->domain, iova, PAGE_SIZE);
+ unlocked += put_pfn(pfn, prot);
+ }
+ }
+ return unlocked;
+}
+
+static void vfio_dma_unmap(struct vfio_iommu *iommu, dma_addr_t iova,
+ long npage, int prot)
+{
+ long unlocked;
+
+ unlocked = __vfio_dma_do_unmap(iommu, iova, npage, prot);
+ vfio_lock_acct(-unlocked);
+}
+
+/* Unmap ALL DMA regions */
+void vfio_iommu_unmapall(struct vfio_iommu *iommu)
+{
+ struct list_head *pos, *tmp;
+
+ mutex_lock(&iommu->lock);
+ list_for_each_safe(pos, tmp, &iommu->dma_list) {
+ struct vfio_dma_map_entry *dma;
+
+ dma = list_entry(pos, struct vfio_dma_map_entry, list);
+ vfio_dma_unmap(iommu, dma->iova, dma->npage, dma->prot);
+ list_del(&dma->list);
+ kfree(dma);
+ }
+ mutex_unlock(&iommu->lock);
+}
+
+static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
+{
+ struct page *page[1];
+ struct vm_area_struct *vma;
+ int ret = -EFAULT;
+
+ if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
+ *pfn = page_to_pfn(page[0]);
+ return 0;
+ }
+
+ down_read(&current->mm->mmap_sem);
+
+ vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
+
+ if (vma && vma->vm_flags & VM_PFNMAP) {
+ *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+ if (is_invalid_reserved_pfn(*pfn))
+ ret = 0;
+ }
+
+ up_read(&current->mm->mmap_sem);
+
+ return ret;
+}
+
+/* Map DMA region */
+static int __vfio_dma_map(struct vfio_iommu *iommu, dma_addr_t iova,
+ unsigned long vaddr, long npage, int prot)
+{
+ dma_addr_t start = iova;
+ long i, locked = 0;
+ int ret;
+
+ /* Verify that pages are not already mapped */
+ for (i = 0; i < npage; i++, iova += PAGE_SIZE)
+ if (iommu_iova_to_phys(iommu->domain, iova))
+ return -EBUSY;
+
+ iova = start;
+
+ if (iommu->cache)
+ prot |= IOMMU_CACHE;
+
+ /*
+ * XXX We break mappings into pages and use get_user_pages_fast to
+ * pin the pages in memory. It's been suggested that mlock might
+ * provide a more efficient mechanism, but nothing prevents the
+ * user from munlocking the pages, which could then allow the user
+ * access to random host memory. We also have no guarantee from the
+ * IOMMU API that the iommu driver can unmap sub-pages of previous
+ * mappings. This means we might lose an entire range if a single
+ * page within it is unmapped. Single page mappings are inefficient,
+ * but provide the most flexibility for now.
+ */
+ for (i = 0; i < npage; i++, iova += PAGE_SIZE, vaddr += PAGE_SIZE) {
+ unsigned long pfn = 0;
+
+ ret = vaddr_get_pfn(vaddr, prot, &pfn);
+ if (ret) {
+ __vfio_dma_do_unmap(iommu, start, i, prot);
+ return ret;
+ }
+
+ /*
+ * Only add actual locked pages to accounting
+ * XXX We're effectively marking a page locked for every
+ * IOVA page even though it's possible the user could be
+ * backing multiple IOVAs with the same vaddr. This over-
+ * penalizes the user process, but we currently have no
+ * easy way to do this properly.
+ */
+ if (!is_invalid_reserved_pfn(pfn))
+ locked++;
+
+ ret = iommu_map(iommu->domain, iova,
+ (phys_addr_t)pfn << PAGE_SHIFT,
+ PAGE_SIZE, prot);
+ if (ret) {
+ /* Back out mappings on error */
+ put_pfn(pfn, prot);
+ __vfio_dma_do_unmap(iommu, start, i, prot);
+ return ret;
+ }
+ }
+ vfio_lock_acct(locked);
+ return 0;
+}
+
+static inline bool ranges_overlap(dma_addr_t start1, size_t size1,
+ dma_addr_t start2, size_t size2)
+{
+ if (start1 < start2)
+ return (start2 - start1 < size1);
+ else if (start2 < start1)
+ return (start1 - start2 < size2);
+ return (size1 > 0 && size2 > 0);
+}
+
+static struct vfio_dma_map_entry *vfio_find_dma(struct vfio_iommu *iommu,
+ dma_addr_t start, size_t size)
+{
+ struct list_head *pos;
+
+ list_for_each(pos, &iommu->dma_list) {
+ struct vfio_dma_map_entry *dma;
+
+ dma = list_entry(pos, struct vfio_dma_map_entry, list);
+ if (ranges_overlap(dma->iova, NPAGE_TO_SIZE(dma->npage),
+ start, size))
+ return dma;
+ }
+ return NULL;
+}
+
+static long vfio_remove_dma_overlap(struct vfio_iommu *iommu, dma_addr_t start,
+ size_t size, struct vfio_dma_map_entry *dma)
+{
+ struct vfio_dma_map_entry *split;
+ long npage_lo, npage_hi;
+
+ /* Existing dma region is completely covered; unmap it all */
+ if (start <= dma->iova &&
+ start + size >= dma->iova + NPAGE_TO_SIZE(dma->npage)) {
+ vfio_dma_unmap(iommu, dma->iova, dma->npage, dma->prot);
+ list_del(&dma->list);
+ npage_lo = dma->npage;
+ kfree(dma);
+ return npage_lo;
+ }
+
+ /* Overlap low address of existing range */
+ if (start <= dma->iova) {
+ size_t overlap;
+
+ overlap = start + size - dma->iova;
+ npage_lo = overlap >> PAGE_SHIFT;
+
+ vfio_dma_unmap(iommu, dma->iova, npage_lo, dma->prot);
+ dma->iova += overlap;
+ dma->vaddr += overlap;
+ dma->npage -= npage_lo;
+ return npage_lo;
+ }
+
+ /* Overlap high address of existing range */
+ if (start + size >= dma->iova + NPAGE_TO_SIZE(dma->npage)) {
+ size_t overlap;
+
+ overlap = dma->iova + NPAGE_TO_SIZE(dma->npage) - start;
+ npage_hi = overlap >> PAGE_SHIFT;
+
+ vfio_dma_unmap(iommu, start, npage_hi, dma->prot);
+ dma->npage -= npage_hi;
+ return npage_hi;
+ }
+
+ /* Split existing */
+ npage_lo = (start - dma->iova) >> PAGE_SHIFT;
+ npage_hi = dma->npage - (size >> PAGE_SHIFT) - npage_lo;
+
+ split = kzalloc(sizeof *split, GFP_KERNEL);
+ if (!split)
+ return -ENOMEM;
+
+ vfio_dma_unmap(iommu, start, size >> PAGE_SHIFT, dma->prot);
+
+ dma->npage = npage_lo;
+
+ split->npage = npage_hi;
+ split->iova = start + size;
+ split->vaddr = dma->vaddr + NPAGE_TO_SIZE(npage_lo) + size;
+ split->prot = dma->prot;
+ list_add(&split->list, &iommu->dma_list);
+ return size >> PAGE_SHIFT;
+}
+
+static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
+ struct vfio_dma_unmap *unmap)
+{
+ long ret = 0, npage = unmap->size >> PAGE_SHIFT;
+ struct list_head *pos, *tmp;
+ uint64_t mask;
+
+ mask = ((uint64_t)1 << __ffs(iommu->domain->ops->pgsize_bitmap)) - 1;
+
+ if (unmap->iova & mask)
+ return -EINVAL;
+ if (unmap->size & mask)
+ return -EINVAL;
+
+ /* XXX We still break these down into PAGE_SIZE */
+ WARN_ON(mask & PAGE_MASK);
+
+ mutex_lock(&iommu->lock);
+
+ list_for_each_safe(pos, tmp, &iommu->dma_list) {
+ struct vfio_dma_map_entry *dma;
+
+ dma = list_entry(pos, struct vfio_dma_map_entry, list);
+ if (ranges_overlap(dma->iova, NPAGE_TO_SIZE(dma->npage),
+ unmap->iova, unmap->size)) {
+ ret = vfio_remove_dma_overlap(iommu, unmap->iova,
+ unmap->size, dma);
+ if (ret > 0)
+ npage -= ret;
+ if (ret < 0 || npage == 0)
+ break;
+ }
+ }
+ mutex_unlock(&iommu->lock);
+ return ret > 0 ? 0 : (int)ret;
+}
+
+static int vfio_dma_do_map(struct vfio_iommu *iommu, struct vfio_dma_map *map)
+{
+ struct vfio_dma_map_entry *dma, *pdma = NULL;
+ dma_addr_t iova = map->iova;
+ unsigned long locked, lock_limit, vaddr = map->vaddr;
+ size_t size = map->size;
+ int ret = 0, prot = 0;
+ uint64_t mask;
+ long npage;
+
+ mask = ((uint64_t)1 << __ffs(iommu->domain->ops->pgsize_bitmap)) - 1;
+
+ /* READ/WRITE from device perspective */
+ if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
+ prot |= IOMMU_WRITE;
+ if (map->flags & VFIO_DMA_MAP_FLAG_READ)
+ prot |= IOMMU_READ;
+
+ if (!prot)
+ return -EINVAL; /* No READ/WRITE? */
+
+ if (vaddr & mask)
+ return -EINVAL;
+ if (iova & mask)
+ return -EINVAL;
+ if (size & mask)
+ return -EINVAL;
+
+ /* XXX We still break these down into PAGE_SIZE */
+ WARN_ON(mask & PAGE_MASK);
+
+ /* Don't allow IOVA wrap */
+ if (iova + size && iova + size < iova)
+ return -EINVAL;
+
+ /* Don't allow virtual address wrap */
+ if (vaddr + size && vaddr + size < vaddr)
+ return -EINVAL;
+
+ npage = size >> PAGE_SHIFT;
+ if (!npage)
+ return -EINVAL;
+
+ mutex_lock(&iommu->lock);
+
+ if (vfio_find_dma(iommu, iova, size)) {
+ ret = -EBUSY;
+ goto out_lock;
+ }
+
+ /* account for locked pages */
+ locked = current->mm->locked_vm + npage;
+ lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+ printk(KERN_WARNING "%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
+ __func__, rlimit(RLIMIT_MEMLOCK));
+ ret = -ENOMEM;
+ goto out_lock;
+ }
+
+ ret = __vfio_dma_map(iommu, iova, vaddr, npage, prot);
+ if (ret)
+ goto out_lock;
+
+ /* Check if we abut a region below - nothing below 0 */
+ if (iova) {
+ dma = vfio_find_dma(iommu, iova - 1, 1);
+ if (dma && dma->prot == prot &&
+ dma->vaddr + NPAGE_TO_SIZE(dma->npage) == vaddr) {
+
+ dma->npage += npage;
+ iova = dma->iova;
+ vaddr = dma->vaddr;
+ npage = dma->npage;
+ size = NPAGE_TO_SIZE(npage);
+
+ pdma = dma;
+ }
+ }
+
+ /* Check if we abut a region above - nothing above ~0 + 1 */
+ if (iova + size) {
+ dma = vfio_find_dma(iommu, iova + size, 1);
+ if (dma && dma->prot == prot &&
+ dma->vaddr == vaddr + size) {
+
+ dma->npage += npage;
+ dma->iova = iova;
+ dma->vaddr = vaddr;
+
+ /*
+ * If merged above and below, remove previously
+ * merged entry. New entry covers it.
+ */
+ if (pdma) {
+ list_del(&pdma->list);
+ kfree(pdma);
+ }
+ pdma = dma;
+ }
+ }
+
+ /* Isolated, new region */
+ if (!pdma) {
+ dma = kzalloc(sizeof *dma, GFP_KERNEL);
+ if (!dma) {
+ ret = -ENOMEM;
+ vfio_dma_unmap(iommu, iova, npage, prot);
+ goto out_lock;
+ }
+
+ dma->npage = npage;
+ dma->iova = iova;
+ dma->vaddr = vaddr;
+ dma->prot = prot;
+ list_add(&dma->list, &iommu->dma_list);
+ }
+
+out_lock:
+ mutex_unlock(&iommu->lock);
+ return ret;
+}
+
+static int vfio_iommu_release(struct inode *inode, struct file *filep)
+{
+ struct vfio_iommu *iommu = filep->private_data;
+
+ vfio_release_iommu(iommu);
+ return 0;
+}
+
+static long vfio_iommu_unl_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ struct vfio_iommu *iommu = filep->private_data;
+ unsigned long minsz;
+
+ if (cmd == VFIO_IOMMU_GET_INFO) {
+ struct vfio_iommu_info info;
+
+ minsz = offsetofend(struct vfio_iommu_info, iova_pgsizes);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info.argsz < minsz)
+ return -EINVAL;
+
+ info.flags = 0;
+
+ /*
+ * XXX Need to define an interface in IOMMU API for this.
+ * Currently only compatible with x86 VT-d/AMD-Vi which
+ * do page table based mapping and have no constraints here.
+ */
+ info.iova_start = 0;
+ info.iova_size = ~info.iova_start;
+ info.iova_entries = ~info.iova_start;
+ info.iova_pgsizes = iommu->domain->ops->pgsize_bitmap;
+
+ return copy_to_user((void __user *)arg, &info, minsz);
+
+ } else if (cmd == VFIO_IOMMU_MAP_DMA) {
+ struct vfio_dma_map map;
+ uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
+ VFIO_DMA_MAP_FLAG_WRITE;
+
+ minsz = offsetofend(struct vfio_dma_map, size);
+
+ if (copy_from_user(&map, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (map.argsz < minsz || map.flags & ~mask)
+ return -EINVAL;
+
+ return vfio_dma_do_map(iommu, &map);
+
+ } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
+ struct vfio_dma_unmap unmap;
+
+ minsz = offsetofend(struct vfio_dma_unmap, size);
+
+ if (copy_from_user(&unmap, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (unmap.argsz < minsz || unmap.flags)
+ return -EINVAL;
+
+ return vfio_dma_do_unmap(iommu, &unmap);
+ }
+
+ return -ENOTTY;
+}
+
+#ifdef CONFIG_COMPAT
+static long vfio_iommu_compat_ioctl(struct file *filep,
+ unsigned int cmd, unsigned long arg)
+{
+ arg = (unsigned long)compat_ptr(arg);
+ return vfio_iommu_unl_ioctl(filep, cmd, arg);
+}
+#endif /* CONFIG_COMPAT */
+
+const struct file_operations vfio_iommu_fops = {
+ .owner = THIS_MODULE,
+ .release = vfio_iommu_release,
+ .unlocked_ioctl = vfio_iommu_unl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = vfio_iommu_compat_ioctl,
+#endif
+};
* [Qemu-devel] [PATCH v2 5/5] vfio: VFIO core Kconfig and Makefile
2012-01-23 17:20 [Qemu-devel] [PATCH v2 0/5] VFIO core framework Alex Williamson
` (3 preceding siblings ...)
2012-01-23 17:21 ` [Qemu-devel] [PATCH v2 4/5] vfio: VFIO core IOMMU mapping support Alex Williamson
@ 2012-01-23 17:21 ` Alex Williamson
4 siblings, 0 replies; 6+ messages in thread
From: Alex Williamson @ 2012-01-23 17:21 UTC (permalink / raw)
To: chrisw, aik, david, joerg.roedel, agraf, benve, aafabbri, B08248,
B07421, avi, konrad.wilk, kvm, qemu-devel, iommu, linux-pci,
linux-kernel
Enable the base code.
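For example (a sketch, not part of the patch text), a kernel configured
with CONFIG_VFIO=m builds the two objects listed in the Makefile below,
vfio_main.o and vfio_iommu.o, into a single vfio.ko module.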
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
MAINTAINERS | 8 ++++++++
drivers/Kconfig | 2 ++
drivers/Makefile | 1 +
drivers/vfio/Kconfig | 8 ++++++++
drivers/vfio/Makefile | 3 +++
5 files changed, 22 insertions(+), 0 deletions(-)
create mode 100644 drivers/vfio/Kconfig
create mode 100644 drivers/vfio/Makefile
diff --git a/MAINTAINERS b/MAINTAINERS
index df8cb66..2f3a5c8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7129,6 +7129,14 @@ S: Maintained
F: Documentation/filesystems/vfat.txt
F: fs/fat/
+VFIO DRIVER
+M: Alex Williamson <alex.williamson@redhat.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: Documentation/vfio.txt
+F: drivers/vfio/
+F: include/linux/vfio.h
+
VIDEOBUF2 FRAMEWORK
M: Pawel Osciak <pawel@osciak.com>
M: Marek Szyprowski <m.szyprowski@samsung.com>
diff --git a/drivers/Kconfig b/drivers/Kconfig
index d5138e6..f168bf3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -114,6 +114,8 @@ source "drivers/auxdisplay/Kconfig"
source "drivers/uio/Kconfig"
+source "drivers/vfio/Kconfig"
+
source "drivers/vlynq/Kconfig"
source "drivers/virtio/Kconfig"
diff --git a/drivers/Makefile b/drivers/Makefile
index 71a1f16..6be03a1 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_ATM) += atm/
obj-$(CONFIG_FUSION) += message/
obj-y += firewire/
obj-$(CONFIG_UIO) += uio/
+obj-$(CONFIG_VFIO) += vfio/
obj-y += cdrom/
obj-y += auxdisplay/
obj-$(CONFIG_PCCARD) += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 0000000..9acb1e7
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig VFIO
+ tristate "VFIO Non-Privileged userspace driver framework"
+ depends on IOMMU_API
+ help
+ VFIO provides a framework for secure userspace device drivers.
+ See Documentation/vfio.txt for more details.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 0000000..088faf1
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1,3 @@
+vfio-y := vfio_main.o vfio_iommu.o
+
+obj-$(CONFIG_VFIO) := vfio.o