* [PATCH v4 01/10] vfio: Move vfio_device driver open/close code to a function
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 02/10] vfio: Move vfio_device_assign_container() into vfio_device_first_open() Jason Gunthorpe
` (9 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
This error unwind is getting complicated. Move all the code into two
pair'd function. The functions should be called when the open_count == 1
after incrementing/before decrementing.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/vfio_main.c | 95 ++++++++++++++++++++++------------------
1 file changed, 53 insertions(+), 42 deletions(-)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2d168793d4e1ce..2e8346d13c16ca 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -734,6 +734,51 @@ bool vfio_assert_device_open(struct vfio_device *device)
return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
}
+static int vfio_device_first_open(struct vfio_device *device)
+{
+ int ret;
+
+ lockdep_assert_held(&device->dev_set->lock);
+
+ if (!try_module_get(device->dev->driver->owner))
+ return -ENODEV;
+
+ /*
+ * Here we pass the KVM pointer with the group under the lock. If the
+ * device driver will use it, it must obtain a reference and release it
+ * during close_device.
+ */
+ mutex_lock(&device->group->group_lock);
+ device->kvm = device->group->kvm;
+ if (device->ops->open_device) {
+ ret = device->ops->open_device(device);
+ if (ret)
+ goto err_module_put;
+ }
+ vfio_device_container_register(device);
+ mutex_unlock(&device->group->group_lock);
+ return 0;
+
+err_module_put:
+ device->kvm = NULL;
+ mutex_unlock(&device->group->group_lock);
+ module_put(device->dev->driver->owner);
+ return ret;
+}
+
+static void vfio_device_last_close(struct vfio_device *device)
+{
+ lockdep_assert_held(&device->dev_set->lock);
+
+ mutex_lock(&device->group->group_lock);
+ vfio_device_container_unregister(device);
+ if (device->ops->close_device)
+ device->ops->close_device(device);
+ device->kvm = NULL;
+ mutex_unlock(&device->group->group_lock);
+ module_put(device->dev->driver->owner);
+}
+
static struct file *vfio_device_open(struct vfio_device *device)
{
struct file *filep;
@@ -745,29 +790,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
if (ret)
return ERR_PTR(ret);
- if (!try_module_get(device->dev->driver->owner)) {
- ret = -ENODEV;
- goto err_unassign_container;
- }
-
mutex_lock(&device->dev_set->lock);
device->open_count++;
if (device->open_count == 1) {
- /*
- * Here we pass the KVM pointer with the group under the read
- * lock. If the device driver will use it, it must obtain a
- * reference and release it during close_device.
- */
- mutex_lock(&device->group->group_lock);
- device->kvm = device->group->kvm;
-
- if (device->ops->open_device) {
- ret = device->ops->open_device(device);
- if (ret)
- goto err_undo_count;
- }
- vfio_device_container_register(device);
- mutex_unlock(&device->group->group_lock);
+ ret = vfio_device_first_open(device);
+ if (ret)
+ goto err_unassign_container;
}
mutex_unlock(&device->dev_set->lock);
@@ -800,20 +828,11 @@ static struct file *vfio_device_open(struct vfio_device *device)
err_close_device:
mutex_lock(&device->dev_set->lock);
- mutex_lock(&device->group->group_lock);
- if (device->open_count == 1 && device->ops->close_device) {
- device->ops->close_device(device);
-
- vfio_device_container_unregister(device);
- }
-err_undo_count:
- mutex_unlock(&device->group->group_lock);
+ if (device->open_count == 1)
+ vfio_device_last_close(device);
+err_unassign_container:
device->open_count--;
- if (device->open_count == 0 && device->kvm)
- device->kvm = NULL;
mutex_unlock(&device->dev_set->lock);
- module_put(device->dev->driver->owner);
-err_unassign_container:
vfio_device_unassign_container(device);
return ERR_PTR(ret);
}
@@ -1016,19 +1035,11 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
mutex_lock(&device->dev_set->lock);
vfio_assert_device_open(device);
- mutex_lock(&device->group->group_lock);
- if (device->open_count == 1 && device->ops->close_device)
- device->ops->close_device(device);
-
- vfio_device_container_unregister(device);
- mutex_unlock(&device->group->group_lock);
+ if (device->open_count == 1)
+ vfio_device_last_close(device);
device->open_count--;
- if (device->open_count == 0)
- device->kvm = NULL;
mutex_unlock(&device->dev_set->lock);
- module_put(device->dev->driver->owner);
-
vfio_device_unassign_container(device);
vfio_device_put_registration(device);
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 02/10] vfio: Move vfio_device_assign_container() into vfio_device_first_open()
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 01/10] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 03/10] vfio: Rename vfio_device_assign/unassign_container() Jason Gunthorpe
` (8 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
The only thing this function does is assert the group has an assigned
container and incrs refcounts.
The overall model we have is that once a container_users refcount is
incremented it cannot be de-assigned from the group -
vfio_group_ioctl_unset_container() will fail and the group FD cannot be
closed.
Thus we do not need to check this on every device FD open, just the
first. Reorganize the code so that only the first open and last close
manages the container.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/container.c | 4 ++--
drivers/vfio/vfio_main.c | 24 +++++++++++-------------
2 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d74164abbf401d..dd79a66ec62cad 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -531,11 +531,11 @@ int vfio_device_assign_container(struct vfio_device *device)
void vfio_device_unassign_container(struct vfio_device *device)
{
- mutex_lock(&device->group->group_lock);
+ lockdep_assert_held_write(&device->group->group_lock);
+
WARN_ON(device->group->container_users <= 1);
device->group->container_users--;
fput(device->group->opened_file);
- mutex_unlock(&device->group->group_lock);
}
/*
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2e8346d13c16ca..717c7f404feeea 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,18 +749,24 @@ static int vfio_device_first_open(struct vfio_device *device)
* during close_device.
*/
mutex_lock(&device->group->group_lock);
+ ret = vfio_device_assign_container(device);
+ if (ret)
+ goto err_module_put;
+
device->kvm = device->group->kvm;
if (device->ops->open_device) {
ret = device->ops->open_device(device);
if (ret)
- goto err_module_put;
+ goto err_container;
}
vfio_device_container_register(device);
mutex_unlock(&device->group->group_lock);
return 0;
-err_module_put:
+err_container:
device->kvm = NULL;
+ vfio_device_unassign_container(device);
+err_module_put:
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
return ret;
@@ -775,6 +781,7 @@ static void vfio_device_last_close(struct vfio_device *device)
if (device->ops->close_device)
device->ops->close_device(device);
device->kvm = NULL;
+ vfio_device_unassign_container(device);
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
}
@@ -784,18 +791,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
struct file *filep;
int ret;
- mutex_lock(&device->group->group_lock);
- ret = vfio_device_assign_container(device);
- mutex_unlock(&device->group->group_lock);
- if (ret)
- return ERR_PTR(ret);
-
mutex_lock(&device->dev_set->lock);
device->open_count++;
if (device->open_count == 1) {
ret = vfio_device_first_open(device);
if (ret)
- goto err_unassign_container;
+ goto err_unlock;
}
mutex_unlock(&device->dev_set->lock);
@@ -830,10 +831,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
mutex_lock(&device->dev_set->lock);
if (device->open_count == 1)
vfio_device_last_close(device);
-err_unassign_container:
+err_unlock:
device->open_count--;
mutex_unlock(&device->dev_set->lock);
- vfio_device_unassign_container(device);
return ERR_PTR(ret);
}
@@ -1040,8 +1040,6 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
device->open_count--;
mutex_unlock(&device->dev_set->lock);
- vfio_device_unassign_container(device);
-
vfio_device_put_registration(device);
return 0;
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 03/10] vfio: Rename vfio_device_assign/unassign_container()
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 01/10] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 02/10] vfio: Move vfio_device_assign_container() into vfio_device_first_open() Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 04/10] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
` (7 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
These functions don't really assign anything anymore, they just increment
some refcounts and do a sanity check. Call them
vfio_group_[un]use_container()
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/container.c | 14 ++++++--------
drivers/vfio/vfio.h | 4 ++--
drivers/vfio/vfio_main.c | 6 +++---
3 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index dd79a66ec62cad..499777930b08fa 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -511,10 +511,8 @@ void vfio_group_detach_container(struct vfio_group *group)
vfio_container_put(container);
}
-int vfio_device_assign_container(struct vfio_device *device)
+int vfio_group_use_container(struct vfio_group *group)
{
- struct vfio_group *group = device->group;
-
lockdep_assert_held(&group->group_lock);
if (!group->container || !group->container->iommu_driver ||
@@ -529,13 +527,13 @@ int vfio_device_assign_container(struct vfio_device *device)
return 0;
}
-void vfio_device_unassign_container(struct vfio_device *device)
+void vfio_group_unuse_container(struct vfio_group *group)
{
- lockdep_assert_held_write(&device->group->group_lock);
+ lockdep_assert_held(&group->group_lock);
- WARN_ON(device->group->container_users <= 1);
- device->group->container_users--;
- fput(device->group->opened_file);
+ WARN_ON(group->container_users <= 1);
+ group->container_users--;
+ fput(group->opened_file);
}
/*
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bcad54bbab08c4..f95f4925b83bbd 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -112,8 +112,8 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
bool vfio_assert_device_open(struct vfio_device *device);
struct vfio_container *vfio_container_from_file(struct file *filep);
-int vfio_device_assign_container(struct vfio_device *device);
-void vfio_device_unassign_container(struct vfio_device *device);
+int vfio_group_use_container(struct vfio_group *group);
+void vfio_group_unuse_container(struct vfio_group *group);
int vfio_container_attach_group(struct vfio_container *container,
struct vfio_group *group);
void vfio_group_detach_container(struct vfio_group *group);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 717c7f404feeea..8c2dcb481ae10b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,7 +749,7 @@ static int vfio_device_first_open(struct vfio_device *device)
* during close_device.
*/
mutex_lock(&device->group->group_lock);
- ret = vfio_device_assign_container(device);
+ ret = vfio_group_use_container(device->group);
if (ret)
goto err_module_put;
@@ -765,7 +765,7 @@ static int vfio_device_first_open(struct vfio_device *device)
err_container:
device->kvm = NULL;
- vfio_device_unassign_container(device);
+ vfio_group_unuse_container(device->group);
err_module_put:
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
@@ -781,7 +781,7 @@ static void vfio_device_last_close(struct vfio_device *device)
if (device->ops->close_device)
device->ops->close_device(device);
device->kvm = NULL;
- vfio_device_unassign_container(device);
+ vfio_group_unuse_container(device->group);
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
}
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 04/10] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (2 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 03/10] vfio: Rename vfio_device_assign/unassign_container() Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 05/10] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
` (6 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
iommufd doesn't establish the iommu_domains until after the device FD is
opened, even if the container has been set. This design is part of moving
away from the group centric iommu APIs.
This is fine, except that the normal sequence of establishing the kvm
wbinvd won't work:
group = open("/dev/vfio/XX")
ioctl(group, VFIO_GROUP_SET_CONTAINER)
ioctl(kvm, KVM_DEV_VFIO_GROUP_ADD)
ioctl(group, VFIO_GROUP_GET_DEVICE_FD)
As the domains don't start existing until GET_DEVICE_FD. Further,
GET_DEVICE_FD requires that KVM_DEV_VFIO_GROUP_ADD already be done as that
is what sets the group->kvm and thus device->kvm for the driver to use
during open.
Now that we have device centric cap ops and the new
IOMMU_CAP_ENFORCE_CACHE_COHERENCY we know what the iommu_domain will be
capable of without having to create it. Use this to compute
vfio_file_enforced_coherent() and resolve the ordering problems.
VFIO always tries to upgrade domains to enforce cache coherency, it never
attaches a device that supports enforce cache coherency to a less capable
domain, so the cap test is a sufficient proxy for the ultimate
outcome. iommufd also ensures that devices that set the cap will be
connected to enforcing domains.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/container.c | 5 +++--
drivers/vfio/vfio.h | 2 --
drivers/vfio/vfio_main.c | 29 ++++++++++++++++-------------
3 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 499777930b08fa..d97747dfb05d02 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -188,8 +188,9 @@ void vfio_device_container_unregister(struct vfio_device *device)
device->group->container->iommu_data, device);
}
-long vfio_container_ioctl_check_extension(struct vfio_container *container,
- unsigned long arg)
+static long
+vfio_container_ioctl_check_extension(struct vfio_container *container,
+ unsigned long arg)
{
struct vfio_iommu_driver *driver;
long ret = 0;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f95f4925b83bbd..73156125870427 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -119,8 +119,6 @@ int vfio_container_attach_group(struct vfio_container *container,
void vfio_group_detach_container(struct vfio_group *group);
void vfio_device_container_register(struct vfio_device *device);
void vfio_device_container_unregister(struct vfio_device *device);
-long vfio_container_ioctl_check_extension(struct vfio_container *container,
- unsigned long arg);
int __init vfio_container_init(void);
void vfio_container_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8c2dcb481ae10b..77d6c0ba6a8302 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1622,24 +1622,27 @@ EXPORT_SYMBOL_GPL(vfio_file_is_group);
bool vfio_file_enforced_coherent(struct file *file)
{
struct vfio_group *group = file->private_data;
- bool ret;
+ struct vfio_device *device;
+ bool ret = true;
if (!vfio_file_is_group(file))
return true;
- mutex_lock(&group->group_lock);
- if (group->container) {
- ret = vfio_container_ioctl_check_extension(group->container,
- VFIO_DMA_CC_IOMMU);
- } else {
- /*
- * Since the coherency state is determined only once a container
- * is attached the user must do so before they can prove they
- * have permission.
- */
- ret = true;
+ /*
+ * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
+ * any domain later attached to it will also not support it. If the cap
+ * is set then the iommu_domain eventually attached to the device/group
+ * must use a domain with enforce_cache_coherency().
+ */
+ mutex_lock(&group->device_lock);
+ list_for_each_entry(device, &group->device_list, group_next) {
+ if (!device_iommu_capable(device->dev,
+ IOMMU_CAP_ENFORCE_CACHE_COHERENCY)) {
+ ret = false;
+ break;
+ }
}
- mutex_unlock(&group->group_lock);
+ mutex_unlock(&group->device_lock);
return ret;
}
EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 05/10] vfio-iommufd: Allow iommufd to be used in place of a container fd
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (3 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 04/10] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 06/10] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
` (5 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an
iommufd.
In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
not be attached to any groups.
For VFIO this means that the VFIO_GROUP_GET_STATUS and
VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD
the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for
IOMMUFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that
VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to
viability.
As GET_DEVICE_FD can fail for many reasons already this is not expected to
be a meaningful difference.
Reorganize the tests for if the group has an assigned container or iommu
into a vfio_group_has_iommu() function and consolidate all the duplicated
WARN_ON's etc related to this.
Call container functions only if a container is actually present on the
group.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/Kconfig | 1 +
drivers/vfio/container.c | 7 +++-
drivers/vfio/vfio.h | 2 +
drivers/vfio/vfio_main.c | 88 +++++++++++++++++++++++++++++++++-------
4 files changed, 82 insertions(+), 16 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 86c381ceb9a1e9..1118d322eec97d 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
select IOMMU_API
+ depends on IOMMUFD || !IOMMUFD
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
select INTERVAL_TREE
help
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d97747dfb05d02..8772dad6808539 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -516,8 +516,11 @@ int vfio_group_use_container(struct vfio_group *group)
{
lockdep_assert_held(&group->group_lock);
- if (!group->container || !group->container->iommu_driver ||
- WARN_ON(!group->container_users))
+ /*
+ * The container fd has been assigned with VFIO_GROUP_SET_CONTAINER but
+ * VFIO_SET_IOMMU hasn't been done yet.
+ */
+ if (!group->container->iommu_driver)
return -EINVAL;
if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 73156125870427..a9dd0615266cb9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -10,6 +10,7 @@
#include <linux/cdev.h>
#include <linux/module.h>
+struct iommufd_ctx;
struct iommu_group;
struct vfio_device;
struct vfio_container;
@@ -60,6 +61,7 @@ struct vfio_group {
struct kvm *kvm;
struct file *opened_file;
struct blocking_notifier_head notifier;
+ struct iommufd_ctx *iommufd;
};
/* events for the backend driver notify callback */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 77d6c0ba6a8302..f11157d056e6cc 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -35,6 +35,7 @@
#include <linux/pm_runtime.h>
#include <linux/interval_tree.h>
#include <linux/iova_bitmap.h>
+#include <linux/iommufd.h>
#include "vfio.h"
#define DRIVER_VERSION "0.3"
@@ -662,6 +663,18 @@ EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
/*
* VFIO Group fd, /dev/vfio/$GROUP
*/
+static bool vfio_group_has_iommu(struct vfio_group *group)
+{
+ lockdep_assert_held(&group->group_lock);
+ /*
+ * There can only be users if there is a container, and if there is a
+ * container there must be users.
+ */
+ WARN_ON(!group->container != !group->container_users);
+
+ return group->container || group->iommufd;
+}
+
/*
* VFIO_GROUP_UNSET_CONTAINER should fail if there are other users or
* if there was no container to unset. Since the ioctl is called on
@@ -673,15 +686,21 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
int ret = 0;
mutex_lock(&group->group_lock);
- if (!group->container) {
+ if (!vfio_group_has_iommu(group)) {
ret = -EINVAL;
goto out_unlock;
}
- if (group->container_users != 1) {
- ret = -EBUSY;
- goto out_unlock;
+ if (group->container) {
+ if (group->container_users != 1) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ vfio_group_detach_container(group);
+ }
+ if (group->iommufd) {
+ iommufd_ctx_put(group->iommufd);
+ group->iommufd = NULL;
}
- vfio_group_detach_container(group);
out_unlock:
mutex_unlock(&group->group_lock);
@@ -692,6 +711,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
int __user *arg)
{
struct vfio_container *container;
+ struct iommufd_ctx *iommufd;
struct fd f;
int ret;
int fd;
@@ -704,7 +724,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
return -EBADF;
mutex_lock(&group->group_lock);
- if (group->container || WARN_ON(group->container_users)) {
+ if (vfio_group_has_iommu(group)) {
ret = -EINVAL;
goto out_unlock;
}
@@ -714,12 +734,28 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
}
container = vfio_container_from_file(f.file);
- ret = -EINVAL;
if (container) {
ret = vfio_container_attach_group(container, group);
goto out_unlock;
}
+ iommufd = iommufd_ctx_from_file(f.file);
+ if (!IS_ERR(iommufd)) {
+ u32 ioas_id;
+
+ ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+ if (ret) {
+ iommufd_ctx_put(group->iommufd);
+ goto out_unlock;
+ }
+
+ group->iommufd = iommufd;
+ goto out_unlock;
+ }
+
+ /* The FD passed is not recognized. */
+ ret = -EBADFD;
+
out_unlock:
mutex_unlock(&group->group_lock);
fdput(f);
@@ -749,9 +785,16 @@ static int vfio_device_first_open(struct vfio_device *device)
* during close_device.
*/
mutex_lock(&device->group->group_lock);
- ret = vfio_group_use_container(device->group);
- if (ret)
+ if (!vfio_group_has_iommu(device->group)) {
+ ret = -EINVAL;
goto err_module_put;
+ }
+
+ if (device->group->container) {
+ ret = vfio_group_use_container(device->group);
+ if (ret)
+ goto err_module_put;
+ }
device->kvm = device->group->kvm;
if (device->ops->open_device) {
@@ -759,13 +802,15 @@ static int vfio_device_first_open(struct vfio_device *device)
if (ret)
goto err_container;
}
- vfio_device_container_register(device);
+ if (device->group->container)
+ vfio_device_container_register(device);
mutex_unlock(&device->group->group_lock);
return 0;
err_container:
device->kvm = NULL;
- vfio_group_unuse_container(device->group);
+ if (device->group->container)
+ vfio_group_unuse_container(device->group);
err_module_put:
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
@@ -777,11 +822,13 @@ static void vfio_device_last_close(struct vfio_device *device)
lockdep_assert_held(&device->dev_set->lock);
mutex_lock(&device->group->group_lock);
- vfio_device_container_unregister(device);
+ if (device->group->container)
+ vfio_device_container_unregister(device);
if (device->ops->close_device)
device->ops->close_device(device);
device->kvm = NULL;
- vfio_group_unuse_container(device->group);
+ if (device->group->container)
+ vfio_group_unuse_container(device->group);
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
}
@@ -897,7 +944,14 @@ static int vfio_group_ioctl_get_status(struct vfio_group *group,
return -ENODEV;
}
- if (group->container)
+ /*
+ * With the container FD the iommu_group_claim_dma_owner() is done
+ * during SET_CONTAINER but for IOMMFD this is done during
+ * VFIO_GROUP_GET_DEVICE_FD. Meaning that with iommufd
+ * VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due
+ * to viability.
+ */
+ if (vfio_group_has_iommu(group))
status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
VFIO_GROUP_FLAGS_VIABLE;
else if (!iommu_group_dma_owner_claimed(group->iommu_group))
@@ -980,6 +1034,10 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
WARN_ON(group->notifier.head);
if (group->container)
vfio_group_detach_container(group);
+ if (group->iommufd) {
+ iommufd_ctx_put(group->iommufd);
+ group->iommufd = NULL;
+ }
group->opened_file = NULL;
mutex_unlock(&group->group_lock);
return 0;
@@ -1878,6 +1936,8 @@ static void __exit vfio_cleanup(void)
module_init(vfio_init);
module_exit(vfio_cleanup);
+MODULE_IMPORT_NS(IOMMUFD);
+MODULE_IMPORT_NS(IOMMUFD_VFIO);
MODULE_VERSION(DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 06/10] vfio-iommufd: Support iommufd for physical VFIO devices
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (4 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 05/10] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 07/10] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
` (4 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
This creates the iommufd_device for the physical VFIO drivers. These are
all the drivers that are calling vfio_register_group_dev() and expect the
type1 code to setup a real iommu_domain against their parent struct
device.
The design gives the driver a choice in how it gets connected to iommufd
by providing bind_iommufd/unbind_iommufd/attach_ioas callbacks to
implement as required. The core code provides three default callbacks for
physical mode using a real iommu_domain. This is suitable for drivers
using vfio_register_group_dev()
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/Makefile | 1 +
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 3 +
drivers/vfio/iommufd.c | 100 ++++++++++++++++++
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 6 ++
drivers/vfio/pci/mlx5/main.c | 3 +
drivers/vfio/pci/vfio_pci.c | 3 +
drivers/vfio/platform/vfio_amba.c | 3 +
drivers/vfio/platform/vfio_platform.c | 3 +
drivers/vfio/vfio.h | 15 +++
drivers/vfio/vfio_main.c | 15 ++-
include/linux/vfio.h | 25 +++++
11 files changed, 175 insertions(+), 2 deletions(-)
create mode 100644 drivers/vfio/iommufd.c
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index b693a1169286f8..3863922529ef20 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
iova_bitmap.o \
container.o
+vfio-$(CONFIG_IOMMUFD) += iommufd.o
obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index b16874e913e4f5..5cd4bb47644039 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -592,6 +592,9 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
.read = vfio_fsl_mc_read,
.write = vfio_fsl_mc_write,
.mmap = vfio_fsl_mc_mmap,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
new file mode 100644
index 00000000000000..6e47a3df1a7102
--- /dev/null
+++ b/drivers/vfio/iommufd.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include "vfio.h"
+
+MODULE_IMPORT_NS(IOMMUFD);
+MODULE_IMPORT_NS(IOMMUFD_VFIO);
+
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+{
+ u32 ioas_id;
+ u32 device_id;
+ int ret;
+
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ /*
+ * If the driver doesn't provide this op then it means the device does
+ * not do DMA at all. So nothing to do.
+ */
+ if (!vdev->ops->bind_iommufd)
+ return 0;
+
+ ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+ if (ret)
+ return ret;
+
+ ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
+ if (ret)
+ goto err_unbind;
+ ret = vdev->ops->attach_ioas(vdev, &ioas_id);
+ if (ret)
+ goto err_unbind;
+
+ /*
+ * The legacy path has no way to return the device id or the selected
+ * pt_id
+ */
+ return 0;
+
+err_unbind:
+ if (vdev->ops->unbind_iommufd)
+ vdev->ops->unbind_iommufd(vdev);
+ return ret;
+}
+
+void vfio_iommufd_unbind(struct vfio_device *vdev)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (vdev->ops->unbind_iommufd)
+ vdev->ops->unbind_iommufd(vdev);
+}
+
+/*
+ * The physical standard ops mean that the iommufd_device is bound to the
+ * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
+ * using this ops set should call vfio_register_group_dev()
+ */
+int vfio_iommufd_physical_bind(struct vfio_device *vdev,
+ struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+ struct iommufd_device *idev;
+
+ idev = iommufd_device_bind(ictx, vdev->dev, out_device_id);
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
+ vdev->iommufd_device = idev;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);
+
+void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (vdev->iommufd_attached) {
+ iommufd_device_detach(vdev->iommufd_device);
+ vdev->iommufd_attached = false;
+ }
+ iommufd_device_unbind(vdev->iommufd_device);
+ vdev->iommufd_device = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_unbind);
+
+int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
+{
+ int rc;
+
+ rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
+ if (rc)
+ return rc;
+ vdev->iommufd_attached = true;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 39eeca18a0f7c8..40019b11c5a969 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1246,6 +1246,9 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
.mmap = hisi_acc_vfio_pci_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1261,6 +1264,9 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index fd6ccb8454a24a..32d1f38d351e7e 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -623,6 +623,9 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1d4919edfbde48..29091ee2e9849b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -138,6 +138,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
.mmap = vfio_pci_core_mmap,
.request = vfio_pci_core_request,
.match = vfio_pci_core_match,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index eaea63e5294c58..5a046098d0bdf4 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -117,6 +117,9 @@ static const struct vfio_device_ops vfio_amba_ops = {
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 82cedcebfd9022..b87c3b70878341 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -106,6 +106,9 @@ static const struct vfio_device_ops vfio_platform_ops = {
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
};
static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a9dd0615266cb9..9766f70a12c519 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -124,6 +124,21 @@ void vfio_device_container_unregister(struct vfio_device *device);
int __init vfio_container_init(void);
void vfio_container_cleanup(void);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+void vfio_iommufd_unbind(struct vfio_device *device);
+#else
+static inline int vfio_iommufd_bind(struct vfio_device *device,
+ struct iommufd_ctx *ictx)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void vfio_iommufd_unbind(struct vfio_device *device)
+{
+}
+#endif
+
#ifdef CONFIG_VFIO_NOIOMMU
extern bool vfio_noiommu __read_mostly;
#else
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f11157d056e6cc..a74c34232c034b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -525,6 +525,11 @@ static int __vfio_register_dev(struct vfio_device *device,
if (IS_ERR(group))
return PTR_ERR(group);
+ if (WARN_ON(device->ops->bind_iommufd &&
+ (!device->ops->unbind_iommufd ||
+ !device->ops->attach_ioas)))
+ return -EINVAL;
+
/*
* If the driver doesn't specify a set then the device is added to a
* singleton set just for itself.
@@ -794,6 +799,10 @@ static int vfio_device_first_open(struct vfio_device *device)
ret = vfio_group_use_container(device->group);
if (ret)
goto err_module_put;
+ } else if (device->group->iommufd) {
+ ret = vfio_iommufd_bind(device, device->group->iommufd);
+ if (ret)
+ goto err_module_put;
}
device->kvm = device->group->kvm;
@@ -811,6 +820,8 @@ static int vfio_device_first_open(struct vfio_device *device)
device->kvm = NULL;
if (device->group->container)
vfio_group_unuse_container(device->group);
+ else if (device->group->iommufd)
+ vfio_iommufd_unbind(device);
err_module_put:
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
@@ -829,6 +840,8 @@ static void vfio_device_last_close(struct vfio_device *device)
device->kvm = NULL;
if (device->group->container)
vfio_group_unuse_container(device->group);
+ else if (device->group->iommufd)
+ vfio_iommufd_unbind(device);
mutex_unlock(&device->group->group_lock);
module_put(device->dev->driver->owner);
}
@@ -1936,8 +1949,6 @@ static void __exit vfio_cleanup(void)
module_init(vfio_init);
module_exit(vfio_cleanup);
-MODULE_IMPORT_NS(IOMMUFD);
-MODULE_IMPORT_NS(IOMMUFD_VFIO);
MODULE_VERSION(DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e7cebeb875dd1a..a7fc4d747dc226 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -17,6 +17,8 @@
#include <linux/iova_bitmap.h>
struct kvm;
+struct iommufd_ctx;
+struct iommufd_device;
/*
* VFIO devices can be placed in a set, this allows all devices to share this
@@ -54,6 +56,10 @@ struct vfio_device {
struct completion comp;
struct list_head group_next;
struct list_head iommu_entry;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+ struct iommufd_device *iommufd_device;
+ bool iommufd_attached;
+#endif
};
/**
@@ -80,6 +86,10 @@ struct vfio_device_ops {
char *name;
int (*init)(struct vfio_device *vdev);
void (*release)(struct vfio_device *vdev);
+ int (*bind_iommufd)(struct vfio_device *vdev,
+ struct iommufd_ctx *ictx, u32 *out_device_id);
+ void (*unbind_iommufd)(struct vfio_device *vdev);
+ int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
int (*open_device)(struct vfio_device *vdev);
void (*close_device)(struct vfio_device *vdev);
ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
@@ -96,6 +106,21 @@ struct vfio_device_ops {
void __user *arg, size_t argsz);
};
+#if IS_ENABLED(CONFIG_IOMMUFD)
+int vfio_iommufd_physical_bind(struct vfio_device *vdev,
+ struct iommufd_ctx *ictx, u32 *out_device_id);
+void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
+int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+#else
+#define vfio_iommufd_physical_bind \
+ ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
+ u32 *out_device_id)) NULL)
+#define vfio_iommufd_physical_unbind \
+ ((void (*)(struct vfio_device *vdev)) NULL)
+#define vfio_iommufd_physical_attach_ioas \
+ ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#endif
+
/**
* @migration_set_state: Optional callback to change the migration state for
* devices that support migration. It's mandatory for
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 07/10] vfio-iommufd: Support iommufd for emulated VFIO devices
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (5 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 06/10] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 08/10] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
` (3 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
consist of all the mdev drivers.
Like the physical drivers, support for iommufd is provided by the driver
supplying the correct standard ops. Provide ops from the core that
duplicate what vfio_register_emulated_iommu_dev() does.
Emulated drivers are where it is more likely to see variation in the
iommfd support ops. For instance IDXD will probably need to setup both a
iommfd_device context linked to a PASID and an iommufd_access context to
support all their mdev operations.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 3 +
drivers/s390/cio/vfio_ccw_ops.c | 3 +
drivers/s390/crypto/vfio_ap_ops.c | 3 +
drivers/vfio/container.c | 110 +++++----------------------
drivers/vfio/iommufd.c | 58 ++++++++++++++
drivers/vfio/vfio.h | 10 ++-
drivers/vfio/vfio_main.c | 122 +++++++++++++++++++++++++++++-
include/linux/vfio.h | 14 ++++
8 files changed, 229 insertions(+), 94 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index f563e5dbe66fad..f99cbc0b1eed5b 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1479,6 +1479,9 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
.mmap = intel_vgpu_mmap,
.ioctl = intel_vgpu_ioctl,
.dma_unmap = intel_vgpu_dma_unmap,
+ .bind_iommufd = vfio_iommufd_emulated_bind,
+ .unbind_iommufd = vfio_iommufd_emulated_unbind,
+ .attach_ioas = vfio_iommufd_emulated_attach_ioas,
};
static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 6ae4d012d80084..560453d99c24fc 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -588,6 +588,9 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
.ioctl = vfio_ccw_mdev_ioctl,
.request = vfio_ccw_mdev_request,
.dma_unmap = vfio_ccw_dma_unmap,
+ .bind_iommufd = vfio_iommufd_emulated_bind,
+ .unbind_iommufd = vfio_iommufd_emulated_unbind,
+ .attach_ioas = vfio_iommufd_emulated_attach_ioas,
};
struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8bf353d46820e3..68eeb25fb6611c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1805,6 +1805,9 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
.dma_unmap = vfio_ap_mdev_dma_unmap,
+ .bind_iommufd = vfio_iommufd_emulated_bind,
+ .unbind_iommufd = vfio_iommufd_emulated_unbind,
+ .attach_ioas = vfio_iommufd_emulated_attach_ioas,
};
static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 8772dad6808539..7f3961fd4b5aac 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -540,113 +540,41 @@ void vfio_group_unuse_container(struct vfio_group *group)
fput(group->opened_file);
}
-/*
- * Pin contiguous user pages and return their associated host pages for local
- * domain only.
- * @device [in] : device
- * @iova [in] : starting IOVA of user pages to be pinned.
- * @npage [in] : count of pages to be pinned. This count should not
- * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @prot [in] : protection flags
- * @pages[out] : array of host pages
- * Return error or number of pages pinned.
- *
- * A driver may only call this function if the vfio_device was created
- * by vfio_register_emulated_iommu_dev().
- */
-int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, struct page **pages)
+int vfio_container_pin_pages(struct vfio_container *container,
+ struct iommu_group *iommu_group, dma_addr_t iova,
+ int npage, int prot, struct page **pages)
{
- struct vfio_container *container;
- struct vfio_group *group = device->group;
- struct vfio_iommu_driver *driver;
- int ret;
-
- if (!pages || !npage || !vfio_assert_device_open(device))
- return -EINVAL;
+ struct vfio_iommu_driver *driver = container->iommu_driver;
if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
return -E2BIG;
- /* group->container cannot change while a vfio device is open */
- container = group->container;
- driver = container->iommu_driver;
- if (likely(driver && driver->ops->pin_pages))
- ret = driver->ops->pin_pages(container->iommu_data,
- group->iommu_group, iova,
- npage, prot, pages);
- else
- ret = -ENOTTY;
-
- return ret;
+ if (unlikely(!driver || !driver->ops->pin_pages))
+ return -ENOTTY;
+ return driver->ops->pin_pages(container->iommu_data, iommu_group, iova,
+ npage, prot, pages);
}
-EXPORT_SYMBOL(vfio_pin_pages);
-/*
- * Unpin contiguous host pages for local domain only.
- * @device [in] : device
- * @iova [in] : starting address of user pages to be unpinned.
- * @npage [in] : count of pages to be unpinned. This count should not
- * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- */
-void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
+void vfio_container_unpin_pages(struct vfio_container *container,
+ dma_addr_t iova, int npage)
{
- struct vfio_container *container;
- struct vfio_iommu_driver *driver;
-
if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
return;
- if (WARN_ON(!vfio_assert_device_open(device)))
- return;
-
- /* group->container cannot change while a vfio device is open */
- container = device->group->container;
- driver = container->iommu_driver;
-
- driver->ops->unpin_pages(container->iommu_data, iova, npage);
+ container->iommu_driver->ops->unpin_pages(container->iommu_data, iova,
+ npage);
}
-EXPORT_SYMBOL(vfio_unpin_pages);
-/*
- * This interface allows the CPUs to perform some sort of virtual DMA on
- * behalf of the device.
- *
- * CPUs read/write from/into a range of IOVAs pointing to user space memory
- * into/from a kernel buffer.
- *
- * As the read/write of user space memory is conducted via the CPUs and is
- * not a real device DMA, it is not necessary to pin the user space memory.
- *
- * @device [in] : VFIO device
- * @iova [in] : base IOVA of a user space buffer
- * @data [in] : pointer to kernel buffer
- * @len [in] : kernel buffer length
- * @write : indicate read or write
- * Return error code on failure or 0 on success.
- */
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
- size_t len, bool write)
+int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
+ void *data, size_t len, bool write)
{
- struct vfio_container *container;
- struct vfio_iommu_driver *driver;
- int ret = 0;
-
- if (!data || len <= 0 || !vfio_assert_device_open(device))
- return -EINVAL;
-
- /* group->container cannot change while a vfio device is open */
- container = device->group->container;
- driver = container->iommu_driver;
+ struct vfio_iommu_driver *driver = container->iommu_driver;
- if (likely(driver && driver->ops->dma_rw))
- ret = driver->ops->dma_rw(container->iommu_data,
- iova, data, len, write);
- else
- ret = -ENOTTY;
- return ret;
+ if (unlikely(!driver || !driver->ops->dma_rw))
+ return -ENOTTY;
+ return driver->ops->dma_rw(container->iommu_data, iova, data, len,
+ write);
}
-EXPORT_SYMBOL(vfio_dma_rw);
int __init vfio_container_init(void)
{
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 6e47a3df1a7102..4f82a6fa7c6c7f 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -98,3 +98,61 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
return 0;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
+
+/*
+ * The emulated standard ops mean that vfio_device is going to use the
+ * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
+ * ops set should call vfio_register_emulated_iommu_dev().
+ */
+
+static void vfio_emulated_unmap(void *data, unsigned long iova,
+ unsigned long length)
+{
+ struct vfio_device *vdev = data;
+
+ vdev->ops->dma_unmap(vdev, iova, length);
+}
+
+static const struct iommufd_access_ops vfio_user_ops = {
+ .needs_pin_pages = 1,
+ .unmap = vfio_emulated_unmap,
+};
+
+int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
+ struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ vdev->iommufd_ictx = ictx;
+ iommufd_ctx_get(ictx);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
+
+void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (vdev->iommufd_access) {
+ iommufd_access_destroy(vdev->iommufd_access);
+ vdev->iommufd_access = NULL;
+ }
+ iommufd_ctx_put(vdev->iommufd_ictx);
+ vdev->iommufd_ictx = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_unbind);
+
+int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
+{
+ struct iommufd_access *user;
+
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
+ vdev);
+ if (IS_ERR(user))
+ return PTR_ERR(user);
+ vdev->iommufd_access = user;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 9766f70a12c519..b1ef842496372a 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -111,8 +111,6 @@ struct vfio_iommu_driver {
int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
-bool vfio_assert_device_open(struct vfio_device *device);
-
struct vfio_container *vfio_container_from_file(struct file *filep);
int vfio_group_use_container(struct vfio_group *group);
void vfio_group_unuse_container(struct vfio_group *group);
@@ -121,6 +119,14 @@ int vfio_container_attach_group(struct vfio_container *container,
void vfio_group_detach_container(struct vfio_group *group);
void vfio_device_container_register(struct vfio_device *device);
void vfio_device_container_unregister(struct vfio_device *device);
+int vfio_container_pin_pages(struct vfio_container *container,
+ struct iommu_group *iommu_group, dma_addr_t iova,
+ int npage, int prot, struct page **pages);
+void vfio_container_unpin_pages(struct vfio_container *container,
+ dma_addr_t iova, int npage);
+int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
+ void *data, size_t len, bool write);
+
int __init vfio_container_init(void);
void vfio_container_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a74c34232c034b..fd5e969ab653a5 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -770,7 +770,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
static const struct file_operations vfio_device_fops;
/* true if the vfio_device has open_device() called but not close_device() */
-bool vfio_assert_device_open(struct vfio_device *device)
+static bool vfio_assert_device_open(struct vfio_device *device)
{
return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
}
@@ -1876,6 +1876,126 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
}
EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
+/*
+ * Pin contiguous user pages and return their associated host pages for local
+ * domain only.
+ * @device [in] : device
+ * @iova [in] : starting IOVA of user pages to be pinned.
+ * @npage [in] : count of pages to be pinned. This count should not
+ * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in] : protection flags
+ * @pages[out] : array of host pages
+ * Return error or number of pages pinned.
+ *
+ * A driver may only call this function if the vfio_device was created
+ * by vfio_register_emulated_iommu_dev() due to vfio_container_pin_pages().
+ */
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
+ int npage, int prot, struct page **pages)
+{
+ /* group->container cannot change while a vfio device is open */
+ if (!pages || !npage || WARN_ON(!vfio_assert_device_open(device)))
+ return -EINVAL;
+ if (device->group->container)
+ return vfio_container_pin_pages(device->group->container,
+ device->group->iommu_group,
+ iova, npage, prot, pages);
+ if (device->iommufd_access) {
+ int ret;
+
+ if (iova > ULONG_MAX)
+ return -EINVAL;
+ /*
+ * VFIO ignores the sub page offset, npages is from the start of
+ * a PAGE_SIZE chunk of IOVA. The caller is expected to recover
+ * the sub page offset by doing:
+ * pages[0] + (iova % PAGE_SIZE)
+ */
+ ret = iommufd_access_pin_pages(
+ device->iommufd_access, ALIGN_DOWN(iova, PAGE_SIZE),
+ npage * PAGE_SIZE, pages,
+ (prot & IOMMU_WRITE) ? IOMMUFD_ACCESS_RW_WRITE : 0);
+ if (ret)
+ return ret;
+ return npage;
+ }
+ return -EINVAL;
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin contiguous host pages for local domain only.
+ * @device [in] : device
+ * @iova [in] : starting address of user pages to be unpinned.
+ * @npage [in] : count of pages to be unpinned. This count should not
+ * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ */
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
+{
+ if (WARN_ON(!vfio_assert_device_open(device)))
+ return;
+
+ if (device->group->container) {
+ vfio_container_unpin_pages(device->group->container, iova,
+ npage);
+ return;
+ }
+ if (device->iommufd_access) {
+ if (WARN_ON(iova > ULONG_MAX))
+ return;
+ iommufd_access_unpin_pages(device->iommufd_access,
+ ALIGN_DOWN(iova, PAGE_SIZE),
+ npage * PAGE_SIZE);
+ return;
+ }
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
+/*
+ * This interface allows the CPUs to perform some sort of virtual DMA on
+ * behalf of the device.
+ *
+ * CPUs read/write from/into a range of IOVAs pointing to user space memory
+ * into/from a kernel buffer.
+ *
+ * As the read/write of user space memory is conducted via the CPUs and is
+ * not a real device DMA, it is not necessary to pin the user space memory.
+ *
+ * @device [in] : VFIO device
+ * @iova [in] : base IOVA of a user space buffer
+ * @data [in] : pointer to kernel buffer
+ * @len [in] : kernel buffer length
+ * @write : indicate read or write
+ * Return error code on failure or 0 on success.
+ */
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
+ size_t len, bool write)
+{
+ if (!data || len <= 0 || !vfio_assert_device_open(device))
+ return -EINVAL;
+
+ if (device->group->container)
+ return vfio_container_dma_rw(device->group->container, iova,
+ data, len, write);
+
+ if (device->iommufd_access) {
+ unsigned int flags = 0;
+
+ if (iova > ULONG_MAX)
+ return -EINVAL;
+
+ /* VFIO historically tries to auto-detect a kthread */
+ if (!current->mm)
+ flags |= IOMMUFD_ACCESS_RW_KTHREAD;
+ if (write)
+ flags |= IOMMUFD_ACCESS_RW_WRITE;
+ return iommufd_access_rw(device->iommufd_access, iova, data,
+ len, flags);
+ }
+ return -EINVAL;
+}
+EXPORT_SYMBOL(vfio_dma_rw);
+
/*
* Module/class support
*/
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a7fc4d747dc226..d5f84f98c0fa8f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,6 +19,7 @@
struct kvm;
struct iommufd_ctx;
struct iommufd_device;
+struct iommufd_access;
/*
* VFIO devices can be placed in a set, this allows all devices to share this
@@ -56,8 +57,10 @@ struct vfio_device {
struct completion comp;
struct list_head group_next;
struct list_head iommu_entry;
+ struct iommufd_access *iommufd_access;
#if IS_ENABLED(CONFIG_IOMMUFD)
struct iommufd_device *iommufd_device;
+ struct iommufd_ctx *iommufd_ictx;
bool iommufd_attached;
#endif
};
@@ -111,6 +114,10 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
struct iommufd_ctx *ictx, u32 *out_device_id);
void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
+ struct iommufd_ctx *ictx, u32 *out_device_id);
+void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
+int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
#else
#define vfio_iommufd_physical_bind \
((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
@@ -119,6 +126,13 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
((void (*)(struct vfio_device *vdev)) NULL)
#define vfio_iommufd_physical_attach_ioas \
((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_bind \
+ ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
+ u32 *out_device_id)) NULL)
+#define vfio_iommufd_emulated_unbind \
+ ((void (*)(struct vfio_device *vdev)) NULL)
+#define vfio_iommufd_emulated_attach_ioas \
+ ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
#endif
/**
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 08/10] vfio: Move container related MODULE_ALIAS statements into container.c
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (6 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 07/10] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 09/10] vfio: Make vfio_container optionally compiled Jason Gunthorpe
` (2 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
The miscdev is in container.c, so should these related MODULE_ALIAS
statements. This is necessary for the next patch to be able to fully
disable /dev/vfio/vfio.
Fixes: cdc71fe4ecbf ("vfio: Move container code into drivers/vfio/container.c")
Reported-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/container.c | 3 +++
drivers/vfio/vfio_main.c | 2 --
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 7f3961fd4b5aac..6b362d97d68220 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -608,3 +608,6 @@ void vfio_container_cleanup(void)
misc_deregister(&vfio_dev);
mutex_destroy(&vfio.iommu_drivers_lock);
}
+
+MODULE_ALIAS_MISCDEV(VFIO_MINOR);
+MODULE_ALIAS("devname:vfio/vfio");
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index fd5e969ab653a5..ce6e6a560c702a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -2073,6 +2073,4 @@ MODULE_VERSION(DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
MODULE_DESCRIPTION(DRIVER_DESC);
-MODULE_ALIAS_MISCDEV(VFIO_MINOR);
-MODULE_ALIAS("devname:vfio/vfio");
MODULE_SOFTDEP("post: vfio_iommu_type1 vfio_iommu_spapr_tce");
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 09/10] vfio: Make vfio_container optionally compiled
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (7 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 08/10] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-29 20:31 ` [PATCH v4 10/10] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
2022-11-30 20:34 ` [PATCH v4 00/10] Connect VFIO to IOMMUFD Alex Williamson
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
Add a kconfig CONFIG_VFIO_CONTAINER that controls compiling the container
code. If 'n' then only iommufd will provide the container service. All the
support for vfio iommu drivers, including type1, will not be built.
This allows a compilation check that no inappropriate dependencies between
the device/group and container have been created.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/vfio/Kconfig | 35 +++++++++++++++--------
drivers/vfio/Makefile | 4 +--
drivers/vfio/vfio.h | 65 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+), 13 deletions(-)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 1118d322eec97d..286c1663bd7564 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -3,8 +3,8 @@ menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
select IOMMU_API
depends on IOMMUFD || !IOMMUFD
- select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
select INTERVAL_TREE
+ select VFIO_CONTAINER if IOMMUFD=n
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.
@@ -12,6 +12,18 @@ menuconfig VFIO
If you don't know what to do here, say N.
if VFIO
+config VFIO_CONTAINER
+ bool "Support for the VFIO container /dev/vfio/vfio"
+ select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+ default y
+ help
+ The VFIO container is the classic interface to VFIO for establishing
+ IOMMU mappings. If N is selected here then IOMMUFD must be used to
+ manage the mappings.
+
+ Unless testing IOMMUFD say Y here.
+
+if VFIO_CONTAINER
config VFIO_IOMMU_TYPE1
tristate
default n
@@ -21,16 +33,6 @@ config VFIO_IOMMU_SPAPR_TCE
depends on SPAPR_TCE_IOMMU
default VFIO
-config VFIO_SPAPR_EEH
- tristate
- depends on EEH && VFIO_IOMMU_SPAPR_TCE
- default VFIO
-
-config VFIO_VIRQFD
- tristate
- select EVENTFD
- default n
-
config VFIO_NOIOMMU
bool "VFIO No-IOMMU support"
help
@@ -44,6 +46,17 @@ config VFIO_NOIOMMU
this mode since there is no IOMMU to provide DMA translation.
If you don't know what to do here, say N.
+endif
+
+config VFIO_SPAPR_EEH
+ tristate
+ depends on EEH && VFIO_IOMMU_SPAPR_TCE
+ default VFIO
+
+config VFIO_VIRQFD
+ tristate
+ select EVENTFD
+ default n
source "drivers/vfio/pci/Kconfig"
source "drivers/vfio/platform/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 3863922529ef20..b953517dc70f99 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,9 +4,9 @@ vfio_virqfd-y := virqfd.o
obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
- iova_bitmap.o \
- container.o
+ iova_bitmap.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
+vfio-$(CONFIG_VFIO_CONTAINER) += container.o
obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b1ef842496372a..ce5fe3fc493b4e 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -55,7 +55,9 @@ struct vfio_group {
struct list_head device_list;
struct mutex device_lock;
struct list_head vfio_next;
+#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
struct list_head container_next;
+#endif
enum vfio_group_type type;
struct mutex group_lock;
struct kvm *kvm;
@@ -64,6 +66,7 @@ struct vfio_group {
struct iommufd_ctx *iommufd;
};
+#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {
VFIO_IOMMU_CONTAINER_CLOSE = 0,
@@ -129,6 +132,68 @@ int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
int __init vfio_container_init(void);
void vfio_container_cleanup(void);
+#else
+static inline struct vfio_container *
+vfio_container_from_file(struct file *filep)
+{
+ return NULL;
+}
+
+static inline int vfio_group_use_container(struct vfio_group *group)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void vfio_group_unuse_container(struct vfio_group *group)
+{
+}
+
+static inline int vfio_container_attach_group(struct vfio_container *container,
+ struct vfio_group *group)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void vfio_group_detach_container(struct vfio_group *group)
+{
+}
+
+static inline void vfio_device_container_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_container_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_container_pin_pages(struct vfio_container *container,
+ struct iommu_group *iommu_group,
+ dma_addr_t iova, int npage, int prot,
+ struct page **pages)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void vfio_container_unpin_pages(struct vfio_container *container,
+ dma_addr_t iova, int npage)
+{
+}
+
+static inline int vfio_container_dma_rw(struct vfio_container *container,
+ dma_addr_t iova, void *data, size_t len,
+ bool write)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int vfio_container_init(void)
+{
+ return 0;
+}
+static inline void vfio_container_cleanup(void)
+{
+}
+#endif
#if IS_ENABLED(CONFIG_IOMMUFD)
int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v4 10/10] iommufd: Allow iommufd to supply /dev/vfio/vfio
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (8 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 09/10] vfio: Make vfio_container optionally compiled Jason Gunthorpe
@ 2022-11-29 20:31 ` Jason Gunthorpe
2022-11-30 20:34 ` [PATCH v4 00/10] Connect VFIO to IOMMUFD Alex Williamson
10 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-11-29 20:31 UTC (permalink / raw)
Cc: Alex Williamson, Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian,
kvm, Lixiao Yang, Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
If the VFIO container is compiled out, give a kconfig option for iommufd
to provide the miscdev node with the same name and permissions as vfio
uses.
The compatibility node supports the same ioctls as VFIO and automatically
enables the VFIO compatible pinned page accounting mode.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yu He <yu.he@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/iommufd/Kconfig | 20 +++++++++++++++++++
drivers/iommu/iommufd/main.c | 36 +++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 871244f2443fbb..8306616b6d8198 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -12,6 +12,26 @@ config IOMMUFD
If you don't know what to do here, say N.
if IOMMUFD
+config IOMMUFD_VFIO_CONTAINER
+ bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
+ depends on VFIO && !VFIO_CONTAINER
+ default VFIO && !VFIO_CONTAINER
+ help
+ IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
+ IOMMUFD providing compatibility emulation to give the same ioctls.
+ It provides an option to build a kernel with legacy VFIO components
+ removed.
+
+ IOMMUFD VFIO container emulation is known to lack certain features
+ of the native VFIO container, such as no-IOMMU support, peer-to-peer
+ DMA mapping, PPC IOMMU support, as well as other potentially
+ undiscovered gaps. This option is currently intended for the
+ purpose of testing IOMMUFD with unmodified userspace supporting VFIO
+ and making use of the Type1 VFIO IOMMU backend. General purpose
+ enabling of this option is currently discouraged.
+
+ Unless testing IOMMUFD, say N here.
+
config IOMMUFD_TEST
bool "IOMMU Userspace API Test support"
depends on DEBUG_KERNEL
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index bcb463e581009c..083e6fcbe10ad9 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -18,6 +18,7 @@
#include <uapi/linux/iommufd.h>
#include <linux/iommufd.h>
+#include "io_pagetable.h"
#include "iommufd_private.h"
#include "iommufd_test.h"
@@ -25,6 +26,7 @@ struct iommufd_object_ops {
void (*destroy)(struct iommufd_object *obj);
};
static const struct iommufd_object_ops iommufd_object_ops[];
+static struct miscdevice vfio_misc_dev;
struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
size_t size,
@@ -170,6 +172,16 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
if (!ictx)
return -ENOMEM;
+ /*
+ * For compatibility with VFIO when /dev/vfio/vfio is opened we default
+ * to the same rlimit accounting as vfio uses.
+ */
+ if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER) &&
+ filp->private_data == &vfio_misc_dev) {
+ ictx->account_mode = IOPT_PAGES_ACCOUNT_MM;
+ pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
+ }
+
xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
ictx->file = filp;
filp->private_data = ictx;
@@ -400,6 +412,15 @@ static struct miscdevice iommu_misc_dev = {
.mode = 0660,
};
+
+static struct miscdevice vfio_misc_dev = {
+ .minor = VFIO_MINOR,
+ .name = "vfio",
+ .fops = &iommufd_fops,
+ .nodename = "vfio/vfio",
+ .mode = 0666,
+};
+
static int __init iommufd_init(void)
{
int ret;
@@ -407,18 +428,33 @@ static int __init iommufd_init(void)
ret = misc_register(&iommu_misc_dev);
if (ret)
return ret;
+
+ if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) {
+ ret = misc_register(&vfio_misc_dev);
+ if (ret)
+ goto err_misc;
+ }
iommufd_test_init();
return 0;
+err_misc:
+ misc_deregister(&iommu_misc_dev);
+ return ret;
}
static void __exit iommufd_exit(void)
{
iommufd_test_exit();
+ if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
+ misc_deregister(&vfio_misc_dev);
misc_deregister(&iommu_misc_dev);
}
module_init(iommufd_init);
module_exit(iommufd_exit);
+#if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)
+MODULE_ALIAS_MISCDEV(VFIO_MINOR);
+MODULE_ALIAS("devname:vfio/vfio");
+#endif
MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
MODULE_LICENSE("GPL");
--
2.38.1
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v4 00/10] Connect VFIO to IOMMUFD
2022-11-29 20:31 [PATCH v4 00/10] Connect VFIO to IOMMUFD Jason Gunthorpe
` (9 preceding siblings ...)
2022-11-29 20:31 ` [PATCH v4 10/10] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
@ 2022-11-30 20:34 ` Alex Williamson
2022-12-01 0:44 ` Jason Gunthorpe
10 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2022-11-30 20:34 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian, kvm, Lixiao Yang,
Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
On Tue, 29 Nov 2022 16:31:45 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:
> [ As with the iommufd series, this will be the last posting unless
> something major happens, futher fixes will be in new commits ]
>
> This series provides an alternative container layer for VFIO implemented
> using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
> not be compiled in.
>
> At this point iommufd can be injected by passing in a iommfd FD to
> VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
> to obtain the compat IOAS and then connect up all the VFIO drivers as
> appropriate.
>
> This is temporary stopping point, a following series will provide a way to
> directly open a VFIO device FD and directly connect it to IOMMUFD using
> native ioctls that can expose the IOMMUFD features like hwpt, future
> vPASID and dynamic attachment.
>
> This series, in compat mode, has passed all the qemu tests we have
> available, including the test suites for the Intel GVT mdev. Aside from
> the temporary limitation with P2P memory this is belived to be fully
> compatible with VFIO.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>
> It requires the iommufd series:
>
> https://lore.kernel.org/r/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
>
> v4:
> - Change the assertion in vfio_group_has_iommu to be clearer
> - Use vfio_group_has_iommu()
> - Remove allow_unsafe_interrupts stuff
> - Update IOMMUFD_VFIO_CONTAINER kconfig description
> - Use DEBUG_KERNEL insted of RUNTIME_TESTING_MENU for IOMMUFD_TEST kconfig
This looks ok to me and passes all my testing. What's your merge plan?
I'm guessing you'd like to send it along with the main IOMMUFD pull
request, there are currently no conflicts with my next branch,
therefore:
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Otherwise, let's figure out the branch merge plan. Thanks,
Alex
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v4 00/10] Connect VFIO to IOMMUFD
2022-11-30 20:34 ` [PATCH v4 00/10] Connect VFIO to IOMMUFD Alex Williamson
@ 2022-12-01 0:44 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2022-12-01 0:44 UTC (permalink / raw)
To: Alex Williamson
Cc: Lu Baolu, Cornelia Huck, Eric Auger, Kevin Tian, kvm, Lixiao Yang,
Matthew Rosato, Nicolin Chen, Yi Liu, Yu He
On Wed, Nov 30, 2022 at 01:34:55PM -0700, Alex Williamson wrote:
> On Tue, 29 Nov 2022 16:31:45 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > [ As with the iommufd series, this will be the last posting unless
> > something major happens, futher fixes will be in new commits ]
> >
> > This series provides an alternative container layer for VFIO implemented
> > using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
> > not be compiled in.
> >
> > At this point iommufd can be injected by passing in a iommfd FD to
> > VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
> > to obtain the compat IOAS and then connect up all the VFIO drivers as
> > appropriate.
> >
> > This is temporary stopping point, a following series will provide a way to
> > directly open a VFIO device FD and directly connect it to IOMMUFD using
> > native ioctls that can expose the IOMMUFD features like hwpt, future
> > vPASID and dynamic attachment.
> >
> > This series, in compat mode, has passed all the qemu tests we have
> > available, including the test suites for the Intel GVT mdev. Aside from
> > the temporary limitation with P2P memory this is belived to be fully
> > compatible with VFIO.
> >
> > This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
> >
> > It requires the iommufd series:
> >
> > https://lore.kernel.org/r/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
> >
> > v4:
> > - Change the assertion in vfio_group_has_iommu to be clearer
> > - Use vfio_group_has_iommu()
> > - Remove allow_unsafe_interrupts stuff
> > - Update IOMMUFD_VFIO_CONTAINER kconfig description
> > - Use DEBUG_KERNEL insted of RUNTIME_TESTING_MENU for IOMMUFD_TEST kconfig
>
> This looks ok to me and passes all my testing. What's your merge plan?
> I'm guessing you'd like to send it along with the main IOMMUFD pull
> request, there are currently no conflicts with my next branch,
> therefore:
Yes, it has to go with the iommufd branch, it won't apply
otherwise.
Here is everything ready-to-go:
https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
It has been in linux-next for about 4 weeks now
Thanks,
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread