* [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode
@ 2025-05-16 16:45 Jacob Pan
2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Jacob Pan @ 2025-05-16 16:45 UTC (permalink / raw)
To: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L,
jgg@nvidia.com, Jacob Pan
Cc: Zhang Yu, Easwar Hariharan
For no-iommu enabled devices working under IOMMUFD VFIO compat mode, the
group open path does not call vfio_df_open() and the open_count is 0. So
calling vfio_df_close() in the group close path will trigger warning in
vfio_assert_device_open(device);
E.g. The following warning can be seen by running VFIO test.
https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c
CONFIG_VFIO_CONTAINER = n
[ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by user (vfio-noiommu-pc:164)
Failed to get device info
[ 29.096540] ------------[ cut here ]------------
[ 29.096616] WARNING: CPU: 1 PID: 164 at drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4
This patch adds checks for no-iommu mode and open_count to skip calling vfio_df_close.
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
---
drivers/vfio/group.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index c321d442f0da..834421149ffe 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -238,12 +238,13 @@ void vfio_df_group_close(struct vfio_device_file *df)
mutex_lock(&device->group->group_lock);
mutex_lock(&device->dev_set->lock);
- vfio_df_close(df);
- df->iommufd = NULL;
-
if (device->open_count == 0)
vfio_device_put_kvm(device);
+ if (!vfio_device_is_noiommu(device))
+ vfio_df_close(df);
+
+ df->iommufd = NULL;
mutex_unlock(&device->dev_set->lock);
mutex_unlock(&device->group->group_lock);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 2/2] vfio: Prevent open_count decrement to negative 2025-05-16 16:45 [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jacob Pan @ 2025-05-16 16:45 ` Jacob Pan 2025-05-26 23:53 ` Jason Gunthorpe 2025-05-28 7:24 ` Yi Liu 2025-05-26 23:52 ` [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jason Gunthorpe 2025-05-27 0:05 ` Jason Gunthorpe 2 siblings, 2 replies; 12+ messages in thread From: Jacob Pan @ 2025-05-16 16:45 UTC (permalink / raw) To: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, jgg@nvidia.com, Jacob Pan Cc: Zhang Yu, Easwar Hariharan When vfio_df_close() is called with open_count=0, it triggers a warning in vfio_assert_device_open() but still decrements open_count to -1. This allows a subsequent open to incorrectly pass the open_count == 0 check, leading to unintended behavior, such as setting df->access_granted = true. For example, running an IOMMUFD compat no-IOMMU device with VFIO tests (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first run, but the second run succeeds incorrectly. Add checks to avoid decrementing open_count below zero Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> --- drivers/vfio/vfio_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 1fd261efc582..5046cae05222 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -583,7 +583,8 @@ void vfio_df_close(struct vfio_device_file *df) lockdep_assert_held(&device->dev_set->lock); - vfio_assert_device_open(device); + if (!vfio_assert_device_open(device)) + return; if (device->open_count == 1) vfio_df_device_last_close(df); device->open_count--; -- 2.34.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] vfio: Prevent open_count decrement to negative 2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan @ 2025-05-26 23:53 ` Jason Gunthorpe 2025-05-28 7:24 ` Yi Liu 1 sibling, 0 replies; 12+ messages in thread From: Jason Gunthorpe @ 2025-05-26 23:53 UTC (permalink / raw) To: Jacob Pan Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, Zhang Yu, Easwar Hariharan On Fri, May 16, 2025 at 09:45:22AM -0700, Jacob Pan wrote: > When vfio_df_close() is called with open_count=0, it triggers a warning in > vfio_assert_device_open() but still decrements open_count to -1. This allows > a subsequent open to incorrectly pass the open_count == 0 check, leading to > unintended behavior, such as setting df->access_granted = true. > > For example, running an IOMMUFD compat no-IOMMU device with VFIO tests > (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) > results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first > run, but the second run succeeds incorrectly. > > Add checks to avoid decrementing open_count below zero > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > --- > drivers/vfio/vfio_main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] vfio: Prevent open_count decrement to negative 2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan 2025-05-26 23:53 ` Jason Gunthorpe @ 2025-05-28 7:24 ` Yi Liu 1 sibling, 0 replies; 12+ messages in thread From: Yi Liu @ 2025-05-28 7:24 UTC (permalink / raw) To: Jacob Pan, linux-kernel, iommu@lists.linux.dev, Alex Williamson, jgg@nvidia.com Cc: Zhang Yu, Easwar Hariharan On 2025/5/17 00:45, Jacob Pan wrote: > When vfio_df_close() is called with open_count=0, it triggers a warning in > vfio_assert_device_open() but still decrements open_count to -1. This allows > a subsequent open to incorrectly pass the open_count == 0 check, leading to > unintended behavior, such as setting df->access_granted = true. > > For example, running an IOMMUFD compat no-IOMMU device with VFIO tests > (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) > results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first > run, but the second run succeeds incorrectly. > > Add checks to avoid decrementing open_count below zero > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > --- > drivers/vfio/vfio_main.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Yi Liu <yi.l.liu@intel.com> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 1fd261efc582..5046cae05222 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -583,7 +583,8 @@ void vfio_df_close(struct vfio_device_file *df) > > lockdep_assert_held(&device->dev_set->lock); > > - vfio_assert_device_open(device); > + if (!vfio_assert_device_open(device)) > + return; > if (device->open_count == 1) > vfio_df_device_last_close(df); > device->open_count--; -- Regards, Yi Liu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-16 16:45 [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jacob Pan 2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan @ 2025-05-26 23:52 ` Jason Gunthorpe 2025-05-27 0:05 ` Jason Gunthorpe 2 siblings, 0 replies; 12+ messages in thread From: Jason Gunthorpe @ 2025-05-26 23:52 UTC (permalink / raw) To: Jacob Pan Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, Zhang Yu, Easwar Hariharan On Fri, May 16, 2025 at 09:45:21AM -0700, Jacob Pan wrote: > For no-iommu enabled devices working under IOMMUFD VFIO compat mode, the > group open path does not call vfio_df_open() and the open_count is 0. So > calling vfio_df_close() in the group close path will trigger warning in > vfio_assert_device_open(device); > > E.g. The following warning can be seen by running VFIO test. > https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c > CONFIG_VFIO_CONTAINER = n > [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by user (vfio-noiommu-pc:164) > Failed to get device info > [ 29.096540] ------------[ cut here ]------------ > [ 29.096616] WARNING: CPU: 1 PID: 164 at drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 > > This patch adds checks for no-iommu mode and open_count to skip calling vfio_df_close. > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > --- > drivers/vfio/group.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-16 16:45 [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jacob Pan 2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan 2025-05-26 23:52 ` [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jason Gunthorpe @ 2025-05-27 0:05 ` Jason Gunthorpe 2025-05-28 7:16 ` Yi Liu 2025-05-30 16:51 ` Jacob Pan 2 siblings, 2 replies; 12+ messages in thread From: Jason Gunthorpe @ 2025-05-27 0:05 UTC (permalink / raw) To: Jacob Pan Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, Zhang Yu, Easwar Hariharan On Fri, May 16, 2025 at 09:45:21AM -0700, Jacob Pan wrote: > For no-iommu enabled devices working under IOMMUFD VFIO compat mode, the > group open path does not call vfio_df_open() and the open_count is 0. So > calling vfio_df_close() in the group close path will trigger warning in > vfio_assert_device_open(device); > > E.g. The following warning can be seen by running VFIO test. > https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c > CONFIG_VFIO_CONTAINER = n > [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by user (vfio-noiommu-pc:164) > Failed to get device info > [ 29.096540] ------------[ cut here ]------------ > [ 29.096616] WARNING: CPU: 1 PID: 164 at drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 > > This patch adds checks for no-iommu mode and open_count to skip calling vfio_df_close. > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > --- > drivers/vfio/group.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) Sorry, this should have a fixes line: I think it is probably Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()") By the look of it, since that is what started skipping the vfio_df_open() But after looking at that patch I'm now doubting that this is the right fix. Previously we'd still do vfio_df_device_first_open(), just the vfio_df_iommufd_bind() was skipped. Now we skip all of vfio_df_device_first_open() which also means we skip: if (!try_module_get(device->dev->driver->owner)) return -ENODEV; and if (device->ops->open_device) { ret = device->ops->open_device(device); Which seems wrong to me?? We only want to skip the bind, we should still do open_device! At least that is how it was before 6086e So.. This may not be the right fix. Maybe more like: diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index c321d442f0da09..1b6a0e30544401 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -192,18 +192,18 @@ static int vfio_df_group_open(struct vfio_device_file *df) * implies they expected translation to exist */ if (!capable(CAP_SYS_RAWIO) || - vfio_iommufd_device_has_compat_ioas(device, df->iommufd)) + vfio_iommufd_device_has_compat_ioas(device, df->iommufd)) { ret = -EPERM; - else - ret = 0; - goto out_put_kvm; + goto out_put_kvm; + } } ret = vfio_df_open(df); if (ret) goto out_put_kvm; - if (df->iommufd && device->open_count == 1) { + if (df->iommufd && device->open_count == 1 && + !vfio_device_is_noiommu(device)) { ret = vfio_iommufd_compat_attach_ioas(device, df->iommufd); if (ret) goto out_close_device; diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index c8c3a2d53f86e1..26c9c3068c77da 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -54,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df) lockdep_assert_held(&vdev->dev_set->lock); - if (vfio_device_is_noiommu(vdev)) - return; - if (vdev->ops->unbind_iommufd) vdev->ops->unbind_iommufd(vdev); } diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 1fd261efc582d0..ff19ea05442e7d 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -506,17 +506,19 @@ static int vfio_df_device_first_open(struct vfio_device_file *df) { struct vfio_device *device = df->device; struct iommufd_ctx *iommufd = df->iommufd; - int ret; + int ret = 0; lockdep_assert_held(&device->dev_set->lock); if (!try_module_get(device->dev->driver->owner)) return -ENODEV; - if (iommufd) - ret = vfio_df_iommufd_bind(df); - else + if (iommufd) { + if (!vfio_device_is_noiommu(device)) + ret = vfio_df_iommufd_bind(df); + } else { ret = vfio_device_group_use_iommu(device); + } if (ret) goto err_module_put; @@ -528,10 +530,12 @@ static int vfio_df_device_first_open(struct vfio_device_file *df) return 0; err_unuse_iommu: - if (iommufd) - vfio_df_iommufd_unbind(df); - else + if (iommufd) { + if (!vfio_device_is_noiommu(device)) + vfio_df_iommufd_unbind(df); + } else { vfio_device_group_unuse_iommu(device); + } err_module_put: module_put(device->dev->driver->owner); return ret; @@ -546,10 +550,12 @@ static void vfio_df_device_last_close(struct vfio_device_file *df) if (device->ops->close_device) device->ops->close_device(device); - if (iommufd) - vfio_df_iommufd_unbind(df); - else + if (iommufd) { + if (!vfio_device_is_noiommu(device)) + vfio_df_iommufd_unbind(df); + } else { vfio_device_group_unuse_iommu(device); + } module_put(device->dev->driver->owner); } Jason ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-27 0:05 ` Jason Gunthorpe @ 2025-05-28 7:16 ` Yi Liu 2025-06-03 16:25 ` Jacob Pan 2025-05-30 16:51 ` Jacob Pan 1 sibling, 1 reply; 12+ messages in thread From: Yi Liu @ 2025-05-28 7:16 UTC (permalink / raw) To: Jason Gunthorpe, Jacob Pan Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Zhang Yu, Easwar Hariharan On 2025/5/27 08:05, Jason Gunthorpe wrote: > On Fri, May 16, 2025 at 09:45:21AM -0700, Jacob Pan wrote: >> For no-iommu enabled devices working under IOMMUFD VFIO compat mode, the >> group open path does not call vfio_df_open() and the open_count is 0. So >> calling vfio_df_close() in the group close path will trigger warning in >> vfio_assert_device_open(device); >> >> E.g. The following warning can be seen by running VFIO test. >> https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c >> CONFIG_VFIO_CONTAINER = n >> [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by user (vfio-noiommu-pc:164) >> Failed to get device info >> [ 29.096540] ------------[ cut here ]------------ >> [ 29.096616] WARNING: CPU: 1 PID: 164 at drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 >> >> This patch adds checks for no-iommu mode and open_count to skip calling vfio_df_close. thanks for catching it. :) >> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> >> --- >> drivers/vfio/group.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) > > Sorry, this should have a fixes line: > > I think it is probably > > Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()") > > By the look of it, since that is what started skipping the vfio_df_open() > > But after looking at that patch I'm now doubting that this is the > right fix. > > Previously we'd still do vfio_df_device_first_open(), just the > vfio_df_iommufd_bind() was skipped. > > Now we skip all of vfio_df_device_first_open() which also means we skip: > > if (!try_module_get(device->dev->driver->owner)) > return -ENODEV; > > and > if (device->ops->open_device) { > ret = device->ops->open_device(device); > > Which seems wrong to me?? We only want to skip the bind, we should > still do open_device! At least that is how it was before 6086e > > So.. This may not be the right fix. yes. this makes sense. If not opened, userspace is not able to use the device. > Maybe more like: > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index c321d442f0da09..1b6a0e30544401 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -192,18 +192,18 @@ static int vfio_df_group_open(struct vfio_device_file *df) > * implies they expected translation to exist > */ > if (!capable(CAP_SYS_RAWIO) || > - vfio_iommufd_device_has_compat_ioas(device, df->iommufd)) > + vfio_iommufd_device_has_compat_ioas(device, df->iommufd)) { > ret = -EPERM; > - else > - ret = 0; > - goto out_put_kvm; > + goto out_put_kvm; > + } > } > > ret = vfio_df_open(df); > if (ret) > goto out_put_kvm; > > - if (df->iommufd && device->open_count == 1) { > + if (df->iommufd && device->open_count == 1 && > + !vfio_device_is_noiommu(device)) { > ret = vfio_iommufd_compat_attach_ioas(device, df->iommufd); > if (ret) > goto out_close_device; > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c > index c8c3a2d53f86e1..26c9c3068c77da 100644 > --- a/drivers/vfio/iommufd.c > +++ b/drivers/vfio/iommufd.c > @@ -54,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df) > > lockdep_assert_held(&vdev->dev_set->lock); > > - if (vfio_device_is_noiommu(vdev)) > - return; > - > if (vdev->ops->unbind_iommufd) > vdev->ops->unbind_iommufd(vdev); > } > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 1fd261efc582d0..ff19ea05442e7d 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -506,17 +506,19 @@ static int vfio_df_device_first_open(struct vfio_device_file *df) > { > struct vfio_device *device = df->device; > struct iommufd_ctx *iommufd = df->iommufd; > - int ret; > + int ret = 0; > > lockdep_assert_held(&device->dev_set->lock); > > if (!try_module_get(device->dev->driver->owner)) > return -ENODEV; > > - if (iommufd) > - ret = vfio_df_iommufd_bind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + ret = vfio_df_iommufd_bind(df); > + } else { > ret = vfio_device_group_use_iommu(device); > + } > if (ret) > goto err_module_put; > > @@ -528,10 +530,12 @@ static int vfio_df_device_first_open(struct vfio_device_file *df) > return 0; > > err_unuse_iommu: > - if (iommufd) > - vfio_df_iommufd_unbind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + vfio_df_iommufd_unbind(df); > + } else { > vfio_device_group_unuse_iommu(device); > + } > err_module_put: > module_put(device->dev->driver->owner); > return ret; > @@ -546,10 +550,12 @@ static void vfio_df_device_last_close(struct vfio_device_file *df) > > if (device->ops->close_device) > device->ops->close_device(device); > - if (iommufd) > - vfio_df_iommufd_unbind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + vfio_df_iommufd_unbind(df); > + } else { > vfio_device_group_unuse_iommu(device); > + } > module_put(device->dev->driver->owner); > } > this looks good to me. -- Regards, Yi Liu ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-28 7:16 ` Yi Liu @ 2025-06-03 16:25 ` Jacob Pan 2025-06-03 17:54 ` Jason Gunthorpe 0 siblings, 1 reply; 12+ messages in thread From: Jacob Pan @ 2025-06-03 16:25 UTC (permalink / raw) To: Yi Liu Cc: Jason Gunthorpe, linux-kernel, iommu@lists.linux.dev, Alex Williamson, Zhang Yu, Easwar Hariharan, jacob.pan Hi Yi, On Wed, 28 May 2025 15:16:57 +0800 Yi Liu <yi.l.liu@intel.com> wrote: > On 2025/5/27 08:05, Jason Gunthorpe wrote: > > On Fri, May 16, 2025 at 09:45:21AM -0700, Jacob Pan wrote: > >> For no-iommu enabled devices working under IOMMUFD VFIO compat > >> mode, the group open path does not call vfio_df_open() and the > >> open_count is 0. So calling vfio_df_close() in the group close > >> path will trigger warning in vfio_assert_device_open(device); > >> > >> E.g. The following warning can be seen by running VFIO test. > >> https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c > >> CONFIG_VFIO_CONTAINER = n > >> [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened > >> by user (vfio-noiommu-pc:164) Failed to get device info > >> [ 29.096540] ------------[ cut here ]------------ > >> [ 29.096616] WARNING: CPU: 1 PID: 164 at > >> drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 > >> > >> This patch adds checks for no-iommu mode and open_count to skip > >> calling vfio_df_close. > > thanks for catching it. :) > > >> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > >> --- > >> drivers/vfio/group.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > > > > Sorry, this should have a fixes line: > > > > I think it is probably > > > > Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation > > out of vfio_iommufd_bind()") > > > > By the look of it, since that is what started skipping the > > vfio_df_open() > > > > But after looking at that patch I'm now doubting that this is the > > right fix. > > > > Previously we'd still do vfio_df_device_first_open(), just the > > vfio_df_iommufd_bind() was skipped. > > > > Now we skip all of vfio_df_device_first_open() which also means we > > skip: > > > > if (!try_module_get(device->dev->driver->owner)) > > return -ENODEV; > > > > and > > if (device->ops->open_device) { > > ret = device->ops->open_device(device); > > > > Which seems wrong to me?? We only want to skip the bind, we should > > still do open_device! At least that is how it was before 6086e > > > > So.. This may not be the right fix. > > yes. this makes sense. If not opened, userspace is not able to use the > device. > Put this bug aside for now, I'm still unclear on why we do not allow bind for no-IOMMU devices. Per my understanding, no-IOMMU only means no translation. But since device still has been granted access, we should be able to allow binding device in no-IOMMU mode with IOMMU-FD context while simply disallowing IOAS attachment? The reason I am asking is that I am working on enabling cdev with noiommu mode based on Yi's patch (https://lore.kernel.org/kvm/20230601082413.22a55ac4.alex.williamson@redhat.com/), it seems having iommufd implicit ownership model is key to enable PCI HOT RESET. Our goal is to leverage persistent iommufd context for kexec handle off (KHO) usage, we currently have noiommu mode. This requires binding of device with iommufd ctx (can be marked as persistent) AFAIK, any suggestions?" Thanks, Jacob ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-06-03 16:25 ` Jacob Pan @ 2025-06-03 17:54 ` Jason Gunthorpe 0 siblings, 0 replies; 12+ messages in thread From: Jason Gunthorpe @ 2025-06-03 17:54 UTC (permalink / raw) To: jacob.pan, Saurabh Sengar Cc: Yi Liu, linux-kernel, iommu@lists.linux.dev, Alex Williamson, Zhang Yu, Easwar Hariharan On Tue, Jun 03, 2025 at 09:25:49AM -0700, Jacob Pan wrote: > Put this bug aside for now, I'm still unclear on why we do not allow > bind for no-IOMMU devices. Per my understanding, no-IOMMU only means no > translation. But since device still has been granted access, we should > be able to allow binding device in no-IOMMU mode with IOMMU-FD > context while simply disallowing IOAS attachment? Because it isn't finished, the no-iommu mode under cdev should work the same as the real iommu mode, complete with an ioas, hwpt and idevice object. There is a bunch of work still missing, so the uapi was blocked off until it is done. Don't want to end up with uapi that doesn't implement the plan and can't be fixed later.. > The reason I am asking is that I am working on enabling cdev with > noiommu mode based on Yi's patch > (https://lore.kernel.org/kvm/20230601082413.22a55ac4.alex.williamson@redhat.com/), > it seems having iommufd implicit ownership model is key to enable PCI > HOT RESET. The idea is that the IOAS and HWPT will allow iommufd to do the page pinning exactly as normal-iommu would. So no mlock. A new ioctl would let userspace retrieve the virtual to physical mapping, so no /proc/ mess. I haven't thought too much about it, but I'm suspecting the easiest thing is to make a "VFIO no-iommu iommu_domain" with special ops/etc that can be put under a HWPT. The selftest has something like this already. Then when you bind VFIO would request a no-iommu mode and it wouuld through the normal flow but create/find a no-iommu HWPT with the special no-iommu iommu_domain. Aside from that special logic everything else would be exactly the same. Add the virtual to physical ioctl and you are done. > Our goal is to leverage persistent iommufd context for kexec > handle off (KHO) usage, we currently have noiommu mode. This requires > binding of device with iommufd ctx (can be marked as persistent) AFAIK, > any suggestions?" Then when you get here things can be much more similar to real KHO as you could have a real page table (couresty of the iommupt patches) with identical KHO semantics for no-iommu as real-iommu. Then it becomes much less of a special case. Something approximately like this Jason diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 2111bad72c720f..7d171a7c34a287 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -241,6 +241,62 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx, } EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, "IOMMUFD"); +struct iommufd_device *iommufd_device_bind_noiommu(struct iommufd_ctx *ictx, + struct device *dev, u32 *id) +{ + struct iommufd_group *igroup; + struct iommufd_device *idev; + int rc; + + /* FIXME better code sharing with iommufd_get_group() and iommufd_device_bind() */ + + igroup = kzalloc(sizeof(*igroup), GFP_KERNEL); + if (!igroup) + return ERR_PTR(-ENOMEM); + + kref_init(&igroup->ref); + mutex_init(&igroup->lock); + xa_init(&igroup->pasid_attach); + igroup->sw_msi_start = PHYS_ADDR_MAX; + + // FIXME: need to check that igroup->group == NULL is OK for the no iommu paths. + + rc = iommu_device_claim_dma_owner(dev, ictx); + if (rc && rc != -ENODEV) + goto out_group_put; + + idev = iommufd_object_alloc(ictx, idev, IOMMUFD_OBJ_DEVICE); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_release_owner; + } + idev->no_iommu = true; + idev->ictx = ictx; + iommufd_ctx_get(ictx); + idev->dev = dev; + /* The calling driver is a user until iommufd_device_unbind() */ + refcount_inc(&idev->obj.users); + /* igroup refcount moves into iommufd_device */ + idev->igroup = igroup; + mutex_init(&idev->iopf_lock); + /* + * If the caller fails after this success it must call + * iommufd_unbind_device() which is safe since we hold this refcount. + * This also means the device is a leaf in the graph and no other object + * can take a reference on it. + */ + iommufd_object_finalize(ictx, &idev->obj); + *id = idev->obj.id; + return idev; + +out_release_owner: + iommu_device_release_dma_owner(dev); +out_group_put: + iommufd_put_group(igroup); + return ERR_PTR(rc); +} +EXPORT_SYMBOL_NS_GPL(iommufd_device_bind_noiommu, "IOMMUFD"); + /** * iommufd_ctx_has_group - True if any device within the group is bound * to the ictx @@ -366,6 +422,9 @@ iommufd_device_attach_reserved_iova(struct iommufd_device *idev, struct iommufd_group *igroup = idev->igroup; int rc; + if (idev->no_iommu) + return 0; + lockdep_assert_held(&igroup->lock); rc = iopt_table_enforce_dev_resv_regions(&hwpt_paging->ioas->iopt, @@ -432,6 +491,9 @@ static int iommufd_hwpt_attach_device(struct iommufd_hw_pagetable *hwpt, struct iommufd_attach_handle *handle; int rc; + if (idev->no_iommu) + return 0; + rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid); if (rc) return rc; @@ -486,6 +548,9 @@ static void iommufd_hwpt_detach_device(struct iommufd_hw_pagetable *hwpt, { struct iommufd_attach_handle *handle; + if (idev->no_iommu) + return; + handle = iommufd_device_get_attach_handle(idev, pasid); if (pasid == IOMMU_NO_PASID) iommu_detach_group_handle(hwpt->domain, idev->igroup->group); @@ -507,6 +572,9 @@ static int iommufd_hwpt_replace_device(struct iommufd_device *idev, struct iommufd_attach_handle *handle, *old_handle; int rc; + if (idev->no_iommu) + return 0; + rc = iommufd_hwpt_pasid_compat(hwpt, idev, pasid); if (rc) return rc; @@ -560,6 +628,9 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_attach *attach; int rc; + if (hwpt_paging->no_iommu != idev->no_iommu) + return -EOPNOTSUPP; + mutex_lock(&igroup->lock); attach = xa_cmpxchg(&igroup->pasid_attach, pasid, NULL, @@ -703,6 +774,9 @@ iommufd_group_do_replace_reserved_iova(struct iommufd_group *igroup, unsigned long index; int rc; + if (hwpt_paging->no_iommu) + return 0; + lockdep_assert_held(&igroup->lock); attach = xa_load(&igroup->pasid_attach, IOMMU_NO_PASID); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 487779470261a7..58102df9cdfeb0 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -145,7 +145,16 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, hwpt_paging->ioas = ioas; hwpt_paging->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT; - if (ops->domain_alloc_paging_flags) { + if (idev->no_iommu) { + hwpt->domain = iommufd_create_no_iommu_domain(); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + immediate_attach = false; + hwpt_paging->no_iommu = true; + } else if (ops->domain_alloc_paging_flags) { hwpt->domain = ops->domain_alloc_paging_flags(idev->dev, flags & ~IOMMU_HWPT_FAULT_ID_VALID, user_data); if (IS_ERR(hwpt->domain)) { diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 80e8c76d25f234..7a4a019ad7e5ab 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -308,6 +308,7 @@ struct iommufd_hwpt_paging { bool auto_domain : 1; bool enforce_cache_coherency : 1; bool nest_parent : 1; + bool no_iommu : 1; /* Head at iommufd_ioas::hwpt_list */ struct list_head hwpt_item; struct iommufd_sw_msi_maps present_sw_msi; @@ -425,6 +426,7 @@ struct iommufd_device { /* always the physical device */ struct device *dev; bool enforce_cache_coherency; + bool no_iommu; /* protect iopf_enabled counter */ struct mutex iopf_lock; unsigned int iopf_enabled; diff --git a/drivers/iommu/iommufd/noiommu.c b/drivers/iommu/iommufd/noiommu.c new file mode 100644 index 00000000000000..6b419a37f0713d --- /dev/null +++ b/drivers/iommu/iommufd/noiommu.c @@ -0,0 +1,180 @@ + +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2021-2025, NVIDIA CORPORATION & AFFILIATES. + * + */ +#include <linux/iommu.h> + +/* FIXME s/mock/noiommu/ */ + +struct mock_iommu_domain { + unsigned long flags; + struct iommu_domain domain; + struct xarray pfns; +}; + +enum { + MOCK_DIRTY_TRACK = 1, + MOCK_HUGE_PAGE_SIZE = 512 * PAGE_SIZE, + + /* + * Like a real page table alignment requires the low bits of the address + * to be zero. xarray also requires the high bit to be zero, so we store + * the pfns shifted. The upper bits are used for metadata. + */ + MOCK_PFN_MASK = ULONG_MAX / PAGE_SIZE, + + _MOCK_PFN_START = MOCK_PFN_MASK + 1, + MOCK_PFN_START_IOVA = _MOCK_PFN_START, + MOCK_PFN_LAST_IOVA = _MOCK_PFN_START, + MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1, + MOCK_PFN_HUGE_IOVA = _MOCK_PFN_START << 2, +}; + +static inline struct mock_iommu_domain * +to_mock_domain(struct iommu_domain *domain) +{ + return container_of(domain, struct mock_iommu_domain, domain); +} + +static int mock_domain_map_pages(struct iommu_domain *domain, + unsigned long iova, phys_addr_t paddr, + size_t pgsize, size_t pgcount, int prot, + gfp_t gfp, size_t *mapped) +{ + struct mock_iommu_domain *mock = to_mock_domain(domain); + unsigned long flags = MOCK_PFN_START_IOVA; + unsigned long start_iova = iova; + + WARN_ON(iova % PAGE_SIZE); + WARN_ON(pgsize % PAGE_SIZE); + for (; pgcount; pgcount--) { + size_t cur; + + for (cur = 0; cur != pgsize; cur += PAGE_SIZE) { + void *old; + + if (pgcount == 1 && cur + PAGE_SIZE == pgsize) + flags = MOCK_PFN_LAST_IOVA; + if (pgsize != PAGE_SIZE) { + flags |= MOCK_PFN_HUGE_IOVA; + } + old = xa_store(&mock->pfns, iova / PAGE_SIZE, + xa_mk_value((paddr / PAGE_SIZE) | + flags), + gfp); + if (xa_is_err(old)) { + for (; start_iova != iova; + start_iova += PAGE_SIZE) + xa_erase(&mock->pfns, + start_iova / + PAGE_SIZE); + return xa_err(old); + } + WARN_ON(old); + iova += PAGE_SIZE; + paddr += PAGE_SIZE; + *mapped += PAGE_SIZE; + flags = 0; + } + } + return 0; +} + +static size_t mock_domain_unmap_pages(struct iommu_domain *domain, + unsigned long iova, size_t pgsize, + size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather) +{ + struct mock_iommu_domain *mock = to_mock_domain(domain); + bool first = true; + size_t ret = 0; + void *ent; + + WARN_ON(iova % PAGE_SIZE); + WARN_ON(pgsize % PAGE_SIZE); + + for (; pgcount; pgcount--) { + size_t cur; + + for (cur = 0; cur != pgsize; cur += PAGE_SIZE) { + ent = xa_erase(&mock->pfns, iova / PAGE_SIZE); + + /* + * iommufd generates unmaps that must be a strict + * superset of the map's performend So every + * starting/ending IOVA should have been an iova passed + * to map. + * + * This simple logic doesn't work when the HUGE_PAGE is + * turned on since the core code will automatically + * switch between the two page sizes creating a break in + * the unmap calls. The break can land in the middle of + * contiguous IOVA. + */ + if (!(domain->pgsize_bitmap & MOCK_HUGE_PAGE_SIZE)) { + if (first) { + WARN_ON(ent && !(xa_to_value(ent) & + MOCK_PFN_START_IOVA)); + first = false; + } + if (pgcount == 1 && + cur + PAGE_SIZE == pgsize) + WARN_ON(ent && !(xa_to_value(ent) & + MOCK_PFN_LAST_IOVA)); + } + + iova += PAGE_SIZE; + ret += PAGE_SIZE; + } + } + return ret; +} + +static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain, + dma_addr_t iova) +{ + struct mock_iommu_domain *mock = to_mock_domain(domain); + void *ent; + + WARN_ON(iova % PAGE_SIZE); + ent = xa_load(&mock->pfns, iova / PAGE_SIZE); + WARN_ON(!ent); + return (xa_to_value(ent) & MOCK_PFN_MASK) * PAGE_SIZE; +} + +static void mock_domain_free(struct iommu_domain *domain) +{ + struct mock_iommu_domain *mock = to_mock_domain(domain); + + WARN_ON(!xa_empty(&mock->pfns)); + kfree(mock); +} + +static const struct iommu_domain_ops mock_domain_ops = { + .free = mock_domain_free, + .map_pages = mock_domain_map_pages, + .unmap_pages = mock_domain_unmap_pages, + .iova_to_phys = mock_domain_iova_to_phys, +}; + +static const struct iommu_ops no_iommu_owner; + +struct iommu_domain *iommufd_create_no_iommu_domain(void) +{ + struct mock_iommu_domain *mock; + + mock = kzalloc(sizeof(*mock), GFP_KERNEL); + if (!mock) + return ERR_PTR(-ENOMEM); + + mock->domain.geometry.aperture_start = 0; + mock->domain.geometry.aperture_end = ULONG_MAX; + mock->domain.pgsize_bitmap = PAGE_SIZE; + mock->domain.ops = &mock_domain_ops; + mock->domain.type = IOMMU_DOMAIN_UNMANAGED; + xa_init(&mock->pfns); + mock->domain.owner = &no_iommu_owner; + + return &mock->domain; +} ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-27 0:05 ` Jason Gunthorpe 2025-05-28 7:16 ` Yi Liu @ 2025-05-30 16:51 ` Jacob Pan 2025-05-30 16:53 ` Jason Gunthorpe 1 sibling, 1 reply; 12+ messages in thread From: Jacob Pan @ 2025-05-30 16:51 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, Zhang Yu, Easwar Hariharan, jacob.pan Hi Jason, On Mon, 26 May 2025 21:05:11 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Fri, May 16, 2025 at 09:45:21AM -0700, Jacob Pan wrote: > > For no-iommu enabled devices working under IOMMUFD VFIO compat > > mode, the group open path does not call vfio_df_open() and the > > open_count is 0. So calling vfio_df_close() in the group close path > > will trigger warning in vfio_assert_device_open(device); > > > > E.g. The following warning can be seen by running VFIO test. > > https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c > > CONFIG_VFIO_CONTAINER = n > > [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by > > user (vfio-noiommu-pc:164) Failed to get device info > > [ 29.096540] ------------[ cut here ]------------ > > [ 29.096616] WARNING: CPU: 1 PID: 164 at > > drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 > > > > This patch adds checks for no-iommu mode and open_count to skip > > calling vfio_df_close. > > > > Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> > > --- > > drivers/vfio/group.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > Sorry, this should have a fixes line: > > I think it is probably > > Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation > out of vfio_iommufd_bind()") > > By the look of it, since that is what started skipping the > vfio_df_open() > > But after looking at that patch I'm now doubting that this is the > right fix. > > Previously we'd still do vfio_df_device_first_open(), just the > vfio_df_iommufd_bind() was skipped. > > Now we skip all of vfio_df_device_first_open() which also means we > skip: > > if (!try_module_get(device->dev->driver->owner)) > return -ENODEV; > > and > if (device->ops->open_device) { > ret = device->ops->open_device(device); > > Which seems wrong to me?? We only want to skip the bind, we should > still do open_device! At least that is how it was before 6086e > > So.. This may not be the right fix. > > Maybe more like: Looks good to me, please disregard my patch. Tested-by: Jacob Pan <jacob.pan@linux.microsoft.com> I guess you will submit this? > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index c321d442f0da09..1b6a0e30544401 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -192,18 +192,18 @@ static int vfio_df_group_open(struct > vfio_device_file *df) > * implies they expected translation to exist > */ > if (!capable(CAP_SYS_RAWIO) || > - vfio_iommufd_device_has_compat_ioas(device, > df->iommufd)) > + vfio_iommufd_device_has_compat_ioas(device, > df->iommufd)) { ret = -EPERM; > - else > - ret = 0; > - goto out_put_kvm; > + goto out_put_kvm; > + } > } > > ret = vfio_df_open(df); > if (ret) > goto out_put_kvm; > > - if (df->iommufd && device->open_count == 1) { > + if (df->iommufd && device->open_count == 1 && > + !vfio_device_is_noiommu(device)) { > ret = vfio_iommufd_compat_attach_ioas(device, > df->iommufd); if (ret) > goto out_close_device; > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c > index c8c3a2d53f86e1..26c9c3068c77da 100644 > --- a/drivers/vfio/iommufd.c > +++ b/drivers/vfio/iommufd.c > @@ -54,9 +54,6 @@ void vfio_df_iommufd_unbind(struct vfio_device_file > *df) > lockdep_assert_held(&vdev->dev_set->lock); > > - if (vfio_device_is_noiommu(vdev)) > - return; > - > if (vdev->ops->unbind_iommufd) > vdev->ops->unbind_iommufd(vdev); > } > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 1fd261efc582d0..ff19ea05442e7d 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -506,17 +506,19 @@ static int vfio_df_device_first_open(struct > vfio_device_file *df) { > struct vfio_device *device = df->device; > struct iommufd_ctx *iommufd = df->iommufd; > - int ret; > + int ret = 0; > > lockdep_assert_held(&device->dev_set->lock); > > if (!try_module_get(device->dev->driver->owner)) > return -ENODEV; > > - if (iommufd) > - ret = vfio_df_iommufd_bind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + ret = vfio_df_iommufd_bind(df); > + } else { > ret = vfio_device_group_use_iommu(device); > + } > if (ret) > goto err_module_put; > > @@ -528,10 +530,12 @@ static int vfio_df_device_first_open(struct > vfio_device_file *df) return 0; > > err_unuse_iommu: > - if (iommufd) > - vfio_df_iommufd_unbind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + vfio_df_iommufd_unbind(df); > + } else { > vfio_device_group_unuse_iommu(device); > + } > err_module_put: > module_put(device->dev->driver->owner); > return ret; > @@ -546,10 +550,12 @@ static void vfio_df_device_last_close(struct > vfio_device_file *df) > if (device->ops->close_device) > device->ops->close_device(device); > - if (iommufd) > - vfio_df_iommufd_unbind(df); > - else > + if (iommufd) { > + if (!vfio_device_is_noiommu(device)) > + vfio_df_iommufd_unbind(df); > + } else { > vfio_device_group_unuse_iommu(device); > + } > module_put(device->dev->driver->owner); > } > > > Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode 2025-05-30 16:51 ` Jacob Pan @ 2025-05-30 16:53 ` Jason Gunthorpe 0 siblings, 0 replies; 12+ messages in thread From: Jason Gunthorpe @ 2025-05-30 16:53 UTC (permalink / raw) To: Jacob Pan Cc: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, Zhang Yu, Easwar Hariharan On Fri, May 30, 2025 at 09:51:12AM -0700, Jacob Pan wrote: > > Maybe more like: > Looks good to me, please disregard my patch. > Tested-by: Jacob Pan <jacob.pan@linux.microsoft.com> > > I guess you will submit this? Oh I didn't think it would work first try :) Can you post it as part of your two part series? It is probably simpler Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode @ 2025-06-02 23:43 Jacob Pan 2025-06-02 23:43 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan 0 siblings, 1 reply; 12+ messages in thread From: Jacob Pan @ 2025-06-02 23:43 UTC (permalink / raw) To: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, jgg@nvidia.com, Jacob Pan Cc: Zhang Yu, Easwar Hariharan, Saurabh Sengar For no-iommu enabled devices working under IOMMUFD VFIO compat mode, the group open path does not call vfio_df_open() and the open_count is 0. So calling vfio_df_close() in the group close path will trigger warning in vfio_assert_device_open(device); E.g. The following warning can be seen by running VFIO test. https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c CONFIG_VFIO_CONTAINER = n [ 29.094781] vfio-pci 0000:02:01.0: vfio-noiommu device opened by user (vfio-noiommu-pc:164) Failed to get device info [ 29.096540] ------------[ cut here ]------------ [ 29.096616] WARNING: CPU: 1 PID: 164 at drivers/vfio/vfio_main.c:487 vfio_df_close+0xac/0xb4 This patch adds checks for no-iommu mode and open_count to skip calling vfio_df_close. Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> --- drivers/vfio/group.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index c321d442f0da..834421149ffe 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -238,12 +238,13 @@ void vfio_df_group_close(struct vfio_device_file *df) mutex_lock(&device->group->group_lock); mutex_lock(&device->dev_set->lock); - vfio_df_close(df); - df->iommufd = NULL; - if (device->open_count == 0) vfio_device_put_kvm(device); + if (!vfio_device_is_noiommu(device)) + vfio_df_close(df); + + df->iommufd = NULL; mutex_unlock(&device->dev_set->lock); mutex_unlock(&device->group->group_lock); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] vfio: Prevent open_count decrement to negative 2025-06-02 23:43 Jacob Pan @ 2025-06-02 23:43 ` Jacob Pan 0 siblings, 0 replies; 12+ messages in thread From: Jacob Pan @ 2025-06-02 23:43 UTC (permalink / raw) To: linux-kernel, iommu@lists.linux.dev, Alex Williamson, Liu, Yi L, jgg@nvidia.com, Jacob Pan Cc: Zhang Yu, Easwar Hariharan, Saurabh Sengar When vfio_df_close() is called with open_count=0, it triggers a warning in vfio_assert_device_open() but still decrements open_count to -1. This allows a subsequent open to incorrectly pass the open_count == 0 check, leading to unintended behavior, such as setting df->access_granted = true. For example, running an IOMMUFD compat no-IOMMU device with VFIO tests (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first run, but the second run succeeds incorrectly. Add checks to avoid decrementing open_count below zero Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> --- drivers/vfio/vfio_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 1fd261efc582..5046cae05222 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -583,7 +583,8 @@ void vfio_df_close(struct vfio_device_file *df) lockdep_assert_held(&device->dev_set->lock); - vfio_assert_device_open(device); + if (!vfio_assert_device_open(device)) + return; if (device->open_count == 1) vfio_df_device_last_close(df); device->open_count--; -- 2.34.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-06-03 17:54 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-16 16:45 [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jacob Pan 2025-05-16 16:45 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan 2025-05-26 23:53 ` Jason Gunthorpe 2025-05-28 7:24 ` Yi Liu 2025-05-26 23:52 ` [PATCH 1/2] vfio: Fix unbalanced vfio_df_close call in no-iommu mode Jason Gunthorpe 2025-05-27 0:05 ` Jason Gunthorpe 2025-05-28 7:16 ` Yi Liu 2025-06-03 16:25 ` Jacob Pan 2025-06-03 17:54 ` Jason Gunthorpe 2025-05-30 16:51 ` Jacob Pan 2025-05-30 16:53 ` Jason Gunthorpe -- strict thread matches above, loose matches on Subject: below -- 2025-06-02 23:43 Jacob Pan 2025-06-02 23:43 ` [PATCH 2/2] vfio: Prevent open_count decrement to negative Jacob Pan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.