* [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
` (10 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
This better matches what the code does.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..65bbef562268 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
}
/*
- * For each group_fd, get the group through the vfio external user
- * interface and store the group and iommu ID. This ensures the group
- * is held across the reset.
+ * Get the group file for each fd to ensure the group held across
+ * the reset
*/
for (file_idx = 0; file_idx < hdr.count; file_idx++) {
struct file *file = fget(group_fds[file_idx]);
--
2.34.1
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
2023-04-01 14:44 ` [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-04-04 13:59 ` Eric Auger
2023-04-04 14:37 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 13:59 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> this suits more on what the code does.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a5ab416cf476..65bbef562268 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> }
>
> /*
> - * For each group_fd, get the group through the vfio external user
> - * interface and store the group and iommu ID. This ensures the group
> - * is held across the reset.
> + * Get the group file for each fd to ensure the group held across
to ensure the group is held
Besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> + * the reset
> */
> for (file_idx = 0; file_idx < hdr.count; file_idx++) {
> struct file *file = fget(group_fds[file_idx]);
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
2023-04-04 13:59 ` Eric Auger
@ 2023-04-04 14:37 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-04 14:37 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, April 4, 2023 10:00 PM
>
> Hi Yi,
>
> On 4/1/23 16:44, Yi Liu wrote:
> > this suits more on what the code does.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a5ab416cf476..65bbef562268 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > }
> >
> > /*
> > - * For each group_fd, get the group through the vfio external user
> > - * interface and store the group and iommu ID. This ensures the group
> > - * is held across the reset.
> > + * Get the group file for each fd to ensure the group held across
> to ensure the group is held
got it.
> Besides
>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>
> Eric
>
>
> > + * the reset
> > */
> > for (file_idx = 0; file_idx < hdr.count; file_idx++) {
> > struct file *file = fget(group_fds[file_idx]);
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
2023-04-01 14:44 ` [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
` (9 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
If the affected device is not opened by any user, it's safe to reset it
given it's not in use.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
include/uapi/linux/vfio.h | 8 ++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 65bbef562268..5d745c9abf05 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
/*
- * Test whether all the affected devices are contained by the
- * set of groups provided by the user.
+ * Test whether all the affected devices can be reset by the
+ * user.
+ *
+ * Resetting an unused device (not opened) is safe, because
+ * dev_set->lock is held in hot reset path so this device
+ * cannot race being opened by another user simultaneously.
+ *
+ * Otherwise all opened devices in the dev_set must be
+ * contained by the set of groups provided by the user.
*/
- if (!vfio_dev_in_groups(cur_vma, groups)) {
+ if (cur_vma->vdev.open_count &&
+ !vfio_dev_in_groups(cur_vma, groups)) {
ret = -EINVAL;
goto err_undo;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..f96e5689cffc 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
* VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
* struct vfio_pci_hot_reset)
*
+ * Userspace requests hot reset for the devices it uses. Due to the
+ * underlying topology, multiple devices can be affected in the reset
+ * while some might be opened by another user. To avoid interference
+ * the calling user must ensure all affected devices, if opened, are
+ * owned by itself.
+ *
+ * The ownership is proved by an array of group fds.
+ *
* Return: 0 on success, -errno on failure.
*/
struct vfio_pci_hot_reset {
--
2.34.1
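
The rule this patch introduces can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: `struct model_dev`, `model_dev_in_groups()` and `model_hot_reset_check()` are hypothetical names made up for illustration, and groups are modelled as plain integers. It only mirrors the condition in the hunk above: a device that nobody has opened never blocks the reset, while an opened device must belong to a group the caller supplied.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in a dev_set; not the kernel struct. */
struct model_dev {
	unsigned int open_count;	/* 0 means no user has the device open */
	int group_id;			/* group the device belongs to */
};

/* True if the device's group is among the groups supplied by the caller. */
static bool model_dev_in_groups(const struct model_dev *dev,
				const int *groups, size_t ngroups)
{
	for (size_t i = 0; i < ngroups; i++)
		if (groups[i] == dev->group_id)
			return true;
	return false;
}

/*
 * Mirrors the patched check: an unopened device is always acceptable,
 * an opened device must be covered by one of the caller's groups.
 * Returns 0 if the reset may proceed, -EINVAL otherwise.
 */
static int model_hot_reset_check(const struct model_dev *devs, size_t ndevs,
				 const int *groups, size_t ngroups)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].open_count &&
		    !model_dev_in_groups(&devs[i], groups, ngroups))
			return -EINVAL;
	}
	return 0;
}
```

In the kernel the same walk happens with dev_set->lock held, which is what makes the open_count test stable; the sketch leaves locking out entirely.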
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-01 14:44 ` [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
@ 2023-04-04 13:59 ` Eric Auger
2023-04-04 14:37 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 13:59 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> If the affected device is not opened by any user, it's safe to reset it
> given it's not in use.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
> include/uapi/linux/vfio.h | 8 ++++++++
> 2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 65bbef562268..5d745c9abf05 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>
> list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> /*
> - * Test whether all the affected devices are contained by the
> - * set of groups provided by the user.
> + * Test whether all the affected devices can be reset by the
> + * user.
> + *
> + * Resetting an unused device (not opened) is safe, because
> + * dev_set->lock is held in hot reset path so this device
> + * cannot race being opened by another user simultaneously.
> + *
> + * Otherwise all opened devices in the dev_set must be
> + * contained by the set of groups provided by the user.
> */
> - if (!vfio_dev_in_groups(cur_vma, groups)) {
> + if (cur_vma->vdev.open_count &&
> + !vfio_dev_in_groups(cur_vma, groups)) {
> ret = -EINVAL;
> goto err_undo;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..f96e5689cffc 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
> * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> * struct vfio_pci_hot_reset)
> *
> + * Userspace requests hot reset for the devices it uses. Due to the
> + * underlying topology, multiple devices can be affected in the reset
by the reset
> + * while some might be opened by another user. To avoid interference
s/interference/hot reset failure?
> + * the calling user must ensure all affected devices, if opened, are
> + * owned by itself.
> + *
> + * The ownership is proved by an array of group fds.
> + *
> * Return: 0 on success, -errno on failure.
> */
> struct vfio_pci_hot_reset {
Thanks
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-04 13:59 ` Eric Auger
@ 2023-04-04 14:37 ` Liu, Yi L
2023-04-04 15:18 ` Eric Auger
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-04 14:37 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Eric,
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, April 4, 2023 10:00 PM
>
> Hi Yi,
>
> On 4/1/23 16:44, Yi Liu wrote:
> > If the affected device is not opened by any user, it's safe to reset it
> > given it's not in use.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
> > include/uapi/linux/vfio.h | 8 ++++++++
> > 2 files changed, 19 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 65bbef562268..5d745c9abf05 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct
> vfio_device_set *dev_set,
> >
> > list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> > /*
> > - * Test whether all the affected devices are contained by the
> > - * set of groups provided by the user.
> > + * Test whether all the affected devices can be reset by the
> > + * user.
> > + *
> > + * Resetting an unused device (not opened) is safe, because
> > + * dev_set->lock is held in hot reset path so this device
> > + * cannot race being opened by another user simultaneously.
> > + *
> > + * Otherwise all opened devices in the dev_set must be
> > + * contained by the set of groups provided by the user.
> > */
> > - if (!vfio_dev_in_groups(cur_vma, groups)) {
> > + if (cur_vma->vdev.open_count &&
> > + !vfio_dev_in_groups(cur_vma, groups)) {
> > ret = -EINVAL;
> > goto err_undo;
> > }
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 0552e8dcf0cb..f96e5689cffc 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
> > * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > * struct vfio_pci_hot_reset)
> > *
> > + * Userspace requests hot reset for the devices it uses. Due to the
> > + * underlying topology, multiple devices can be affected in the reset
> by the reset
> > + * while some might be opened by another user. To avoid interference
> s/interference/hot reset failure?
I don’t think the user can really avoid hot reset failure, since new
devices may be plugged into the affected slot. Even if the user has opened
all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
the hot reset can fail if a new device is plugged in and is either not
bound to vfio or opened by another user during the window between
_INFO and HOT_RESET.
Maybe the whole statement should read as below:
To avoid interference, the hot reset can only be conducted when all
the affected devices are either opened by the calling user or not
opened yet at the moment of the hot reset attempt.
> > + * the calling user must ensure all affected devices, if opened, are
> > + * owned by itself.
> > + *
> > + * The ownership is proved by an array of group fds.
> > + *
> > * Return: 0 on success, -errno on failure.
> > */
> > struct vfio_pci_hot_reset {
Regards,
Yi Liu
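
For readers following the UAPI side of this discussion, here is a hedged user-space sketch of how the group-fd ownership proof is handed to VFIO_DEVICE_PCI_HOT_RESET. It relies only on `struct vfio_pci_hot_reset` and the ioctl number from <linux/vfio.h>; the helper names `build_hot_reset_req()` and `request_hot_reset()` are hypothetical, and the sketch assumes the caller already holds an open vfio device fd plus the group fds for the groups reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Allocate a hot reset request carrying `count` group fds.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct vfio_pci_hot_reset *
build_hot_reset_req(const int *group_fds, unsigned int count)
{
	size_t argsz = sizeof(struct vfio_pci_hot_reset) +
		       count * sizeof(__s32);
	struct vfio_pci_hot_reset *req = calloc(1, argsz);

	if (!req)
		return NULL;
	req->argsz = (__u32)argsz;	/* header plus the trailing fd array */
	req->flags = 0;
	req->count = count;
	for (unsigned int i = 0; i < count; i++)
		req->group_fds[i] = group_fds[i];
	return req;
}

/* Issue the reset on an open vfio device fd; returns 0 or -errno. */
static int request_hot_reset(int device_fd, const int *group_fds,
			     unsigned int count)
{
	struct vfio_pci_hot_reset *req = build_hot_reset_req(group_fds, count);
	int ret;

	if (!req)
		return -ENOMEM;
	ret = ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, req) ? -errno : 0;
	free(req);
	return ret;
}
```

As the thread notes, a successful _INFO query does not guarantee that the later reset succeeds, so a caller of a helper like this still has to handle a negative return.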
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-04 14:37 ` Liu, Yi L
@ 2023-04-04 15:18 ` Eric Auger
2023-04-04 15:29 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 15:18 UTC (permalink / raw)
To: Liu, Yi L, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Yi,
On 4/4/23 16:37, Liu, Yi L wrote:
> Hi Eric,
>
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Tuesday, April 4, 2023 10:00 PM
>>
>> Hi Yi,
>>
>> On 4/1/23 16:44, Yi Liu wrote:
>>> If the affected device is not opened by any user, it's safe to reset it
>>> given it's not in use.
>>>
>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> ---
>>> drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
>>> include/uapi/linux/vfio.h | 8 ++++++++
>>> 2 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>>> index 65bbef562268..5d745c9abf05 100644
>>> --- a/drivers/vfio/pci/vfio_pci_core.c
>>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>>> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct
>> vfio_device_set *dev_set,
>>> list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
>>> /*
>>> - * Test whether all the affected devices are contained by the
>>> - * set of groups provided by the user.
>>> + * Test whether all the affected devices can be reset by the
>>> + * user.
>>> + *
>>> + * Resetting an unused device (not opened) is safe, because
>>> + * dev_set->lock is held in hot reset path so this device
>>> + * cannot race being opened by another user simultaneously.
>>> + *
>>> + * Otherwise all opened devices in the dev_set must be
>>> + * contained by the set of groups provided by the user.
>>> */
>>> - if (!vfio_dev_in_groups(cur_vma, groups)) {
>>> + if (cur_vma->vdev.open_count &&
>>> + !vfio_dev_in_groups(cur_vma, groups)) {
>>> ret = -EINVAL;
>>> goto err_undo;
>>> }
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 0552e8dcf0cb..f96e5689cffc 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
>>> * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>>> * struct vfio_pci_hot_reset)
>>> *
>>> + * Userspace requests hot reset for the devices it uses. Due to the
>>> + * underlying topology, multiple devices can be affected in the reset
>> by the reset
>>> + * while some might be opened by another user. To avoid interference
>> s/interference/hot reset failure?
> I don’t think user can really avoid hot reset failure since there may
> be new devices plugged into the affected slot. Even user has opened
I don't know the legacy behavior wrt that issue, but this sounds like a
serious issue: does it mean the reset of an assigned device could impact
another device belonging to another group not owned by the user?
> all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
> the hot reset can fail if new device is plugged in and has not been
> bound to vfio or opened by another user during the window of
> _INFO and HOT_RESET.
With respect to the latter, isn't the dev_set lock held during the hot
reset, and isn't that sufficient to prevent any new open from occurring?
>
> maybe the whole statement should be as below:
>
> To avoid interference, the hot reset can only be conducted when all
> the affected devices are either opened by the calling user or not
> opened yet at the moment of the hot reset attempt.
OK
Eric
>
>>> + * the calling user must ensure all affected devices, if opened, are
>>> + * owned by itself.
>>> + *
>>> + * The ownership is proved by an array of group fds.
>>> + *
>>> * Return: 0 on success, -errno on failure.
>>> */
>>> struct vfio_pci_hot_reset {
> Regards,
> Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-04 15:18 ` Eric Auger
@ 2023-04-04 15:29 ` Liu, Yi L
2023-04-04 15:59 ` Eric Auger
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-04 15:29 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, April 4, 2023 11:19 PM
>
> Hi Yi,
>
> On 4/4/23 16:37, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Eric Auger <eric.auger@redhat.com>
> >> Sent: Tuesday, April 4, 2023 10:00 PM
> >>
> >> Hi Yi,
> >>
> >> On 4/1/23 16:44, Yi Liu wrote:
> >>> If the affected device is not opened by any user, it's safe to reset it
> >>> given it's not in use.
> >>>
> >>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> >>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >>> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> >>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> >>> ---
> >>> drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
> >>> include/uapi/linux/vfio.h | 8 ++++++++
> >>> 2 files changed, 19 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >>> index 65bbef562268..5d745c9abf05 100644
> >>> --- a/drivers/vfio/pci/vfio_pci_core.c
> >>> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >>> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct
> >> vfio_device_set *dev_set,
> >>> list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> >>> /*
> >>> - * Test whether all the affected devices are contained by the
> >>> - * set of groups provided by the user.
> >>> + * Test whether all the affected devices can be reset by the
> >>> + * user.
> >>> + *
> >>> + * Resetting an unused device (not opened) is safe, because
> >>> + * dev_set->lock is held in hot reset path so this device
> >>> + * cannot race being opened by another user simultaneously.
> >>> + *
> >>> + * Otherwise all opened devices in the dev_set must be
> >>> + * contained by the set of groups provided by the user.
> >>> */
> >>> - if (!vfio_dev_in_groups(cur_vma, groups)) {
> >>> + if (cur_vma->vdev.open_count &&
> >>> + !vfio_dev_in_groups(cur_vma, groups)) {
> >>> ret = -EINVAL;
> >>> goto err_undo;
> >>> }
> >>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>> index 0552e8dcf0cb..f96e5689cffc 100644
> >>> --- a/include/uapi/linux/vfio.h
> >>> +++ b/include/uapi/linux/vfio.h
> >>> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
> >>> * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >>> * struct vfio_pci_hot_reset)
> >>> *
> >>> + * Userspace requests hot reset for the devices it uses. Due to the
> >>> + * underlying topology, multiple devices can be affected in the reset
> >> by the reset
> >>> + * while some might be opened by another user. To avoid interference
> >> s/interference/hot reset failure?
> > I don’t think user can really avoid hot reset failure since there may
> > be new devices plugged into the affected slot. Even user has opened
> I don't know the legacy wrt that issue but this sounds a serious issue,
> meaning the reset of an assigned device could impact another device
> belonging to another group not owned by the user?
But the hot reset shall fail, as the group is not owned by the user.
> > all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
> > the hot reset can fail if new device is plugged in and has not been
> > bound to vfio or opened by another user during the window of
> > _INFO and HOT_RESET.
> with respect to the latter isn't the dev_set lock held during the hot
> reset and sufficient to prevent any new opening to occur?
Yes. A new open needs to acquire the dev_set lock, so once hot reset
holds the dev_set lock, no new open can occur.
Regards,
Yi Liu
> >
> > maybe the whole statement should be as below:
> >
> > To avoid interference, the hot reset can only be conducted when all
> > the affected devices are either opened by the calling user or not
> > opened yet at the moment of the hot reset attempt.
>
> OK
>
> Eric
> >
> >>> + * the calling user must ensure all affected devices, if opened, are
> >>> + * owned by itself.
> >>> + *
> >>> + * The ownership is proved by an array of group fds.
> >>> + *
> >>> * Return: 0 on success, -errno on failure.
> >>> */
> >>> struct vfio_pci_hot_reset {
> > Regards,
> > Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-04 15:29 ` Liu, Yi L
@ 2023-04-04 15:59 ` Eric Auger
2023-04-05 11:41 ` Jason Gunthorpe
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 15:59 UTC (permalink / raw)
To: Liu, Yi L, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On 4/4/23 17:29, Liu, Yi L wrote:
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Tuesday, April 4, 2023 11:19 PM
>>
>> Hi Yi,
>>
>> On 4/4/23 16:37, Liu, Yi L wrote:
>>> Hi Eric,
>>>
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Sent: Tuesday, April 4, 2023 10:00 PM
>>>>
>>>> Hi Yi,
>>>>
>>>> On 4/1/23 16:44, Yi Liu wrote:
>>>>> If the affected device is not opened by any user, it's safe to reset it
>>>>> given it's not in use.
>>>>>
>>>>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>>>>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>> ---
>>>>> drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
>>>>> include/uapi/linux/vfio.h | 8 ++++++++
>>>>> 2 files changed, 19 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>>>>> index 65bbef562268..5d745c9abf05 100644
>>>>> --- a/drivers/vfio/pci/vfio_pci_core.c
>>>>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>>>>> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct
>>>> vfio_device_set *dev_set,
>>>>> list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
>>>>> /*
>>>>> - * Test whether all the affected devices are contained by the
>>>>> - * set of groups provided by the user.
>>>>> + * Test whether all the affected devices can be reset by the
>>>>> + * user.
>>>>> + *
>>>>> + * Resetting an unused device (not opened) is safe, because
>>>>> + * dev_set->lock is held in hot reset path so this device
>>>>> + * cannot race being opened by another user simultaneously.
>>>>> + *
>>>>> + * Otherwise all opened devices in the dev_set must be
>>>>> + * contained by the set of groups provided by the user.
>>>>> */
>>>>> - if (!vfio_dev_in_groups(cur_vma, groups)) {
>>>>> + if (cur_vma->vdev.open_count &&
>>>>> + !vfio_dev_in_groups(cur_vma, groups)) {
>>>>> ret = -EINVAL;
>>>>> goto err_undo;
>>>>> }
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 0552e8dcf0cb..f96e5689cffc 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
>>>>> * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>>>>> * struct vfio_pci_hot_reset)
>>>>> *
>>>>> + * Userspace requests hot reset for the devices it uses. Due to the
>>>>> + * underlying topology, multiple devices can be affected in the reset
>>>> by the reset
>>>>> + * while some might be opened by another user. To avoid interference
>>>> s/interference/hot reset failure?
>>> I don’t think user can really avoid hot reset failure since there may
>>> be new devices plugged into the affected slot. Even user has opened
>> I don't know the legacy wrt that issue but this sounds a serious issue,
>> meaning the reset of an assigned device could impact another device
>> belonging to another group not owned by the user?
> but the hot reset shall fail as the group is not owned by the user.
Sure it shall, but I fail to understand whether the reset fails or the
device plug is somehow delayed until the reset completes.
>
>>> all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO,
>>> the hot reset can fail if new device is plugged in and has not been
>>> bound to vfio or opened by another user during the window of
>>> _INFO and HOT_RESET.
>> with respect to the latter isn't the dev_set lock held during the hot
>> reset and sufficient to prevent any new opening to occur?
> yes. new open needs to acquire the dev_set lock. So when hot reset
> acquires the dev_set lock, then no new open can occur.
>
> Regards,
> Yi Liu
>
>>> maybe the whole statement should be as below:
>>>
>>> To avoid interference, the hot reset can only be conducted when all
>>> the affected devices are either opened by the calling user or not
>>> opened yet at the moment of the hot reset attempt.
>> OK
>>
>> Eric
>>>>> + * the calling user must ensure all affected devices, if opened, are
>>>>> + * owned by itself.
>>>>> + *
>>>>> + * The ownership is proved by an array of group fds.
>>>>> + *
>>>>> * Return: 0 on success, -errno on failure.
>>>>> */
>>>>> struct vfio_pci_hot_reset {
>>> Regards,
>>> Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-04 15:59 ` Eric Auger
@ 2023-04-05 11:41 ` Jason Gunthorpe
2023-04-05 15:14 ` Eric Auger
0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 11:41 UTC (permalink / raw)
To: Eric Auger
Cc: Liu, Yi L, alex.williamson@redhat.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Tue, Apr 04, 2023 at 05:59:01PM +0200, Eric Auger wrote:
> > but the hot reset shall fail as the group is not owned by the user.
>
> sure it shall but I fail to understand if the reset fails or the device
> plug is somehow delayed until the reset completes.
It is just racy today - vfio_pci_dev_set_resettable() doesn't hold any
locks across the pci_walk_bus() check to prevent hot plug in while it is
working on the reset.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset
2023-04-05 11:41 ` Jason Gunthorpe
@ 2023-04-05 15:14 ` Eric Auger
0 siblings, 0 replies; 142+ messages in thread
From: Eric Auger @ 2023-04-05 15:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Liu, Yi L, alex.williamson@redhat.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Jason,
On 4/5/23 13:41, Jason Gunthorpe wrote:
> On Tue, Apr 04, 2023 at 05:59:01PM +0200, Eric Auger wrote:
>
>>> but the hot reset shall fail as the group is not owned by the user.
>> Sure it shall, but I fail to understand whether the reset fails or the
>> device plug is somehow delayed until the reset completes.
> It is just racy today - vfio_pci_dev_set_resettable() doesn't hold any
> locks across the pci_walk_bus() check to prevent hot plug in while it is
> working on the reset.
OK thanks
Eric
>
> Jason
>
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
2023-04-01 14:44 ` [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-04-01 14:44 ` [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
` (8 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
This prepares for adding another hot reset method. The main hot reset logic
is moved to vfio_pci_ioctl_pci_hot_reset_groups().
No functional change is intended.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
1 file changed, 33 insertions(+), 23 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 5d745c9abf05..3696b8e58445 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1255,29 +1255,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
return ret;
}
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
- struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_hot_reset *hdr,
+ bool slot,
+ struct vfio_pci_hot_reset __user *arg)
{
- unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
- struct vfio_pci_hot_reset hdr;
int32_t *group_fds;
struct file **files;
struct vfio_pci_group_info info;
- bool slot = false;
int file_idx, count = 0, ret = 0;
- if (copy_from_user(&hdr, arg, minsz))
- return -EFAULT;
-
- if (hdr.argsz < minsz || hdr.flags)
- return -EINVAL;
-
- /* Can we do a slot or bus reset or neither? */
- if (!pci_probe_reset_slot(vdev->pdev->slot))
- slot = true;
- else if (pci_probe_reset_bus(vdev->pdev->bus))
- return -ENODEV;
-
/*
* We can't let userspace give us an arbitrarily large buffer to copy,
* so verify how many we think there could be. Note groups can have
@@ -1289,11 +1277,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
return ret;
/* Somewhere between 1 and count is OK */
- if (!hdr.count || hdr.count > count)
+ if (!hdr->count || hdr->count > count)
return -EINVAL;
- group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
- files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+ group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+ files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
if (!group_fds || !files) {
kfree(group_fds);
kfree(files);
@@ -1301,7 +1289,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
}
if (copy_from_user(group_fds, arg->group_fds,
- hdr.count * sizeof(*group_fds))) {
+ hdr->count * sizeof(*group_fds))) {
kfree(group_fds);
kfree(files);
return -EFAULT;
@@ -1311,7 +1299,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
* Get the group file for each fd to ensure the group held across
* the reset
*/
- for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+ for (file_idx = 0; file_idx < hdr->count; file_idx++) {
struct file *file = fget(group_fds[file_idx]);
if (!file) {
@@ -1335,7 +1323,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
if (ret)
goto hot_reset_release;
- info.count = hdr.count;
+ info.count = hdr->count;
info.files = files;
ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
@@ -1348,6 +1336,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
return ret;
}
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_hot_reset __user *arg)
+{
+ unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+ struct vfio_pci_hot_reset hdr;
+ bool slot = false;
+
+ if (copy_from_user(&hdr, arg, minsz))
+ return -EFAULT;
+
+ if (hdr.argsz < minsz || hdr.flags)
+ return -EINVAL;
+
+ /* Can we do a slot or bus reset or neither? */
+ if (!pci_probe_reset_slot(vdev->pdev->slot))
+ slot = true;
+ else if (pci_probe_reset_bus(vdev->pdev->bus))
+ return -ENODEV;
+
+ return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+}
+
static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
struct vfio_device_ioeventfd __user *arg)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper
2023-04-01 14:44 ` [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
@ 2023-04-04 13:59 ` Eric Auger
2023-04-04 14:24 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 13:59 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> This prepares for adding another hot reset method. The main hot reset logic
> is moved to vfio_pci_ioctl_pci_hot_reset_groups().
>
> No functional change is intended.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
> 1 file changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 5d745c9abf05..3696b8e58445 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1255,29 +1255,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> return ret;
> }
>
> -static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> - struct vfio_pci_hot_reset __user *arg)
> +static int
> +vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_hot_reset *hdr,
nit: why not simply pass the user group count that was decoded earlier?
hdr sounds like a duplicate of arg.
> + bool slot,
> + struct vfio_pci_hot_reset __user *arg)
> {
> - unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> - struct vfio_pci_hot_reset hdr;
> int32_t *group_fds;
> struct file **files;
> struct vfio_pci_group_info info;
> - bool slot = false;
> int file_idx, count = 0, ret = 0;
>
> - if (copy_from_user(&hdr, arg, minsz))
> - return -EFAULT;
> -
> - if (hdr.argsz < minsz || hdr.flags)
> - return -EINVAL;
> -
> - /* Can we do a slot or bus reset or neither? */
> - if (!pci_probe_reset_slot(vdev->pdev->slot))
> - slot = true;
> - else if (pci_probe_reset_bus(vdev->pdev->bus))
> - return -ENODEV;
> -
> /*
> * We can't let userspace give us an arbitrarily large buffer to copy,
> * so verify how many we think there could be. Note groups can have
> @@ -1289,11 +1277,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> return ret;
>
> /* Somewhere between 1 and count is OK */
> - if (!hdr.count || hdr.count > count)
> + if (!hdr->count || hdr->count > count)
> return -EINVAL;
>
> - group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
> - files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
> + group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> + files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
> if (!group_fds || !files) {
> kfree(group_fds);
> kfree(files);
> @@ -1301,7 +1289,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> }
>
> if (copy_from_user(group_fds, arg->group_fds,
> - hdr.count * sizeof(*group_fds))) {
> + hdr->count * sizeof(*group_fds))) {
> kfree(group_fds);
> kfree(files);
> return -EFAULT;
> @@ -1311,7 +1299,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> * Get the group file for each fd to ensure the group held across
> * the reset
> */
> - for (file_idx = 0; file_idx < hdr.count; file_idx++) {
> + for (file_idx = 0; file_idx < hdr->count; file_idx++) {
> struct file *file = fget(group_fds[file_idx]);
>
> if (!file) {
> @@ -1335,7 +1323,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> if (ret)
> goto hot_reset_release;
>
> - info.count = hdr.count;
> + info.count = hdr->count;
> info.files = files;
>
> ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> @@ -1348,6 +1336,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> return ret;
> }
>
> +static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_hot_reset __user *arg)
> +{
> + unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> + struct vfio_pci_hot_reset hdr;
> + bool slot = false;
> +
> + if (copy_from_user(&hdr, arg, minsz))
> + return -EFAULT;
> +
> + if (hdr.argsz < minsz || hdr.flags)
> + return -EINVAL;
> +
> + /* Can we do a slot or bus reset or neither? */
> + if (!pci_probe_reset_slot(vdev->pdev->slot))
> + slot = true;
> + else if (pci_probe_reset_bus(vdev->pdev->bus))
> + return -ENODEV;
> +
> + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> +}
> +
> static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> struct vfio_device_ioeventfd __user *arg)
> {
Besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Thanks
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper
2023-04-04 13:59 ` Eric Auger
@ 2023-04-04 14:24 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-04 14:24 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, April 4, 2023 10:00 PM
>
> Hi Yi,
>
> On 4/1/23 16:44, Yi Liu wrote:
> > This prepares for adding another hot reset method. The main hot reset logic
> > is moved to vfio_pci_ioctl_pci_hot_reset_groups().
> >
> > No functional change is intended.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
> > 1 file changed, 33 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 5d745c9abf05..3696b8e58445 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1255,29 +1255,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > return ret;
> > }
> >
> > -static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > - struct vfio_pci_hot_reset __user *arg)
> > +static int
> > +vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> > + struct vfio_pci_hot_reset *hdr,
> nit: why not simply pass the user group count that was decoded earlier?
> hdr sounds like a duplicate of arg.
Indeed, only hdr->count is needed.
> > + bool slot,
> > + struct vfio_pci_hot_reset __user *arg)
> > {
> > - unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> > - struct vfio_pci_hot_reset hdr;
> > int32_t *group_fds;
> > struct file **files;
> > struct vfio_pci_group_info info;
> > - bool slot = false;
> > int file_idx, count = 0, ret = 0;
> >
> > - if (copy_from_user(&hdr, arg, minsz))
> > - return -EFAULT;
> > -
> > - if (hdr.argsz < minsz || hdr.flags)
> > - return -EINVAL;
> > -
> > - /* Can we do a slot or bus reset or neither? */
> > - if (!pci_probe_reset_slot(vdev->pdev->slot))
> > - slot = true;
> > - else if (pci_probe_reset_bus(vdev->pdev->bus))
> > - return -ENODEV;
> > -
> > /*
> > * We can't let userspace give us an arbitrarily large buffer to copy,
> > * so verify how many we think there could be. Note groups can have
> > @@ -1289,11 +1277,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > return ret;
> >
> > /* Somewhere between 1 and count is OK */
> > - if (!hdr.count || hdr.count > count)
> > + if (!hdr->count || hdr->count > count)
> > return -EINVAL;
> >
> > - group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
> > - files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
> > + group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> > + files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
> > if (!group_fds || !files) {
> > kfree(group_fds);
> > kfree(files);
> > @@ -1301,7 +1289,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > }
> >
> > if (copy_from_user(group_fds, arg->group_fds,
> > - hdr.count * sizeof(*group_fds))) {
> > + hdr->count * sizeof(*group_fds))) {
> > kfree(group_fds);
> > kfree(files);
> > return -EFAULT;
> > @@ -1311,7 +1299,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > * Get the group file for each fd to ensure the group held across
> > * the reset
> > */
> > - for (file_idx = 0; file_idx < hdr.count; file_idx++) {
> > + for (file_idx = 0; file_idx < hdr->count; file_idx++) {
> > struct file *file = fget(group_fds[file_idx]);
> >
> > if (!file) {
> > @@ -1335,7 +1323,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > if (ret)
> > goto hot_reset_release;
> >
> > - info.count = hdr.count;
> > + info.count = hdr->count;
> > info.files = files;
> >
> > ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > @@ -1348,6 +1336,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > return ret;
> > }
> >
> > +static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> > + struct vfio_pci_hot_reset __user *arg)
> > +{
> > + unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> > + struct vfio_pci_hot_reset hdr;
> > + bool slot = false;
> > +
> > + if (copy_from_user(&hdr, arg, minsz))
> > + return -EFAULT;
> > +
> > + if (hdr.argsz < minsz || hdr.flags)
> > + return -EINVAL;
> > +
> > + /* Can we do a slot or bus reset or neither? */
> > + if (!pci_probe_reset_slot(vdev->pdev->slot))
> > + slot = true;
> > + else if (pci_probe_reset_bus(vdev->pdev->bus))
> > + return -ENODEV;
> > +
> > + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> > +}
> > +
> > static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > struct vfio_device_ioeventfd __user *arg)
> > {
> Besides
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
Thanks,
Yi Liu
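
For reference, the nit agreed on in this reply (hand the helper only the
decoded count instead of the whole header) could look roughly like the
user-space sketch below. It is an illustration under stated assumptions, not
the code from the series: the mock_* names and the max_count parameter are
hypothetical stand-ins, sizeof() replaces the kernel's offsetofend(), and the
fd array handling is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed part of struct vfio_pci_hot_reset. */
struct mock_hot_reset_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
};

/*
 * The helper receives only the decoded count. max_count stands in for the
 * number of affected groups, which the real helper computes itself.
 */
static int mock_hot_reset_groups(uint32_t array_count, uint32_t max_count)
{
	/* Somewhere between 1 and max_count is OK */
	if (!array_count || array_count > max_count)
		return -EINVAL;

	/* The fd array would be copied in and verified here. */
	return 0;
}

/* The ioctl entry point validates the header, then delegates. */
static int mock_hot_reset(const struct mock_hot_reset_hdr *hdr,
			  uint32_t max_count)
{
	size_t minsz = sizeof(*hdr); /* the kernel uses offsetofend() here */

	assert(hdr != NULL);
	if (hdr->argsz < minsz || hdr->flags)
		return -EINVAL;

	return mock_hot_reset_groups(hdr->count, max_count);
}
```

With this shape the helper no longer needs to know about the header layout;
the user pointer is still required separately to copy in the fd array.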
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (2 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 15:28 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
` (7 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
This is needed by the vfio-pci driver to report affected devices in the
hot reset for a given device.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/iommu/iommufd/device.c | 12 ++++++++++++
drivers/vfio/iommufd.c | 14 ++++++++++++++
include/linux/iommufd.h | 3 +++
include/linux/vfio.h | 13 +++++++++++++
4 files changed, 42 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 25115d401d8f..04a57aa1ae2c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
}
EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
+{
+ return idev->ictx;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
+
+u32 iommufd_device_to_id(struct iommufd_device *idev)
+{
+ return idev->obj.id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
+
static int iommufd_device_setup_msi(struct iommufd_device *idev,
struct iommufd_hw_pagetable *hwpt,
phys_addr_t sw_msi_start)
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 88b00c501015..809f2dd73b9e 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -66,6 +66,20 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
vdev->ops->unbind_iommufd(vdev);
}
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+ if (!vdev->iommufd_device)
+ return NULL;
+ return iommufd_device_to_ictx(vdev->iommufd_device);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
+
+void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
+{
+ if (vdev->iommufd_device)
+ *id = iommufd_device_to_id(vdev->iommufd_device);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
/*
* The physical standard ops mean that the iommufd_device is bound to the
* physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 1129a36a74c4..ac96df406833 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
void iommufd_device_detach(struct iommufd_device *idev);
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
+u32 iommufd_device_to_id(struct iommufd_device *idev);
+
struct iommufd_access_ops {
u8 needs_pin_pages : 1;
void (*unmap)(void *data, unsigned long iova, unsigned long length);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 3188d8a374bd..97a1174b922f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -113,6 +113,8 @@ struct vfio_device_ops {
};
#if IS_ENABLED(CONFIG_IOMMUFD)
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
+void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
int vfio_iommufd_physical_bind(struct vfio_device *vdev,
struct iommufd_ctx *ictx, u32 *out_device_id);
void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
@@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
#else
+static inline struct iommufd_ctx *
+vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+ return NULL;
+}
+
+static inline void
+vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
+{
+}
+
#define vfio_iommufd_physical_bind \
((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
u32 *out_device_id)) NULL)
--
2.34.1
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
2023-04-01 14:44 ` [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-04-04 15:28 ` Eric Auger
2023-04-04 21:48 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-04 15:28 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi,
On 4/1/23 16:44, Yi Liu wrote:
> This is needed by the vfio-pci driver to report affected devices in the
> hot reset for a given device.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/iommu/iommufd/device.c | 12 ++++++++++++
> drivers/vfio/iommufd.c | 14 ++++++++++++++
> include/linux/iommufd.h | 3 +++
> include/linux/vfio.h | 13 +++++++++++++
> 4 files changed, 42 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 25115d401d8f..04a57aa1ae2c 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> }
> EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
>
> +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> +{
> + return idev->ictx;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> +
> +u32 iommufd_device_to_id(struct iommufd_device *idev)
> +{
> + return idev->obj.id;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> +
> static int iommufd_device_setup_msi(struct iommufd_device *idev,
> struct iommufd_hw_pagetable *hwpt,
> phys_addr_t sw_msi_start)
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 88b00c501015..809f2dd73b9e 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -66,6 +66,20 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> vdev->ops->unbind_iommufd(vdev);
> }
>
> +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> +{
> + if (!vdev->iommufd_device)
> + return NULL;
> + return iommufd_device_to_ictx(vdev->iommufd_device);
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> +
> +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> +{
> + if (vdev->iommufd_device)
> + *id = iommufd_device_to_id(vdev->iommufd_device);
Since there is no return value, it may be worth adding at least a WARN_ON
for the !vdev->iommufd_device case.
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> /*
> * The physical standard ops mean that the iommufd_device is bound to the
> * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 1129a36a74c4..ac96df406833 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
> int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
> void iommufd_device_detach(struct iommufd_device *idev);
>
> +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
> +u32 iommufd_device_to_id(struct iommufd_device *idev);
> +
> struct iommufd_access_ops {
> u8 needs_pin_pages : 1;
> void (*unmap)(void *data, unsigned long iova, unsigned long length);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 3188d8a374bd..97a1174b922f 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -113,6 +113,8 @@ struct vfio_device_ops {
> };
>
> #if IS_ENABLED(CONFIG_IOMMUFD)
> +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
> +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
> int vfio_iommufd_physical_bind(struct vfio_device *vdev,
> struct iommufd_ctx *ictx, u32 *out_device_id);
> void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> @@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
> void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
> int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
> #else
> +static inline struct iommufd_ctx *
> +vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> +{
> + return NULL;
> +}
> +
> +static inline void
> +vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> +{
> +}
> +
> #define vfio_iommufd_physical_bind \
> ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
> u32 *out_device_id)) NULL)
besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
2023-04-04 15:28 ` Eric Auger
@ 2023-04-04 21:48 ` Alex Williamson
2023-04-21 7:11 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 21:48 UTC (permalink / raw)
To: Eric Auger
Cc: Yi Liu, jgg, kevin.tian, joro, robin.murphy, cohuck, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Tue, 4 Apr 2023 17:28:40 +0200
Eric Auger <eric.auger@redhat.com> wrote:
> Hi,
>
> On 4/1/23 16:44, Yi Liu wrote:
> > This is needed by the vfio-pci driver to report affected devices in the
> > hot reset for a given device.
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/iommu/iommufd/device.c | 12 ++++++++++++
> > drivers/vfio/iommufd.c | 14 ++++++++++++++
> > include/linux/iommufd.h | 3 +++
> > include/linux/vfio.h | 13 +++++++++++++
> > 4 files changed, 42 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 25115d401d8f..04a57aa1ae2c 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> > }
> > EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> >
> > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > +{
> > + return idev->ictx;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > +
> > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > +{
> > + return idev->obj.id;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > +
> > static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > struct iommufd_hw_pagetable *hwpt,
> > phys_addr_t sw_msi_start)
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 88b00c501015..809f2dd73b9e 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -66,6 +66,20 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > vdev->ops->unbind_iommufd(vdev);
> > }
> >
> > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > +{
> > + if (!vdev->iommufd_device)
> > + return NULL;
> > + return iommufd_device_to_ictx(vdev->iommufd_device);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > +
> > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > +{
> > + if (vdev->iommufd_device)
> > + *id = iommufd_device_to_id(vdev->iommufd_device);
> Since there is no return value, it may be worth adding at least a WARN_ON
> for the !vdev->iommufd_device case.
Yeah, this is bizarre and makes the one caller of this interface very
awkward. We later go on to define IOMMUFD_INVALID_ID, so this should
simply return that in the case of no iommufd_device and skip this
unnecessary pointer passing. Thanks,
Alex
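
The interface shape suggested above (return the ID directly and fall back to
an invalid-ID sentinel) might look like the following user-space sketch. It
is only an illustration: the mock_* types are hypothetical stand-ins for the
kernel structures, and the sentinel value is an assumption, since the
definition of IOMMUFD_INVALID_ID is not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct mock_iommufd_device {
	uint32_t id;
};

struct mock_vfio_device {
	struct mock_iommufd_device *iommufd_device; /* NULL when not bound */
};

/*
 * Placeholder sentinel. The series defines IOMMUFD_INVALID_ID in a later
 * patch; the value 0 used here is an assumption for the sketch.
 */
#define MOCK_INVALID_ID ((uint32_t)0)

/* Return the device ID, or the sentinel when no iommufd device is bound. */
static uint32_t mock_physical_devid(const struct mock_vfio_device *vdev)
{
	assert(vdev != NULL);
	if (!vdev->iommufd_device)
		return MOCK_INVALID_ID;
	return vdev->iommufd_device->id;
}
```

A caller can then compare the result against the sentinel instead of
pre-initializing an output variable and passing its address.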
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> > /*
> > * The physical standard ops mean that the iommufd_device is bound to the
> > * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 1129a36a74c4..ac96df406833 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
> > int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
> > void iommufd_device_detach(struct iommufd_device *idev);
> >
> > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
> > +u32 iommufd_device_to_id(struct iommufd_device *idev);
> > +
> > struct iommufd_access_ops {
> > u8 needs_pin_pages : 1;
> > void (*unmap)(void *data, unsigned long iova, unsigned long length);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 3188d8a374bd..97a1174b922f 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -113,6 +113,8 @@ struct vfio_device_ops {
> > };
> >
> > #if IS_ENABLED(CONFIG_IOMMUFD)
> > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
> > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
> > int vfio_iommufd_physical_bind(struct vfio_device *vdev,
> > struct iommufd_ctx *ictx, u32 *out_device_id);
> > void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> > @@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
> > void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
> > int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
> > #else
> > +static inline struct iommufd_ctx *
> > +vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > +{
> > + return NULL;
> > +}
> > +
> > +static inline void
> > +vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > +{
> > +}
> > +
> > #define vfio_iommufd_physical_bind \
> > ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
> > u32 *out_device_id)) NULL)
> besides
>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>
> Eric
>
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
2023-04-04 21:48 ` Alex Williamson
@ 2023-04-21 7:11 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-21 7:11 UTC (permalink / raw)
To: Alex Williamson, Eric Auger
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, April 5, 2023 5:49 AM
> On Tue, 4 Apr 2023 17:28:40 +0200
> Eric Auger <eric.auger@redhat.com> wrote:
>
> > Hi,
> >
> > On 4/1/23 16:44, Yi Liu wrote:
> > > This is needed by the vfio-pci driver to report affected devices in the
> > > hot reset for a given device.
> > >
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > > drivers/iommu/iommufd/device.c | 12 ++++++++++++
> > > drivers/vfio/iommufd.c | 14 ++++++++++++++
> > > include/linux/iommufd.h | 3 +++
> > > include/linux/vfio.h | 13 +++++++++++++
> > > 4 files changed, 42 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 25115d401d8f..04a57aa1ae2c 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> > > }
> > > EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> > >
> > > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > > +{
> > > + return idev->ictx;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > > +
> > > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > > +{
> > > + return idev->obj.id;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > > +
> > > static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > > struct iommufd_hw_pagetable *hwpt,
> > > phys_addr_t sw_msi_start)
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index 88b00c501015..809f2dd73b9e 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -66,6 +66,20 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > > vdev->ops->unbind_iommufd(vdev);
> > > }
> > >
> > > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > > +{
> > > + if (!vdev->iommufd_device)
> > > + return NULL;
> > > + return iommufd_device_to_ictx(vdev->iommufd_device);
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > > +
> > > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > > +{
> > > + if (vdev->iommufd_device)
> > > + *id = iommufd_device_to_id(vdev->iommufd_device);
> > since there is no return value, may be worth to add at least a WARN_ON
> > in case of !vdev->iommufd_device
This may be a user-triggerable warning if the input device is not bound
to iommufd.
> Yeah, this is bizarre and makes the one caller of this interface very
> awkward. We later go on to define IOMMUFD_INVALID_ID, so this should
> simply return that in the case of no iommufd_device and skip this
> unnecessary pointer passing. Thanks,
OK. Then it can also return the invalid ID when !CONFIG_IOMMUFD. This
also needs to wait for the decision in the thread that is discussing
the error code.
Regards,
Yi Liu
> Alex
>
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> > > /*
> > > * The physical standard ops mean that the iommufd_device is bound to the
> > > * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> > > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > > index 1129a36a74c4..ac96df406833 100644
> > > --- a/include/linux/iommufd.h
> > > +++ b/include/linux/iommufd.h
> > > @@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
> > > int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
> > > void iommufd_device_detach(struct iommufd_device *idev);
> > >
> > > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
> > > +u32 iommufd_device_to_id(struct iommufd_device *idev);
> > > +
> > > struct iommufd_access_ops {
> > > u8 needs_pin_pages : 1;
> > > void (*unmap)(void *data, unsigned long iova, unsigned long length);
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index 3188d8a374bd..97a1174b922f 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -113,6 +113,8 @@ struct vfio_device_ops {
> > > };
> > >
> > > #if IS_ENABLED(CONFIG_IOMMUFD)
> > > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
> > > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
> > > int vfio_iommufd_physical_bind(struct vfio_device *vdev,
> > > struct iommufd_ctx *ictx, u32 *out_device_id);
> > > void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> > > @@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
> > > void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
> > > int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
> > > #else
> > > +static inline struct iommufd_ctx *
> > > +vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > > +{
> > > + return NULL;
> > > +}
> > > +
> > > +static inline void
> > > +vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > > +{
> > > +}
> > > +
> > > #define vfio_iommufd_physical_bind \
> > > ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
> > > u32 *out_device_id)) NULL)
> > besides
> >
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> >
> > Eric
> >
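For illustration, the interface shape Alex suggests (return the ID
directly, fall back to an invalid value when no iommufd_device is bound,
and avoid a WARN_ON because the condition may be user-triggerable) could
look roughly like the standalone sketch below. The struct definitions are
simplified stand-ins for the kernel types, and EXAMPLE_INVALID_ID with its
value of 0 is a placeholder assumption: the real IOMMUFD_INVALID_ID is
defined later in the series and is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder only: stands in for IOMMUFD_INVALID_ID; the value is assumed. */
#define EXAMPLE_INVALID_ID 0u

/* Simplified stand-ins for the kernel structures (not the real layouts). */
struct iommufd_device {
	uint32_t id;
};

struct vfio_device {
	struct iommufd_device *iommufd_device;
};

uint32_t iommufd_device_to_id(const struct iommufd_device *idev)
{
	return idev->id;
}

/*
 * Return the device ID by value. A device that is not bound to iommufd
 * yields the invalid ID instead of leaving an output pointer untouched.
 */
uint32_t vfio_iommufd_physical_devid(const struct vfio_device *vdev)
{
	if (!vdev->iommufd_device)
		return EXAMPLE_INVALID_ID;
	return iommufd_device_to_id(vdev->iommufd_device);
}
```

With this shape the caller no longer has to pre-initialize an output
variable, and the !CONFIG_IOMMUFD stub can return the same invalid value.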
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (3 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 16:54 ` Eric Auger
2023-04-04 20:18 ` Alex Williamson
2023-04-01 14:44 ` [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
` (6 subsequent siblings)
11 siblings, 2 replies; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
Allow userspace to pass a zero-length fd array in
VFIO_DEVICE_PCI_HOT_RESET as an alternative method for the ownership
check when iommufd is used. In this case, all opened devices in the
affected dev_set are verified to be bound to the same valid iommufd
before the reset is allowed. This is simpler and faster, as the user does
not need to pass a set of fds and the kernel does not need to search for
the device within the given fds.
A device in noiommu mode does not have a valid iommufd, so this method
should not be used in a dev_set that contains multiple devices where one
of them is in noiommu mode. The only allowed noiommu scenario is when the
calling device is in noiommu mode and is in a singleton dev_set.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
include/uapi/linux/vfio.h | 9 ++++++-
2 files changed, 44 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3696b8e58445..b68fcba67a4b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
struct vfio_pci_group_info;
static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
- struct vfio_pci_group_info *groups);
+ struct vfio_pci_group_info *groups,
+ struct iommufd_ctx *iommufd_ctx);
/*
* INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
return ret;
/* Somewhere between 1 and count is OK */
- if (!hdr->count || hdr->count > count)
+ if (hdr->count > count)
return -EINVAL;
group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
@@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
info.count = hdr->count;
info.files = files;
- ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+ ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
hot_reset_release:
for (file_idx--; file_idx >= 0; file_idx--)
@@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
{
unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
struct vfio_pci_hot_reset hdr;
+ struct iommufd_ctx *iommufd;
bool slot = false;
if (copy_from_user(&hdr, arg, minsz))
@@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
else if (pci_probe_reset_bus(vdev->pdev->bus))
return -ENODEV;
- return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+ if (hdr.count)
+ return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+
+ iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+
+ return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
}
static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
@@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
{
unsigned int i;
+ if (!groups)
+ return false;
+
for (i = 0; i < groups->count; i++)
if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
return true;
@@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
return ret;
}
+static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
+ struct iommufd_ctx *iommufd_ctx)
+{
+ struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+
+ if (!iommufd)
+ return false;
+
+ return iommufd == iommufd_ctx;
+}
+
/*
* We need to get memory_lock for each device, but devices can share mmap_lock,
* therefore we need to zap and hold the vma_lock for each device, and only then
* get each memory_lock.
*/
static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
- struct vfio_pci_group_info *groups)
+ struct vfio_pci_group_info *groups,
+ struct iommufd_ctx *iommufd_ctx)
{
struct vfio_pci_core_device *cur_mem;
struct vfio_pci_core_device *cur_vma;
@@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
*
* Otherwise all opened devices in the dev_set must be
* contained by the set of groups provided by the user.
+ *
+ * If user provides a zero-length array, then all the
+ * opened devices must be bound to a same iommufd_ctx.
+ *
+ * If all above checks are failed, reset is allowed only if
+ * the calling device is in a singleton dev_set.
*/
if (cur_vma->vdev.open_count &&
- !vfio_dev_in_groups(cur_vma, groups)) {
+ !vfio_dev_in_groups(cur_vma, groups) &&
+ !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
+ (dev_set->device_count > 1)) {
ret = -EINVAL;
goto err_undo;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f96e5689cffc..17aa5d09db41 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
* the calling user must ensure all affected devices, if opened, are
* owned by itself.
*
- * The ownership is proved by an array of group fds.
+ * The ownership can be proved by:
+ * - An array of group fds
+ * - A zero-length array
+ *
+ * In the last case all affected devices which are opened by this user
+ * must have been bound to a same iommufd. If the calling device is in
+ * noiommu mode (no valid iommufd) then it can be reset only if the reset
+ * doesn't affect other devices.
*
* Return: 0 on success, -errno on failure.
*/
--
2.34.1
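As a rough userspace illustration of the zero-length array path described
in this patch, the sketch below issues VFIO_DEVICE_PCI_HOT_RESET with
count set to 0. It assumes device_fd is an already-opened VFIO device fd
and that the running kernel implements the behavior proposed in this
series; the helper name vfio_hot_reset_no_fds() is made up for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Request a hot reset without passing any group fds.
 * device_fd is assumed to be an open VFIO device fd owned by the caller.
 * Returns 0 on success, or -1 with errno set by ioctl().
 */
int vfio_hot_reset_no_fds(int device_fd)
{
	struct vfio_pci_hot_reset hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.argsz = sizeof(hdr);	/* header only, no trailing group_fds[] */
	hdr.flags = 0;
	hdr.count = 0;			/* zero-length array: ownership is checked via iommufd */

	return ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, &hdr);
}
```

A caller using the existing group-based method would instead allocate
room for count group fds after the header and set argsz accordingly.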
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-01 14:44 ` [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-04-04 16:54 ` Eric Auger
2023-04-04 20:18 ` Alex Williamson
1 sibling, 0 replies; 142+ messages in thread
From: Eric Auger @ 2023-04-04 16:54 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> as an alternative method for ownership check when iommufd is used. In
I don't understand the 1st sentence.
> this case all opened devices in the affected dev_set are verified to
> be bound to a same valid iommufd value to allow reset. It's simpler
> and faster as user does not need to pass a set of fds and kernel no
kernel does not need to search
> need to search the device within the given fds.
>
> a device in noiommu mode doesn't have a valid iommufd, so this method
> should not be used in a dev_set which contains multiple devices and one
> of them is in noiommu. The only allowed noiommu scenario is that the
> calling device is noiommu and it's in a singleton dev_set.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
> include/uapi/linux/vfio.h | 9 ++++++-
> 2 files changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3696b8e58445..b68fcba67a4b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> struct vfio_pci_group_info;
> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups);
> + struct vfio_pci_group_info *groups,
> + struct iommufd_ctx *iommufd_ctx);
>
> /*
> * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> @@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> return ret;
>
> /* Somewhere between 1 and count is OK */
> - if (!hdr->count || hdr->count > count)
> + if (hdr->count > count)
then I would simply remove the above comment since !count check is done
by the caller.
> return -EINVAL;
>
> group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> @@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> info.count = hdr->count;
> info.files = files;
>
> - ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> + ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
>
> hot_reset_release:
> for (file_idx--; file_idx >= 0; file_idx--)
> @@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> {
> unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> struct vfio_pci_hot_reset hdr;
> + struct iommufd_ctx *iommufd;
> bool slot = false;
>
> if (copy_from_user(&hdr, arg, minsz))
> @@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> else if (pci_probe_reset_bus(vdev->pdev->bus))
> return -ENODEV;
>
> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> + if (hdr.count)
> + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> +
> + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +
> + return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
> }
>
> static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> @@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> {
> unsigned int i;
>
> + if (!groups)
> + return false;
> +
> for (i = 0; i < groups->count; i++)
> if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> return true;
> @@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> return ret;
> }
>
> +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> + struct iommufd_ctx *iommufd_ctx)
> +{
> + struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +
> + if (!iommufd)
> + return false;
> +
> + return iommufd == iommufd_ctx;
> +}
> +
> /*
> * We need to get memory_lock for each device, but devices can share mmap_lock,
> * therefore we need to zap and hold the vma_lock for each device, and only then
> * get each memory_lock.
> */
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups)
> + struct vfio_pci_group_info *groups,
> + struct iommufd_ctx *iommufd_ctx)
> {
> struct vfio_pci_core_device *cur_mem;
> struct vfio_pci_core_device *cur_vma;
> @@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> *
> * Otherwise all opened devices in the dev_set must be
> * contained by the set of groups provided by the user.
> + *
> + * If user provides a zero-length array, then all the
> + * opened devices must be bound to a same iommufd_ctx.
> + *
> + * If all above checks are failed, reset is allowed only if
> + * the calling device is in a singleton dev_set.
> */
> if (cur_vma->vdev.open_count &&
> - !vfio_dev_in_groups(cur_vma, groups)) {
> + !vfio_dev_in_groups(cur_vma, groups) &&
> + !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> + (dev_set->device_count > 1)) {
> ret = -EINVAL;
> goto err_undo;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index f96e5689cffc..17aa5d09db41 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
> * the calling user must ensure all affected devices, if opened, are
> * owned by itself.
> *
> - * The ownership is proved by an array of group fds.
> + * The ownership can be proved by:
> + * - An array of group fds
> + * - A zero-length array
I would suggest something like:
If a non-empty group fd array is passed, the devices affected by the
reset must belong to those opened VFIO groups.
If a zero-length array is passed, the other devices affected by the
reset, if any, must be bound to the same iommufd as this VFIO device.
Either of the two methods is applied to check the feasibility of the reset.
> + *
> + * In the last case all affected devices which are opened by this user
> + * must have been bound to a same iommufd. If the calling device is in
> + * noiommu mode (no valid iommufd) then it can be reset only if the reset
> + * doesn't affect other devices.
and keep that too
> *
> * Return: 0 on success, -errno on failure.
> */
Thanks
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-01 14:44 ` [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-04-04 16:54 ` Eric Auger
@ 2023-04-04 20:18 ` Alex Williamson
2023-04-05 7:55 ` Liu, Yi L
2023-04-05 8:02 ` Eric Auger
1 sibling, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 20:18 UTC (permalink / raw)
To: Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Sat, 1 Apr 2023 07:44:22 -0700
Yi Liu <yi.l.liu@intel.com> wrote:
> as an alternative method for ownership check when iommufd is used. In
> this case all opened devices in the affected dev_set are verified to
> be bound to a same valid iommufd value to allow reset. It's simpler
> and faster as user does not need to pass a set of fds and kernel no
> need to search the device within the given fds.
>
> a device in noiommu mode doesn't have a valid iommufd, so this method
> should not be used in a dev_set which contains multiple devices and one
> of them is in noiommu. The only allowed noiommu scenario is that the
> calling device is noiommu and it's in a singleton dev_set.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
> include/uapi/linux/vfio.h | 9 ++++++-
> 2 files changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3696b8e58445..b68fcba67a4b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> struct vfio_pci_group_info;
> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups);
> + struct vfio_pci_group_info *groups,
> + struct iommufd_ctx *iommufd_ctx);
>
> /*
> * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> @@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> return ret;
>
> /* Somewhere between 1 and count is OK */
> - if (!hdr->count || hdr->count > count)
> + if (hdr->count > count)
> return -EINVAL;
>
> group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> @@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> info.count = hdr->count;
> info.files = files;
>
> - ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> + ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
>
> hot_reset_release:
> for (file_idx--; file_idx >= 0; file_idx--)
> @@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> {
> unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> struct vfio_pci_hot_reset hdr;
> + struct iommufd_ctx *iommufd;
> bool slot = false;
>
> if (copy_from_user(&hdr, arg, minsz))
> @@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> else if (pci_probe_reset_bus(vdev->pdev->bus))
> return -ENODEV;
>
> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> + if (hdr.count)
> + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> +
> + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +
> + return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
> }
>
> static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> @@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> {
> unsigned int i;
>
> + if (!groups)
> + return false;
> +
> for (i = 0; i < groups->count; i++)
> if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> return true;
> @@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> return ret;
> }
>
> +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> + struct iommufd_ctx *iommufd_ctx)
> +{
> + struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +
> + if (!iommufd)
> + return false;
> +
> + return iommufd == iommufd_ctx;
> +}
> +
> /*
> * We need to get memory_lock for each device, but devices can share mmap_lock,
> * therefore we need to zap and hold the vma_lock for each device, and only then
> * get each memory_lock.
> */
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups)
> + struct vfio_pci_group_info *groups,
> + struct iommufd_ctx *iommufd_ctx)
> {
> struct vfio_pci_core_device *cur_mem;
> struct vfio_pci_core_device *cur_vma;
> @@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> *
> * Otherwise all opened devices in the dev_set must be
> * contained by the set of groups provided by the user.
> + *
> + * If user provides a zero-length array, then all the
> + * opened devices must be bound to a same iommufd_ctx.
> + *
> + * If all above checks are failed, reset is allowed only if
> + * the calling device is in a singleton dev_set.
> */
> if (cur_vma->vdev.open_count &&
> - !vfio_dev_in_groups(cur_vma, groups)) {
> + !vfio_dev_in_groups(cur_vma, groups) &&
> + !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> + (dev_set->device_count > 1)) {
This last condition looks buggy to me, we need all conditions to be
true to generate an error here, which means that for a singleton
dev_set, it doesn't matter what group fds are passed, if any, or whether
the iommufd context matches. I think in fact this means that the empty
array path is equally available for group use cases with a singleton
dev_set, but we don't enable it for multiple device dev_sets like we do
iommufd.
You pointed out a previous issue with hot-reset info and no-iommu where
if other affected devices are not bound to vfio-pci the info ioctl
returns error. That's handled in the hot-reset ioctl by the fact that
all affected devices must be in the dev_set and therefore bound to
vfio-pci drivers. So it seems to me that aside from the spurious error
because we can't report an iommu group when none exists, and didn't
spot it to invent an invalid group for debugging, hot-reset otherwise
works with no-iommu just like it does for iommu backed devices. We
don't currently require singleton no-iommu dev_sets afaict.
I'll also note that if the dev_set is singleton, this suggests that
pci_reset_function() can make use of bus reset, so a hot-reset is
accessible via VFIO_DEVICE_RESET if the appropriate reset method is
selected.
Therefore, I think as written, the singleton dev_set hot-reset is
enabled for iommufd and (unintentionally?) for the group path, while
also negating a requirement for a group fd or that a provided group fd
actually matches the device in this latter case. The null-array
approach is not however extended to groups for more general use.
Additionally, limiting no-iommu hot-reset to singleton dev_sets
provides only a marginal functional difference vs VFIO_DEVICE_RESET.
Thanks,
Alex
> ret = -EINVAL;
> goto err_undo;
> }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index f96e5689cffc..17aa5d09db41 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
> * the calling user must ensure all affected devices, if opened, are
> * owned by itself.
> *
> - * The ownership is proved by an array of group fds.
> + * The ownership can be proved by:
> + * - An array of group fds
> + * - A zero-length array
> + *
> + * In the last case all affected devices which are opened by this user
> + * must have been bound to a same iommufd. If the calling device is in
> + * noiommu mode (no valid iommufd) then it can be reset only if the reset
> + * doesn't affect other devices.
> *
> * Return: 0 on success, -errno on failure.
> */
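The concern Alex raises about the last condition can be checked with a
small truth-table model. The sketch below is not kernel code: it reduces
the helper results to plain booleans (the struct and function names are
invented for the example) and compares the condition as posted in this
patch with the check that existed before it.

```c
#include <assert.h>
#include <stdbool.h>

/* Inputs to the per-device ownership check, reduced to plain values. */
struct check_inputs {
	bool opened;			/* cur_vma->vdev.open_count != 0 */
	bool in_groups;			/* result of vfio_dev_in_groups() */
	bool in_iommufd_ctx;		/* result of vfio_dev_in_iommufd_ctx() */
	unsigned int device_count;	/* dev_set->device_count */
};

/* Mirrors the condition as posted in the patch: true means -EINVAL. */
bool reset_rejected_as_posted(const struct check_inputs *in)
{
	return in->opened &&
	       !in->in_groups &&
	       !in->in_iommufd_ctx &&
	       in->device_count > 1;
}

/* The check before the patch: an opened device must be in the groups. */
bool reset_rejected_before_patch(const struct check_inputs *in)
{
	return in->opened && !in->in_groups;
}
```

For a singleton dev_set (device_count == 1) the posted condition can
never be true, so the reset is allowed regardless of which group fds were
passed or whether the iommufd context matches, which is the behavior
change described above.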
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-04 20:18 ` Alex Williamson
@ 2023-04-05 7:55 ` Liu, Yi L
2023-04-05 8:01 ` Liu, Yi L
2023-04-05 8:02 ` Eric Auger
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 7:55 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, April 5, 2023 4:19 AM
>
> On Sat, 1 Apr 2023 07:44:22 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
>
> > as an alternative method for ownership check when iommufd is used. In
> > this case all opened devices in the affected dev_set are verified to
> > be bound to a same valid iommufd value to allow reset. It's simpler
> > and faster as user does not need to pass a set of fds and kernel no
> > need to search the device within the given fds.
> >
> > a device in noiommu mode doesn't have a valid iommufd, so this method
> > should not be used in a dev_set which contains multiple devices and one
> > of them is in noiommu. The only allowed noiommu scenario is that the
> > calling device is noiommu and it's in a singleton dev_set.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
> > include/uapi/linux/vfio.h | 9 ++++++-
> > 2 files changed, 44 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 3696b8e58445..b68fcba67a4b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> > struct vfio_pci_group_info;
> > static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> > static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups);
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx);
> >
> > /*
> > * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> > @@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > return ret;
> >
> > /* Somewhere between 1 and count is OK */
> > - if (!hdr->count || hdr->count > count)
> > + if (hdr->count > count)
> > return -EINVAL;
> >
> > group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > info.count = hdr->count;
> > info.files = files;
> >
> > - ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > + ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> > hot_reset_release:
> > for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > {
> > unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> > struct vfio_pci_hot_reset hdr;
> > + struct iommufd_ctx *iommufd;
> > bool slot = false;
> >
> > if (copy_from_user(&hdr, arg, minsz))
> > @@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > else if (pci_probe_reset_bus(vdev->pdev->bus))
> > return -ENODEV;
> >
> > - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> > + if (hdr.count)
> > + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> > +
> > + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +
> > + return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
> > }
> >
> > static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct
> vfio_pci_core_device *vdev,
> > {
> > unsigned int i;
> >
> > + if (!groups)
> > + return false;
> > +
> > for (i = 0; i < groups->count; i++)
> > if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > return true;
> > @@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct
> vfio_device_set *dev_set)
> > return ret;
> > }
> >
> > +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> > + struct iommufd_ctx *iommufd_ctx)
> > +{
> > + struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +
> > + if (!iommufd)
> > + return false;
> > +
> > + return iommufd == iommufd_ctx;
> > +}
> > +
> > /*
> > * We need to get memory_lock for each device, but devices can share mmap_lock,
> > * therefore we need to zap and hold the vma_lock for each device, and only then
> > * get each memory_lock.
> > */
> > static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups)
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx)
> > {
> > struct vfio_pci_core_device *cur_mem;
> > struct vfio_pci_core_device *cur_vma;
> > @@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct
> vfio_device_set *dev_set,
> > *
> > * Otherwise all opened devices in the dev_set must be
> > * contained by the set of groups provided by the user.
> > + *
> > + * If user provides a zero-length array, then all the
> > + * opened devices must be bound to a same iommufd_ctx.
> > + *
> > + * If all above checks are failed, reset is allowed only if
> > + * the calling device is in a singleton dev_set.
> > */
> > if (cur_vma->vdev.open_count &&
> > - !vfio_dev_in_groups(cur_vma, groups)) {
> > + !vfio_dev_in_groups(cur_vma, groups) &&
> > + !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> > + (dev_set->device_count > 1)) {
>
> This last condition looks buggy to me, we need all conditions to be
> true to generate an error here, which means that for a singleton
> dev_set, it doesn't matter what group fds are passed, if any, or whether
> the iommufd context matches. I think in fact this means that the empty
> array path is equally available for group use cases with a singleton
> dev_set, but we don't enable it for multiple device dev_sets like we do
> iommufd.
You are right. The last condition allows the empty-fd array path to
work for the group use case if the dev_set happens to be a singleton.
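
To make the effect of that last condition concrete, here is a minimal
userspace model of the check as written in v3. It is not kernel code:
struct model_dev and both function names are hypothetical stand-ins, and
the three booleans only represent the results of vdev.open_count,
vfio_dev_in_groups() and vfio_dev_in_iommufd_ctx(). strict_rejects() is
shown purely for comparison, not as a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one device in a dev_set; not a kernel type. */
struct model_dev {
	bool opened;      /* models vdev.open_count != 0 */
	bool in_groups;   /* models vfio_dev_in_groups() */
	bool in_iommufd;  /* models vfio_dev_in_iommufd_ctx() */
};

/*
 * Mirrors the compound condition from the v3 patch.
 * Returns true when the reset would be rejected with -EINVAL.
 */
static bool v3_rejects(const struct model_dev *dev, unsigned int device_count)
{
	assert(dev);
	return dev->opened &&
	       !dev->in_groups &&
	       !dev->in_iommufd &&
	       device_count > 1;
}

/* Same check without the singleton term, for comparison only. */
static bool strict_rejects(const struct model_dev *dev)
{
	assert(dev);
	return dev->opened && !dev->in_groups && !dev->in_iommufd;
}
```

For an opened device that matches neither the groups nor the
iommufd_ctx, v3_rejects() returns false when device_count is 1 and true
when it is 2, while strict_rejects() returns true in both cases.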
>
> You pointed out a previous issue with hot-reset info and no-iommu where
> if other affected devices are not bound to vfio-pci the info ioctl
> returns error. That's handled in the hot-reset ioctl by the fact that
> all affected devices must be in the dev_set and therefore bound to
> vfio-pci drivers.
Yes, the hot-reset ioctl requires all affected devices to be in the dev_set.
So if some affected devices are not bound to vfio yet, the hot-reset ioctl
just fails. If all affected devices are in the dev_set, vfio allocates a
fake group for them, so the info ioctl won't fail.
> So it seems to me that aside from the spurious error
> because we can't report an iommu group when none exists, and didn't
> spot it to invent an invalid group for debugging, hot-reset otherwise
> works with no-iommu just like it does for iommu backed devices. We
> don't currently require singleton no-iommu dev_sets afaict.
Yes. The requirement for hot-reset is the same for no-iommu and
iommufd backed devices.
> I'll also note that if the dev_set is singleton, this suggests that
> pci_reset_function() can make use of bus reset, so a hot-reset is
> accessible via VFIO_DEVICE_RESET if the appropriate reset method is
> selected.
Yes. So does that mean it is not necessary to support singleton dev_sets
in the hot-reset ioctl? If a user uses hot-reset, it should be because
VFIO_DEVICE_RESET cannot be used, right?
>
> Therefore, I think as written, the singleton dev_set hot-reset is
> enabled for iommufd and (unintentionally?) for the group path, while
> also negating a requirement for a group fd or that a provided group fd
> actually matches the device in this latter case. The null-array
> approach is not however extended to groups for more general use.
> Additionally, limiting no-iommu hot-reset to singleton dev_sets
> provides only a marginal functional difference vs VFIO_DEVICE_RESET.
I think the singleton dev_set hot-reset is for iommufd (or more accurately,
for the noiommu case in the cdev path).
> Thanks,
>
> Alex
>
> > ret = -EINVAL;
> > goto err_undo;
> > }
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index f96e5689cffc..17aa5d09db41 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
> > * the calling user must ensure all affected devices, if opened, are
> > * owned by itself.
> > *
> > - * The ownership is proved by an array of group fds.
> > + * The ownership can be proved by:
> > + * - An array of group fds
> > + * - A zero-length array
> > + *
> > + * In the last case all affected devices which are opened by this user
> > + * must have been bound to a same iommufd. If the calling device is in
> > + * noiommu mode (no valid iommufd) then it can be reset only if the reset
> > + * doesn't affect other devices.
> > *
> > * Return: 0 on success, -errno on failure.
> > */
* RE: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-05 7:55 ` Liu, Yi L
@ 2023-04-05 8:01 ` Liu, Yi L
2023-04-05 15:36 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 8:01 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, April 5, 2023 3:55 PM
> >
> > Therefore, I think as written, the singleton dev_set hot-reset is
> > enabled for iommufd and (unintentionally?) for the group path, while
> > also negating a requirement for a group fd or that a provided group fd
> > actually matches the device in this latter case. The null-array
> > approach is not however extended to groups for more general use.
> > Additionally, limiting no-iommu hot-reset to singleton dev_sets
> > provides only a marginal functional difference vs VFIO_DEVICE_RESET.
>
> I think the singleton dev_set hot-reset is for iommufd (or more accurately
> for the noiommu case in cdev path).
But actually, singleton dev_set hot-reset can work for the group path as well.
Based on this, I'm also wondering: do we really want singleton dev_set
hot-reset only for the cdev noiommu case? Or should we allow it generally, or
just not support it, since it is equivalent to VFIO_DEVICE_RESET?
If we don't support singleton dev_set hot-reset, noiommu devices in the cdev
path will fail the hot-reset if an empty fd array is provided. But we may just
document that the empty fd array does not work for noiommu, and that the user
should use the device fd array.
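
For illustration, a userspace sketch of how the two request shapes
discussed here differ. struct hot_reset_req is a local, hypothetical
mirror of the layout of struct vfio_pci_hot_reset (argsz, flags, count,
followed by an fd array), and build_hot_reset_req() is an invented
helper. The sketch only builds the buffer; it does not issue
VFIO_DEVICE_PCI_HOT_RESET, and whether the array carries group fds or
device fds is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the hot-reset request layout; an assumption for this sketch. */
struct hot_reset_req {
	uint32_t argsz;   /* total size of the request, header plus fds */
	uint32_t flags;   /* left as zero here */
	uint32_t count;   /* number of entries in fds[]; 0 is the empty-array case */
	int32_t  fds[];   /* fds proving ownership, if any */
};

/*
 * Allocate and fill a request. Passing count == 0 yields the
 * zero-length array form; the caller frees the result with free().
 */
static struct hot_reset_req *build_hot_reset_req(const int32_t *fds,
						 uint32_t count)
{
	size_t sz = sizeof(struct hot_reset_req) +
		    (size_t)count * sizeof(int32_t);
	struct hot_reset_req *req = calloc(1, sz);

	if (!req)
		return NULL;

	req->argsz = (uint32_t)sz;
	req->count = count;
	if (count) {
		assert(fds);
		memcpy(req->fds, fds, (size_t)count * sizeof(int32_t));
	}
	return req;
}
```

With count == 0 the request is just the three-word header, so the
kernel has no fds to search and must fall back to another ownership
check, which is exactly the case that has no answer for noiommu devices.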
Regards,
Yi Liu
* Re: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-05 8:01 ` Liu, Yi L
@ 2023-04-05 15:36 ` Alex Williamson
2023-04-05 16:46 ` Jason Gunthorpe
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 15:36 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 08:01:49 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, April 5, 2023 3:55 PM
>
> > >
> > > Therefore, I think as written, the singleton dev_set hot-reset is
> > > enabled for iommufd and (unintentionally?) for the group path, while
> > > also negating a requirement for a group fd or that a provided group fd
> > > actually matches the device in this latter case. The null-array
> > > approach is not however extended to groups for more general use.
> > > Additionally, limiting no-iommu hot-reset to singleton dev_sets
> > > provides only a marginal functional difference vs VFIO_DEVICE_RESET.
> >
> > I think the singleton dev_set hot-reset is for iommufd (or more accurately
> > for the noiommu case in cdev path).
>
> but actually, singleton dev_set hot-reset can work for group path as well.
> Based on this, I'm also wondering do we really want to have singleton dev_set
> hot-reset only for cdev noiommu case? or we allow it generally or just
> don't support it as it is equivalent with VFIO_DEVICE_RESET?
I think you're taking the potential that VFIO_DEVICE_RESET and
hot-reset could do the same thing too far. The former is more likely
to do an FLR, or even a PM reset. QEMU even tries to guess what reset
VFIO_DEVICE_RESET might use in order to choose to do a hot-reset if it
seems like the device might only support a PM reset otherwise.
Changing the reset method of a device requires privilege, which is
maybe something we'd compromise on for no-iommu, but the general
expectation is that VFIO_DEVICE_RESET provides a device level scope and
hot-reset provides a... hot-reset, and sometimes those are the same
thing, but that doesn't mean we can lean on the former.
> If we don't support singleton dev_set hot-reset, noiommu devices in cdev
> path shall fail the hot-reset if empty-fd array is provided. But we may just
> document that empty-fd array does not work for noiommu. User should
> use the device fd array.
I don't see any replies to my comment on 08/12 where I again question
why we need an empty array option. It's causing all sorts of headaches
and I don't see the justification for it beyond some hand waving that
it reduces complexity for the user. This singleton dev-set notion
seems equally unjustified. Do we just need to deal with hot-reset
being unsupported for no-iommu devices with iommufd? Thanks,
Alex
* Re: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-05 15:36 ` Alex Williamson
@ 2023-04-05 16:46 ` Jason Gunthorpe
0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 16:46 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, Tian, Kevin, joro@8bytes.org, robin.murphy@arm.com,
cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, Apr 05, 2023 at 09:36:46AM -0600, Alex Williamson wrote:
> > If we don't support singleton dev_set hot-reset, noiommu devices in cdev
> > path shall fail the hot-reset if empty-fd array is provided. But we may just
> > document that empty-fd array does not work for noiommu. User should
> > use the device fd array.
>
> I don't see any replies to my comment on 08/12 where I again question
> why we need an empty array option.
I was pressing we'd do empty-fd only and not do the device fd array at
all since it is such an ugly fit for the use cases we have.
But it is such a minor detail if you don't want it then take it out.
> This singleton dev-set notion seems equally unjustified. Do we just
> need to deal with hot-reset being unsupported for no-iommu devices
> with iommufd?
It was to support no-iommu, if you want to de-support it then it can
go away too. AFAIK dpdk doesn't use this feature and it is the only
user we know of that has support for no-iommu so it is probably safe.
Jason
* Re: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-04 20:18 ` Alex Williamson
2023-04-05 7:55 ` Liu, Yi L
@ 2023-04-05 8:02 ` Eric Auger
2023-04-05 8:09 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 8:02 UTC (permalink / raw)
To: Alex Williamson, Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, nicolinc, kvm,
mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On 4/4/23 22:18, Alex Williamson wrote:
> On Sat, 1 Apr 2023 07:44:22 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
>
>> as an alternative method for ownership check when iommufd is used. In
>> this case all opened devices in the affected dev_set are verified to
>> be bound to a same valid iommufd value to allow reset. It's simpler
>> and faster as user does not need to pass a set of fds and kernel no
>> need to search the device within the given fds.
>>
>> a device in noiommu mode doesn't have a valid iommufd, so this method
>> should not be used in a dev_set which contains multiple devices and one
>> of them is in noiommu. The only allowed noiommu scenario is that the
>> calling device is noiommu and it's in a singleton dev_set.
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>> drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
>> include/uapi/linux/vfio.h | 9 ++++++-
>> 2 files changed, 44 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index 3696b8e58445..b68fcba67a4b 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
>> struct vfio_pci_group_info;
>> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
>> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>> - struct vfio_pci_group_info *groups);
>> + struct vfio_pci_group_info *groups,
>> + struct iommufd_ctx *iommufd_ctx);
>>
>> /*
>> * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
>> @@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>> return ret;
>>
>> /* Somewhere between 1 and count is OK */
>> - if (!hdr->count || hdr->count > count)
>> + if (hdr->count > count)
>> return -EINVAL;
>>
>> group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
>> @@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>> info.count = hdr->count;
>> info.files = files;
>>
>> - ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
>> + ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
>>
>> hot_reset_release:
>> for (file_idx--; file_idx >= 0; file_idx--)
>> @@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>> {
>> unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
>> struct vfio_pci_hot_reset hdr;
>> + struct iommufd_ctx *iommufd;
>> bool slot = false;
>>
>> if (copy_from_user(&hdr, arg, minsz))
>> @@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>> else if (pci_probe_reset_bus(vdev->pdev->bus))
>> return -ENODEV;
>>
>> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
>> + if (hdr.count)
>> + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
>> +
>> + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
>> +
>> + return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
>> }
>>
>> static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
>> @@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
>> {
>> unsigned int i;
>>
>> + if (!groups)
>> + return false;
>> +
>> for (i = 0; i < groups->count; i++)
>> if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
>> return true;
>> @@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
>> return ret;
>> }
>>
>> +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
>> + struct iommufd_ctx *iommufd_ctx)
>> +{
>> + struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
>> +
>> + if (!iommufd)
>> + return false;
>> +
>> + return iommufd == iommufd_ctx;
>> +}
>> +
>> /*
>> * We need to get memory_lock for each device, but devices can share mmap_lock,
>> * therefore we need to zap and hold the vma_lock for each device, and only then
>> * get each memory_lock.
>> */
>> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>> - struct vfio_pci_group_info *groups)
>> + struct vfio_pci_group_info *groups,
>> + struct iommufd_ctx *iommufd_ctx)
>> {
>> struct vfio_pci_core_device *cur_mem;
>> struct vfio_pci_core_device *cur_vma;
>> @@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>> *
>> * Otherwise all opened devices in the dev_set must be
>> * contained by the set of groups provided by the user.
>> + *
>> + * If user provides a zero-length array, then all the
>> + * opened devices must be bound to a same iommufd_ctx.
>> + *
>> + * If all above checks are failed, reset is allowed only if
>> + * the calling device is in a singleton dev_set.
>> */
>> if (cur_vma->vdev.open_count &&
>> - !vfio_dev_in_groups(cur_vma, groups)) {
>> + !vfio_dev_in_groups(cur_vma, groups) &&
>> + !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
>> + (dev_set->device_count > 1)) {
> This last condition looks buggy to me, we need all conditions to be
> true to generate an error here, which means that for a singleton
> dev_set, it doesn't matter what group fds are passed, if any, or whether
> the iommufd context matches. I think in fact this means that the empty
> array path is equally available for group use cases with a singleton
> dev_set, but we don't enable it for multiple device dev_sets like we do
> iommufd.
>
> You pointed out a previous issue with hot-reset info and no-iommu where
> if other affected devices are not bound to vfio-pci the info ioctl
> returns error. That's handled in the hot-reset ioctl by the fact that
> all affected devices must be in the dev_set and therefore bound to
> vfio-pci drivers. So it seems to me that aside from the spurious error
> because we can't report an iommu group when none exists, and didn't
> spot it to invent an invalid group for debugging, hot-reset otherwise
> works with no-iommu just like it does for iommu backed devices. We
> don't currently require singleton no-iommu dev_sets afaict.
>
> I'll also note that if the dev_set is singleton, this suggests that
> pci_reset_function() can make use of bus reset, so a hot-reset is
> accessible via VFIO_DEVICE_RESET if the appropriate reset method is
> selected.
>
> Therefore, I think as written, the singleton dev_set hot-reset is
> enabled for iommufd and (unintentionally?) for the group path, while
> also negating a requirement for a group fd or that a provided group fd
> actually matches the device in this latter case. The null-array
> approach is not however extended to groups for more general use.
> Additionally, limiting no-iommu hot-reset to singleton dev_sets
> provides only a marginal functional difference vs VFIO_DEVICE_RESET.
> Thanks,
>
> Alex
What about introducing a helper:

static bool is_reset_ok(struct vfio_pci_core_device *pdev,
			struct vfio_pci_group_info *groups,
			struct iommufd_ctx *ctx)
{
	/* A device that is not opened needs no ownership check */
	if (!pdev->vdev.open_count)
		return true;
	if (groups && vfio_dev_in_groups(pdev, groups))
		return true;
	if (ctx && vfio_dev_in_iommufd_ctx(pdev, ctx))
		return true;
	return false;
}

Assuming the above logic is correct, I think this would make the code
more readable.
Thanks
Eric
>> ret = -EINVAL;
>> goto err_undo;
>> }
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index f96e5689cffc..17aa5d09db41 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
>> * the calling user must ensure all affected devices, if opened, are
>> * owned by itself.
>> *
>> - * The ownership is proved by an array of group fds.
>> + * The ownership can be proved by:
>> + * - An array of group fds
>> + * - A zero-length array
>> + *
>> + * In the last case all affected devices which are opened by this user
>> + * must have been bound to a same iommufd. If the calling device is in
>> + * noiommu mode (no valid iommufd) then it can be reset only if the reset
>> + * doesn't affect other devices.
>> *
>> * Return: 0 on success, -errno on failure.
>> */
* RE: [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
2023-04-05 8:02 ` Eric Auger
@ 2023-04-05 8:09 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 8:09 UTC (permalink / raw)
To: eric.auger@redhat.com, Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Eric,
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, April 5, 2023 4:02 PM
>
> On 4/4/23 22:18, Alex Williamson wrote:
> > On Sat, 1 Apr 2023 07:44:22 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> >> as an alternative method for ownership check when iommufd is used. In
> >> this case all opened devices in the affected dev_set are verified to
> >> be bound to a same valid iommufd value to allow reset. It's simpler
> >> and faster as user does not need to pass a set of fds and kernel no
> >> need to search the device within the given fds.
> >>
> >> a device in noiommu mode doesn't have a valid iommufd, so this method
> >> should not be used in a dev_set which contains multiple devices and one
> >> of them is in noiommu. The only allowed noiommu scenario is that the
> >> calling device is noiommu and it's in a singleton dev_set.
> >>
> >> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> >> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> >> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> >> ---
> >> drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
> >> include/uapi/linux/vfio.h | 9 ++++++-
> >> 2 files changed, 44 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 3696b8e58445..b68fcba67a4b 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct
> vfio_pci_core_device *vdev)
> >> struct vfio_pci_group_info;
> >> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >> - struct vfio_pci_group_info *groups);
> >> + struct vfio_pci_group_info *groups,
> >> + struct iommufd_ctx *iommufd_ctx);
> >>
> >> /*
> >> * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> >> @@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> >> return ret;
> >>
> >> /* Somewhere between 1 and count is OK */
> >> - if (!hdr->count || hdr->count > count)
> >> + if (hdr->count > count)
> >> return -EINVAL;
> >>
> >> group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> >> @@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> >> info.count = hdr->count;
> >> info.files = files;
> >>
> >> - ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> >> + ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >>
> >> hot_reset_release:
> >> for (file_idx--; file_idx >= 0; file_idx--)
> >> @@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> >> {
> >> unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> >> struct vfio_pci_hot_reset hdr;
> >> + struct iommufd_ctx *iommufd;
> >> bool slot = false;
> >>
> >> if (copy_from_user(&hdr, arg, minsz))
> >> @@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> >> else if (pci_probe_reset_bus(vdev->pdev->bus))
> >> return -ENODEV;
> >>
> >> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> >> + if (hdr.count)
> >> + return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> >> +
> >> + iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> >> +
> >> + return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
> >> }
> >>
> >> static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> >> @@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct
> vfio_pci_core_device *vdev,
> >> {
> >> unsigned int i;
> >>
> >> + if (!groups)
> >> + return false;
> >> +
> >> for (i = 0; i < groups->count; i++)
> >> if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> >> return true;
> >> @@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct
> vfio_device_set *dev_set)
> >> return ret;
> >> }
> >>
> >> +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> >> + struct iommufd_ctx *iommufd_ctx)
> >> +{
> >> + struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> >> +
> >> + if (!iommufd)
> >> + return false;
> >> +
> >> + return iommufd == iommufd_ctx;
> >> +}
> >> +
> >> /*
> >> * We need to get memory_lock for each device, but devices can share mmap_lock,
> >> * therefore we need to zap and hold the vma_lock for each device, and only then
> >> * get each memory_lock.
> >> */
> >> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >> - struct vfio_pci_group_info *groups)
> >> + struct vfio_pci_group_info *groups,
> >> + struct iommufd_ctx *iommufd_ctx)
> >> {
> >> struct vfio_pci_core_device *cur_mem;
> >> struct vfio_pci_core_device *cur_vma;
> >> @@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct
> vfio_device_set *dev_set,
> >> *
> >> * Otherwise all opened devices in the dev_set must be
> >> * contained by the set of groups provided by the user.
> >> + *
> >> + * If user provides a zero-length array, then all the
> >> + * opened devices must be bound to a same iommufd_ctx.
> >> + *
> >> + * If all above checks are failed, reset is allowed only if
> >> + * the calling device is in a singleton dev_set.
> >> */
> >> if (cur_vma->vdev.open_count &&
> >> - !vfio_dev_in_groups(cur_vma, groups)) {
> >> + !vfio_dev_in_groups(cur_vma, groups) &&
> >> + !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> >> + (dev_set->device_count > 1)) {
> > This last condition looks buggy to me, we need all conditions to be
> > true to generate an error here, which means that for a singleton
> > dev_set, it doesn't matter what group fds are passed, if any, or whether
> > the iommufd context matches. I think in fact this means that the empty
> > array path is equally available for group use cases with a singleton
> > dev_set, but we don't enable it for multiple device dev_sets like we do
> > iommufd.
> >
> > You pointed out a previous issue with hot-reset info and no-iommu where
> > if other affected devices are not bound to vfio-pci the info ioctl
> > returns error. That's handled in the hot-reset ioctl by the fact that
> > all affected devices must be in the dev_set and therefore bound to
> > vfio-pci drivers. So it seems to me that aside from the spurious error
> > because we can't report an iommu group when none exists, and didn't
> > spot it to invent an invalid group for debugging, hot-reset otherwise
> > works with no-iommu just like it does for iommu backed devices. We
> > don't currently require singleton no-iommu dev_sets afaict.
> >
> > I'll also note that if the dev_set is singleton, this suggests that
> > pci_reset_function() can make use of bus reset, so a hot-reset is
> > accessible via VFIO_DEVICE_RESET if the appropriate reset method is
> > selected.
> >
> > Therefore, I think as written, the singleton dev_set hot-reset is
> > enabled for iommufd and (unintentionally?) for the group path, while
> > also negating a requirement for a group fd or that a provided group fd
> > actually matches the device in this latter case. The null-array
> > approach is not however extended to groups for more general use.
> > Additionally, limiting no-iommu hot-reset to singleton dev_sets
> > provides only a marginal functional difference vs VFIO_DEVICE_RESET.
> > Thanks,
> >
> > Alex
> What about introducing a helper:
>
> static bool is_reset_ok(struct vfio_pci_core_device *pdev,
> 			struct vfio_pci_group_info *groups,
> 			struct iommufd_ctx *ctx)
> {
> 	/* A device that is not opened needs no ownership check */
> 	if (!pdev->vdev.open_count)
> 		return true;
> 	if (groups && vfio_dev_in_groups(pdev, groups))
> 		return true;
> 	if (ctx && vfio_dev_in_iommufd_ctx(pdev, ctx))
> 		return true;
> 	return false;
> }
>
> Assuming the above logic is correct, I think this would make the code
> more readable.
This logic may fail for noiommu devices in the cdev path, because the
cdev path binds such devices with iommufd == -1, so the ctx would be
NULL. That is why we agreed to allow the reset if the dev_set is a
singleton. Details can be found in the paragraph quoted below. As I
replied in another email, maybe this singleton support can be dropped,
since a singleton dev_set can simply be reset with VFIO_DEVICE_RESET.
Alex may correct me if userspace is not that intelligent.
"However the iommufd method has difficulty working with noiommu devices
since those devices don't have a valid iommufd, unless the noiommu device
is in a singleton dev_set hence no ownership check is required. [3]
[3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/"
Regards,
Yi Liu
> Thanks
>
> Eric
> >> ret = -EINVAL;
> >> goto err_undo;
> >> }
> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >> index f96e5689cffc..17aa5d09db41 100644
> >> --- a/include/uapi/linux/vfio.h
> >> +++ b/include/uapi/linux/vfio.h
> >> @@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
> >> * the calling user must ensure all affected devices, if opened, are
> >> * owned by itself.
> >> *
> >> - * The ownership is proved by an array of group fds.
> >> + * The ownership can be proved by:
> >> + * - An array of group fds
> >> + * - A zero-length array
> >> + *
> >> + * In the last case all affected devices which are opened by this user
> >> + * must have been bound to a same iommufd. If the calling device is in
> >> + * noiommu mode (no valid iommufd) then it can be reset only if the reset
> >> + * doesn't affect other devices.
> >> *
> >> * Return: 0 on success, -errno on failure.
> >> */
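The ownership rules discussed in this message (Eric's proposed is_reset_ok() helper plus the singleton dev_set exception for noiommu devices) can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_dev, dev_in_groups(), the integer group ids and the boolean singleton flag are hypothetical stand-ins, not the kernel's vfio structures or helpers, and the singleton rule is modelled simply as "no ownership check needed".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an affected device; not struct vfio_pci_core_device. */
struct model_dev {
	int open_count;   /* 0 means nobody has the device open */
	int group_id;     /* id of the group the device belongs to */
	const void *ctx;  /* iommufd context, NULL for a noiommu device */
};

/* True if the device's group is among the n group ids the caller passed in. */
static bool dev_in_groups(const struct model_dev *dev,
			  const int *groups, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (groups[i] == dev->group_id)
			return true;
	return false;
}

/*
 * Model of the proposed helper. The last argument is an assumption added
 * for this sketch: it says whether the dev_set holds a single device, in
 * which case no ownership proof is required.
 */
static bool is_reset_ok(const struct model_dev *dev,
			const int *groups, size_t n_groups,
			const void *ctx, bool singleton)
{
	if (!dev->open_count)
		return true;                 /* unopened devices need no proof */
	if (groups && dev_in_groups(dev, groups, n_groups))
		return true;                 /* proven through a group fd */
	if (ctx && dev->ctx == ctx)
		return true;                 /* proven through a shared iommufd */
	return singleton;                    /* noiommu cdev case: ctx is NULL */
}
```

With ctx == NULL and no group array, an opened noiommu device passes only when the singleton flag is set, which is the case the reply above says the plain three-way check would miss.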
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (4 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-05 8:27 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
` (5 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
This prepares vfio core to accept vfio device file from the vfio PCI
hot reset path. vfio_file_is_group() is still kept for KVM usage.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/group.c | 32 ++++++++++++++------------------
drivers/vfio/pci/vfio_pci_core.c | 4 ++--
drivers/vfio/vfio.h | 2 ++
drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
include/linux/vfio.h | 1 +
5 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 27d5ba7cf9dc..d0c95d033605 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -745,6 +745,15 @@ bool vfio_device_has_container(struct vfio_device *device)
return device->group->container;
}
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+ struct vfio_group *group = file->private_data;
+
+ if (file->f_op != &vfio_group_fops)
+ return NULL;
+ return group;
+}
+
/**
* vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
* @file: VFIO group file
@@ -755,13 +764,13 @@ bool vfio_device_has_container(struct vfio_device *device)
*/
struct iommu_group *vfio_file_iommu_group(struct file *file)
{
- struct vfio_group *group = file->private_data;
+ struct vfio_group *group = vfio_group_from_file(file);
struct iommu_group *iommu_group = NULL;
if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
return NULL;
- if (!vfio_file_is_group(file))
+ if (!group)
return NULL;
mutex_lock(&group->group_lock);
@@ -775,12 +784,12 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
/**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
+ * vfio_file_is_group - True if the file is a vfio group file
* @file: VFIO group file
*/
bool vfio_file_is_group(struct file *file)
{
- return file->f_op == &vfio_group_fops;
+ return vfio_group_from_file(file);
}
EXPORT_SYMBOL_GPL(vfio_file_is_group);
@@ -842,23 +851,10 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
{
- struct vfio_group *group = file->private_data;
-
- if (!vfio_file_is_group(file))
- return false;
-
return group == device->group;
}
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);
static char *vfio_devnode(const struct device *dev, umode_t *mode)
{
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b68fcba67a4b..2a510b71edcb 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,8 +1308,8 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
break;
}
- /* Ensure the FD is a vfio group FD.*/
- if (!vfio_file_is_group(file)) {
+ /* Ensure the FD is a vfio FD. vfio group or vfio device */
+ if (!vfio_file_is_valid(file)) {
fput(file);
ret = -EINVAL;
break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 7b19c621e0e6..c0aeea24fbd6 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -84,6 +84,8 @@ void vfio_device_group_unregister(struct vfio_device *device);
int vfio_device_group_use_iommu(struct vfio_device *device);
void vfio_device_group_unuse_iommu(struct vfio_device *device);
void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 89497c933490..fe7446805afd 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1154,6 +1154,35 @@ const struct file_operations vfio_device_fops = {
.mmap = vfio_device_fops_mmap,
};
+/**
+ * vfio_file_is_valid - True if the file is valid vfio file
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+ return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+ struct vfio_group *group;
+
+ group = vfio_group_from_file(file);
+ if (!group)
+ return false;
+
+ return vfio_group_has_dev(group, device);
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
/*
* Sub-module support
*/
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 97a1174b922f..f8fb9ab25188 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -258,6 +258,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
*/
struct iommu_group *vfio_file_iommu_group(struct file *file);
bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
--
2.34.1
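The pattern this patch introduces with vfio_group_from_file() (compare the file's f_op pointer against a known operations table before trusting private_data) can be modelled outside the kernel as follows. The model_* types and names are hypothetical simplifications made up for this sketch; they are not the kernel's struct file or struct file_operations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for struct file_operations and struct file. */
struct model_fops { const char *name; };
struct model_file {
	const struct model_fops *f_op;
	void *private_data;
};
struct model_group { int id; };

static const struct model_fops model_group_fops = { "group" };
static const struct model_fops model_other_fops = { "other" };

/*
 * Return the group behind the file, or NULL if the file was not opened
 * through the group operations table. private_data is only interpreted
 * once the f_op identity check has passed.
 */
static struct model_group *model_group_from_file(struct model_file *file)
{
	if (file->f_op != &model_group_fops)
		return NULL;
	return file->private_data;
}

/* Boolean wrapper, mirroring how vfio_file_is_group() reuses the lookup. */
static bool model_file_is_group(struct model_file *file)
{
	return model_group_from_file(file) != NULL;
}
```

Returning the typed pointer rather than a bare boolean lets callers such as vfio_file_iommu_group() do the type check and the lookup in one step.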
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset
2023-04-01 14:44 ` [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
@ 2023-04-05 8:27 ` Eric Auger
2023-04-05 9:23 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 8:27 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> This prepares vfio core to accept vfio device file from the vfio PCI
> hot reset path. vfio_file_is_group() is still kept for KVM usage.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/group.c | 32 ++++++++++++++------------------
> drivers/vfio/pci/vfio_pci_core.c | 4 ++--
> drivers/vfio/vfio.h | 2 ++
> drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
> include/linux/vfio.h | 1 +
> 5 files changed, 48 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 27d5ba7cf9dc..d0c95d033605 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -745,6 +745,15 @@ bool vfio_device_has_container(struct vfio_device *device)
> return device->group->container;
> }
>
> +struct vfio_group *vfio_group_from_file(struct file *file)
> +{
> + struct vfio_group *group = file->private_data;
> +
> + if (file->f_op != &vfio_group_fops)
> + return NULL;
> + return group;
> +}
> +
> /**
> * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
> * @file: VFIO group file
> @@ -755,13 +764,13 @@ bool vfio_device_has_container(struct vfio_device *device)
> */
> struct iommu_group *vfio_file_iommu_group(struct file *file)
> {
> - struct vfio_group *group = file->private_data;
> + struct vfio_group *group = vfio_group_from_file(file);
> struct iommu_group *iommu_group = NULL;
>
> if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
> return NULL;
>
> - if (!vfio_file_is_group(file))
> + if (!group)
> return NULL;
>
> mutex_lock(&group->group_lock);
> @@ -775,12 +784,12 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
> EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
>
> /**
> - * vfio_file_is_group - True if the file is usable with VFIO aPIS
> + * vfio_file_is_group - True if the file is a vfio group file
> * @file: VFIO group file
> */
> bool vfio_file_is_group(struct file *file)
> {
> - return file->f_op == &vfio_group_fops;
> + return vfio_group_from_file(file);
> }
> EXPORT_SYMBOL_GPL(vfio_file_is_group);
>
> @@ -842,23 +851,10 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
> }
> EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
>
> -/**
> - * vfio_file_has_dev - True if the VFIO file is a handle for device
> - * @file: VFIO file to check
> - * @device: Device that must be part of the file
> - *
> - * Returns true if given file has permission to manipulate the given device.
> - */
> -bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
> {
> - struct vfio_group *group = file->private_data;
> -
> - if (!vfio_file_is_group(file))
> - return false;
> -
> return group == device->group;
> }
> -EXPORT_SYMBOL_GPL(vfio_file_has_dev);
>
> static char *vfio_devnode(const struct device *dev, umode_t *mode)
> {
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index b68fcba67a4b..2a510b71edcb 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1308,8 +1308,8 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> break;
> }
>
> - /* Ensure the FD is a vfio group FD.*/
> - if (!vfio_file_is_group(file)) {
> + /* Ensure the FD is a vfio FD. vfio group or vfio device */
It is a bit strange to update the comment here and in the other places
in this patch while vfio_file_is_valid() still only checks for a group
file. By the way, I would simply remove the comment, which does not add
much.
> + if (!vfio_file_is_valid(file)) {
> fput(file);
> ret = -EINVAL;
> break;
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 7b19c621e0e6..c0aeea24fbd6 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -84,6 +84,8 @@ void vfio_device_group_unregister(struct vfio_device *device);
> int vfio_device_group_use_iommu(struct vfio_device *device);
> void vfio_device_group_unuse_iommu(struct vfio_device *device);
> void vfio_device_group_close(struct vfio_device *device);
> +struct vfio_group *vfio_group_from_file(struct file *file);
> +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
> bool vfio_device_has_container(struct vfio_device *device);
> int __init vfio_group_init(void);
> void vfio_group_cleanup(void);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 89497c933490..fe7446805afd 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1154,6 +1154,35 @@ const struct file_operations vfio_device_fops = {
> .mmap = vfio_device_fops_mmap,
> };
>
> +/**
> + * vfio_file_is_valid - True if the file is valid vfio file
> + * @file: VFIO group file or VFIO device file
I wonder if you shouldn't squash this with the next patch, tbh.
> + */
> +bool vfio_file_is_valid(struct file *file)
> +{
> + return vfio_group_from_file(file);
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> +
> +/**
> + * vfio_file_has_dev - True if the VFIO file is a handle for device
> + * @file: VFIO file to check
> + * @device: Device that must be part of the file
> + *
> + * Returns true if given file has permission to manipulate the given device.
> + */
> +bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> +{
> + struct vfio_group *group;
> +
> + group = vfio_group_from_file(file);
> + if (!group)
> + return false;
> +
> + return vfio_group_has_dev(group, device);
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> +
> /*
> * Sub-module support
> */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 97a1174b922f..f8fb9ab25188 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -258,6 +258,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> */
> struct iommu_group *vfio_file_iommu_group(struct file *file);
> bool vfio_file_is_group(struct file *file);
> +bool vfio_file_is_valid(struct file *file);
> bool vfio_file_enforced_coherent(struct file *file);
> void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
> bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset
2023-04-05 8:27 ` Eric Auger
@ 2023-04-05 9:23 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 9:23 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, April 5, 2023 4:28 PM
>
> Hi Yi,
> On 4/1/23 16:44, Yi Liu wrote:
> > This prepares vfio core to accept vfio device file from the vfio PCI
> > hot reset path. vfio_file_is_group() is still kept for KVM usage.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/group.c | 32 ++++++++++++++------------------
> > drivers/vfio/pci/vfio_pci_core.c | 4 ++--
> > drivers/vfio/vfio.h | 2 ++
> > drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
> > include/linux/vfio.h | 1 +
> > 5 files changed, 48 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 27d5ba7cf9dc..d0c95d033605 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -745,6 +745,15 @@ bool vfio_device_has_container(struct vfio_device *device)
> > return device->group->container;
> > }
> >
> > +struct vfio_group *vfio_group_from_file(struct file *file)
> > +{
> > + struct vfio_group *group = file->private_data;
> > +
> > + if (file->f_op != &vfio_group_fops)
> > + return NULL;
> > + return group;
> > +}
> > +
> > /**
> > * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
> > * @file: VFIO group file
> > @@ -755,13 +764,13 @@ bool vfio_device_has_container(struct vfio_device *device)
> > */
> > struct iommu_group *vfio_file_iommu_group(struct file *file)
> > {
> > - struct vfio_group *group = file->private_data;
> > + struct vfio_group *group = vfio_group_from_file(file);
> > struct iommu_group *iommu_group = NULL;
> >
> > if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
> > return NULL;
> >
> > - if (!vfio_file_is_group(file))
> > + if (!group)
> > return NULL;
> >
> > mutex_lock(&group->group_lock);
> > @@ -775,12 +784,12 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
> > EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
> >
> > /**
> > - * vfio_file_is_group - True if the file is usable with VFIO aPIS
> > + * vfio_file_is_group - True if the file is a vfio group file
> > * @file: VFIO group file
> > */
> > bool vfio_file_is_group(struct file *file)
> > {
> > - return file->f_op == &vfio_group_fops;
> > + return vfio_group_from_file(file);
> > }
> > EXPORT_SYMBOL_GPL(vfio_file_is_group);
> >
> > @@ -842,23 +851,10 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
> > }
> > EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
> >
> > -/**
> > - * vfio_file_has_dev - True if the VFIO file is a handle for device
> > - * @file: VFIO file to check
> > - * @device: Device that must be part of the file
> > - *
> > - * Returns true if given file has permission to manipulate the given device.
> > - */
> > -bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> > +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
> > {
> > - struct vfio_group *group = file->private_data;
> > -
> > - if (!vfio_file_is_group(file))
> > - return false;
> > -
> > return group == device->group;
> > }
> > -EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> >
> > static char *vfio_devnode(const struct device *dev, umode_t *mode)
> > {
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index b68fcba67a4b..2a510b71edcb 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1308,8 +1308,8 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> > break;
> > }
> >
> > - /* Ensure the FD is a vfio group FD.*/
> > - if (!vfio_file_is_group(file)) {
> > + /* Ensure the FD is a vfio FD. vfio group or vfio device */
> It is a bit strange to update the comment here and in the other places
> in this patch while vfio_file_is_valid() still only checks for a group
> file. By the way, I would simply remove the comment, which does not add
> much.
OK. Yes, at this moment it is still only a group file. I may just delete this comment.
> > + if (!vfio_file_is_valid(file)) {
> > fput(file);
> > ret = -EINVAL;
> > break;
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 7b19c621e0e6..c0aeea24fbd6 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -84,6 +84,8 @@ void vfio_device_group_unregister(struct vfio_device *device);
> > int vfio_device_group_use_iommu(struct vfio_device *device);
> > void vfio_device_group_unuse_iommu(struct vfio_device *device);
> > void vfio_device_group_close(struct vfio_device *device);
> > +struct vfio_group *vfio_group_from_file(struct file *file);
> > +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
> > bool vfio_device_has_container(struct vfio_device *device);
> > int __init vfio_group_init(void);
> > void vfio_group_cleanup(void);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 89497c933490..fe7446805afd 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1154,6 +1154,35 @@ const struct file_operations vfio_device_fops = {
> > .mmap = vfio_device_fops_mmap,
> > };
> >
> > +/**
> > + * vfio_file_is_valid - True if the file is valid vfio file
> > + * @file: VFIO group file or VFIO device file
> I wonder if you shouldn't squash this with the next patch, tbh.
Yes. At this point it still only handles the group file; there is no device file yet.
Thanks,
Yi Liu
> > + */
> > +bool vfio_file_is_valid(struct file *file)
> > +{
> > + return vfio_group_from_file(file);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> > +
> > +/**
> > + * vfio_file_has_dev - True if the VFIO file is a handle for device
> > + * @file: VFIO file to check
> > + * @device: Device that must be part of the file
> > + *
> > + * Returns true if given file has permission to manipulate the given device.
> > + */
> > +bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> > +{
> > + struct vfio_group *group;
> > +
> > + group = vfio_group_from_file(file);
> > + if (!group)
> > + return false;
> > +
> > + return vfio_group_has_dev(group, device);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> > +
> > /*
> > * Sub-module support
> > */
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 97a1174b922f..f8fb9ab25188 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -258,6 +258,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> > */
> > struct iommu_group *vfio_file_iommu_group(struct file *file);
> > bool vfio_file_is_group(struct file *file);
> > +bool vfio_file_is_valid(struct file *file);
> > bool vfio_file_enforced_coherent(struct file *file);
> > void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
> > bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (5 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 20:31 ` Alex Williamson
2023-04-05 8:07 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in " Yi Liu
` (4 subsequent siblings)
11 siblings, 2 replies; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
device file from the vfio PCI hot reset.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index fe7446805afd..ebbb6b91a498 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1154,13 +1154,23 @@ const struct file_operations vfio_device_fops = {
.mmap = vfio_device_fops_mmap,
};
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+ struct vfio_device *device = file->private_data;
+
+ if (file->f_op != &vfio_device_fops)
+ return NULL;
+ return device;
+}
+
/**
* vfio_file_is_valid - True if the file is valid vfio file
* @file: VFIO group file or VFIO device file
*/
bool vfio_file_is_valid(struct file *file)
{
- return vfio_group_from_file(file);
+ return vfio_group_from_file(file) ||
+ vfio_device_from_file(file);
}
EXPORT_SYMBOL_GPL(vfio_file_is_valid);
@@ -1174,12 +1184,17 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
{
struct vfio_group *group;
+ struct vfio_device *vdev;
group = vfio_group_from_file(file);
- if (!group)
- return false;
+ if (group)
+ return vfio_group_has_dev(group, device);
+
+ vdev = vfio_device_from_file(file);
+ if (vdev)
+ return vdev == device;
- return vfio_group_has_dev(group, device);
+ return false;
}
EXPORT_SYMBOL_GPL(vfio_file_has_dev);
--
2.34.1
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path
2023-04-01 14:44 ` [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
@ 2023-04-04 20:31 ` Alex Williamson
2023-04-05 8:07 ` Eric Auger
1 sibling, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 20:31 UTC (permalink / raw)
To: Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Sat, 1 Apr 2023 07:44:24 -0700
Yi Liu <yi.l.liu@intel.com> wrote:
> This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
> device file from the vfio PCI hot reset.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index fe7446805afd..ebbb6b91a498 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1154,13 +1154,23 @@ const struct file_operations vfio_device_fops = {
> .mmap = vfio_device_fops_mmap,
> };
>
> +static struct vfio_device *vfio_device_from_file(struct file *file)
> +{
> + struct vfio_device *device = file->private_data;
> +
> + if (file->f_op != &vfio_device_fops)
> + return NULL;
> + return device;
> +}
> +
> /**
> * vfio_file_is_valid - True if the file is valid vfio file
> * @file: VFIO group file or VFIO device file
> */
> bool vfio_file_is_valid(struct file *file)
> {
> - return vfio_group_from_file(file);
> + return vfio_group_from_file(file) ||
> + vfio_device_from_file(file);
> }
> EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>
> @@ -1174,12 +1184,17 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> {
> struct vfio_group *group;
> + struct vfio_device *vdev;
>
> group = vfio_group_from_file(file);
> - if (!group)
> - return false;
> + if (group)
> + return vfio_group_has_dev(group, device);
> +
> + vdev = vfio_device_from_file(file);
> + if (vdev)
> + return vdev == device;
>
> - return vfio_group_has_dev(group, device);
> + return false;
Nit, unless we expect to be testing against NULL devices, this could
just be:
return device == vfio_device_from_file(file);
Thanks,
Alex
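The nit above can be checked with a small userspace model of the device-file branch. The two forms agree whenever the device argument is non-NULL and differ only when a NULL device is tested against a file that is not a device file, which is the caveat the reply mentions. All model_* names and has_dev_posted()/has_dev_short() are hypothetical stand-ins written for this sketch, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types. */
struct model_fops { int unused; };
struct model_device { int id; };
struct model_file {
	const struct model_fops *f_op;
	void *private_data;
};

static const struct model_fops model_device_fops = { 0 };
static const struct model_fops model_other_fops = { 0 };

/* Return the device behind the file, or NULL if it is not a device file. */
static struct model_device *model_device_from_file(struct model_file *file)
{
	if (file->f_op != &model_device_fops)
		return NULL;
	return file->private_data;
}

/* Device-file branch in the shape posted in the patch. */
static bool has_dev_posted(struct model_file *file, struct model_device *device)
{
	struct model_device *vdev = model_device_from_file(file);

	if (vdev)
		return vdev == device;
	return false;
}

/* Shortened form from the review comment. */
static bool has_dev_short(struct model_file *file, struct model_device *device)
{
	return device == model_device_from_file(file);
}
```

In this model, has_dev_short() returns true for a NULL device and a non-device file (NULL == NULL), while has_dev_posted() returns false, so the shorter form is only a drop-in replacement if callers never pass a NULL device.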
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path
2023-04-01 14:44 ` [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
2023-04-04 20:31 ` Alex Williamson
@ 2023-04-05 8:07 ` Eric Auger
2023-04-05 8:10 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 8:07 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
> device file from the vfio PCI hot reset.
Typo in the title: s/Accpet/Accept/
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index fe7446805afd..ebbb6b91a498 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1154,13 +1154,23 @@ const struct file_operations vfio_device_fops = {
> .mmap = vfio_device_fops_mmap,
> };
>
> +static struct vfio_device *vfio_device_from_file(struct file *file)
> +{
> + struct vfio_device *device = file->private_data;
> +
> + if (file->f_op != &vfio_device_fops)
> + return NULL;
> + return device;
> +}
> +
> /**
> * vfio_file_is_valid - True if the file is valid vfio file
> * @file: VFIO group file or VFIO device file
> */
> bool vfio_file_is_valid(struct file *file)
> {
> - return vfio_group_from_file(file);
> + return vfio_group_from_file(file) ||
> + vfio_device_from_file(file);
> }
> EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>
> @@ -1174,12 +1184,17 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> {
> struct vfio_group *group;
> + struct vfio_device *vdev;
>
> group = vfio_group_from_file(file);
> - if (!group)
> - return false;
> + if (group)
> + return vfio_group_has_dev(group, device);
> +
> + vdev = vfio_device_from_file(file);
> + if (vdev)
> + return vdev == device;
>
> - return vfio_group_has_dev(group, device);
> + return false;
> }
> EXPORT_SYMBOL_GPL(vfio_file_has_dev);
>
With Alex's suggestion,
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path
2023-04-05 8:07 ` Eric Auger
@ 2023-04-05 8:10 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 8:10 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, April 5, 2023 4:08 PM
>
> Hi Yi,
>
> On 4/1/23 16:44, Yi Liu wrote:
> > This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
> > device file from the vfio PCI hot reset.
> Typo in the title: s/Accpet/Accept/
Thanks, I will correct it.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
> > 1 file changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index fe7446805afd..ebbb6b91a498 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1154,13 +1154,23 @@ const struct file_operations vfio_device_fops = {
> > .mmap = vfio_device_fops_mmap,
> > };
> >
> > +static struct vfio_device *vfio_device_from_file(struct file *file)
> > +{
> > + struct vfio_device *device = file->private_data;
> > +
> > + if (file->f_op != &vfio_device_fops)
> > + return NULL;
> > + return device;
> > +}
> > +
> > /**
> > * vfio_file_is_valid - True if the file is valid vfio file
> > * @file: VFIO group file or VFIO device file
> > */
> > bool vfio_file_is_valid(struct file *file)
> > {
> > - return vfio_group_from_file(file);
> > + return vfio_group_from_file(file) ||
> > + vfio_device_from_file(file);
> > }
> > EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> >
> > @@ -1174,12 +1184,17 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> > bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> > {
> > struct vfio_group *group;
> > + struct vfio_device *vdev;
> >
> > group = vfio_group_from_file(file);
> > - if (!group)
> > - return false;
> > + if (group)
> > + return vfio_group_has_dev(group, device);
> > +
> > + vdev = vfio_device_from_file(file);
> > + if (vdev)
> > + return vdev == device;
> >
> > - return vfio_group_has_dev(group, device);
> > + return false;
> > }
> > EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> >
> With Alex' suggestion
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>
> Eric
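For readers following along, the dispatch in the quoted vfio_file_has_dev() can be modelled outside the kernel. This is only a minimal sketch: the struct definitions are hypothetical stand-ins (a toy group holding a single device), and file_has_dev() is an illustrative name, not the kernel function. It shows how comparing f_op against a known fops table tells a group file from a device file before private_data is trusted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct file_operations { int tag; };
struct file {
	const struct file_operations *f_op;
	void *private_data;
};
struct vfio_device { int id; };
struct vfio_group { struct vfio_device *member; }; /* toy: one device per group */

static const struct file_operations vfio_group_fops = { 1 };
static const struct file_operations vfio_device_fops = { 2 };

/* Only trust private_data once f_op proves what kind of file this is. */
static struct vfio_group *group_from_file(struct file *file)
{
	if (file->f_op != &vfio_group_fops)
		return NULL;
	return file->private_data;
}

static struct vfio_device *device_from_file(struct file *file)
{
	if (file->f_op != &vfio_device_fops)
		return NULL;
	return file->private_data;
}

/* Group file: membership test. Device file: identity test. Else: false. */
static bool file_has_dev(struct file *file, struct vfio_device *device)
{
	struct vfio_group *group = group_from_file(file);
	struct vfio_device *vdev;

	if (group)
		return group->member == device;

	vdev = device_from_file(file);
	if (vdev)
		return vdev == device;

	return false;
}
```

A file that is neither a group file nor a device file falls through to false, which matches the final return in the patch.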
* [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in hot reset path
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (6 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 21:23 ` Alex Williamson
2023-04-05 9:32 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
` (3 subsequent siblings)
11 siblings, 2 replies; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
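Rename the group-specific structure, functions and variables in the hot
reset path to file-based names, since they are about to carry device fds
as well as group fds.
No functional change is intended.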
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++----------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 2a510b71edcb..da6325008872 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -177,10 +177,10 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
}
}
-struct vfio_pci_group_info;
+struct vfio_pci_file_info;
static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
- struct vfio_pci_group_info *groups,
+ struct vfio_pci_file_info *info,
struct iommufd_ctx *iommufd_ctx);
/*
@@ -800,7 +800,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
return 0;
}
-struct vfio_pci_group_info {
+struct vfio_pci_file_info {
int count;
struct file **files;
};
@@ -1257,14 +1257,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
}
static int
-vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
- struct vfio_pci_hot_reset *hdr,
- bool slot,
- struct vfio_pci_hot_reset __user *arg)
+vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_hot_reset *hdr,
+ bool slot,
+ struct vfio_pci_hot_reset __user *arg)
{
- int32_t *group_fds;
+ int32_t *fds;
struct file **files;
- struct vfio_pci_group_info info;
+ struct vfio_pci_file_info info;
int file_idx, count = 0, ret = 0;
/*
@@ -1281,17 +1281,17 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
if (hdr->count > count)
return -EINVAL;
- group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+ fds = kcalloc(hdr->count, sizeof(*fds), GFP_KERNEL);
files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
- if (!group_fds || !files) {
- kfree(group_fds);
+ if (!fds || !files) {
+ kfree(fds);
kfree(files);
return -ENOMEM;
}
- if (copy_from_user(group_fds, arg->group_fds,
- hdr->count * sizeof(*group_fds))) {
- kfree(group_fds);
+ if (copy_from_user(fds, arg->group_fds,
+ hdr->count * sizeof(*fds))) {
+ kfree(fds);
kfree(files);
return -EFAULT;
}
@@ -1301,7 +1301,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
* the reset
*/
for (file_idx = 0; file_idx < hdr->count; file_idx++) {
- struct file *file = fget(group_fds[file_idx]);
+ struct file *file = fget(fds[file_idx]);
if (!file) {
ret = -EBADF;
@@ -1318,9 +1318,9 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
files[file_idx] = file;
}
- kfree(group_fds);
+ kfree(fds);
- /* release reference to groups on error */
+ /* release reference to fds on error */
if (ret)
goto hot_reset_release;
@@ -1358,7 +1358,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
return -ENODEV;
if (hdr.count)
- return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+ return vfio_pci_ioctl_pci_hot_reset_files(vdev, &hdr, slot, arg);
iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
@@ -2329,16 +2329,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
};
EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
-static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
- struct vfio_pci_group_info *groups)
+static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_file_info *info)
{
unsigned int i;
- if (!groups)
+ if (!info)
return false;
- for (i = 0; i < groups->count; i++)
- if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
+ for (i = 0; i < info->count; i++)
+ if (vfio_file_has_dev(info->files[i], &vdev->vdev))
return true;
return false;
}
@@ -2429,7 +2429,7 @@ static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
* get each memory_lock.
*/
static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
- struct vfio_pci_group_info *groups,
+ struct vfio_pci_file_info *info,
struct iommufd_ctx *iommufd_ctx)
{
struct vfio_pci_core_device *cur_mem;
@@ -2478,7 +2478,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
* the calling device is in a singleton dev_set.
*/
if (cur_vma->vdev.open_count &&
- !vfio_dev_in_groups(cur_vma, groups) &&
+ !vfio_dev_in_files(cur_vma, info) &&
!vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
(dev_set->device_count > 1)) {
ret = -EINVAL;
--
2.34.1
* Re: [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in hot reset path
2023-04-01 14:44 ` [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in " Yi Liu
@ 2023-04-04 21:23 ` Alex Williamson
2023-04-05 9:32 ` Eric Auger
1 sibling, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 21:23 UTC (permalink / raw)
To: Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Sat, 1 Apr 2023 07:44:25 -0700
Yi Liu <yi.l.liu@intel.com> wrote:
> No functional change is intended.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++----------------
> 1 file changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 2a510b71edcb..da6325008872 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -177,10 +177,10 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> }
> }
>
> -struct vfio_pci_group_info;
> +struct vfio_pci_file_info;
> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups,
> + struct vfio_pci_file_info *info,
> struct iommufd_ctx *iommufd_ctx);
>
> /*
> @@ -800,7 +800,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> return 0;
> }
>
> -struct vfio_pci_group_info {
> +struct vfio_pci_file_info {
> int count;
> struct file **files;
> };
> @@ -1257,14 +1257,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> }
>
> static int
> -vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> - struct vfio_pci_hot_reset *hdr,
> - bool slot,
> - struct vfio_pci_hot_reset __user *arg)
> +vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_hot_reset *hdr,
> + bool slot,
> + struct vfio_pci_hot_reset __user *arg)
> {
> - int32_t *group_fds;
> + int32_t *fds;
> struct file **files;
> - struct vfio_pci_group_info info;
> + struct vfio_pci_file_info info;
> int file_idx, count = 0, ret = 0;
>
> /*
> @@ -1281,17 +1281,17 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> if (hdr->count > count)
> return -EINVAL;
>
> - group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> + fds = kcalloc(hdr->count, sizeof(*fds), GFP_KERNEL);
> files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
> - if (!group_fds || !files) {
> - kfree(group_fds);
> + if (!fds || !files) {
> + kfree(fds);
> kfree(files);
> return -ENOMEM;
> }
>
> - if (copy_from_user(group_fds, arg->group_fds,
> - hdr->count * sizeof(*group_fds))) {
> - kfree(group_fds);
> + if (copy_from_user(fds, arg->group_fds,
> + hdr->count * sizeof(*fds))) {
> + kfree(fds);
> kfree(files);
> return -EFAULT;
> }
> @@ -1301,7 +1301,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> * the reset
> */
> for (file_idx = 0; file_idx < hdr->count; file_idx++) {
> - struct file *file = fget(group_fds[file_idx]);
> + struct file *file = fget(fds[file_idx]);
>
> if (!file) {
> ret = -EBADF;
> @@ -1318,9 +1318,9 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> files[file_idx] = file;
> }
>
> - kfree(group_fds);
> + kfree(fds);
>
> - /* release reference to groups on error */
> + /* release reference to fds on error */
> if (ret)
> goto hot_reset_release;
>
> @@ -1358,7 +1358,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> return -ENODEV;
>
> if (hdr.count)
> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> + return vfio_pci_ioctl_pci_hot_reset_files(vdev, &hdr, slot, arg);
>
> iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
>
> @@ -2329,16 +2329,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
> };
> EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
>
> -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> - struct vfio_pci_group_info *groups)
> +static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_file_info *info)
> {
> unsigned int i;
>
> - if (!groups)
> + if (!info)
> return false;
>
> - for (i = 0; i < groups->count; i++)
> - if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> + for (i = 0; i < info->count; i++)
> + if (vfio_file_has_dev(info->files[i], &vdev->vdev))
> return true;
> return false;
> }
> @@ -2429,7 +2429,7 @@ static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> * get each memory_lock.
> */
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups,
> + struct vfio_pci_file_info *info,
> struct iommufd_ctx *iommufd_ctx)
> {
> struct vfio_pci_core_device *cur_mem;
> @@ -2478,7 +2478,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> * the calling device is in a singleton dev_set.
> */
> if (cur_vma->vdev.open_count &&
> - !vfio_dev_in_groups(cur_vma, groups) &&
> + !vfio_dev_in_files(cur_vma, info) &&
> !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> (dev_set->device_count > 1)) {
> ret = -EINVAL;
At this point, vfio_dev_in_files() supports both group and cdev fds and
these can be used for both regular IOMMU protected and no-IOMMU
devices AFAICT. We only add this 1-off dev_set device count test and
its subtle side-effects in order to support the null-array mode, which
IMO really has yet to be shown as a requirement.
IIRC, we were wanting to add that mode as part of the cdev interface so
that the existence of cdevs implies this support, but now we're already
making use of vfio_pci_hot_reset_info.flags to indicate group-id vs
dev-id in the output, so does anything prevent us from setting another
bit there if/when this feature proves itself useful and error free, to
indicate it's an available mode for the hot-reset ioctl?
With that I think we could drop patches 4 & 5 with a plan for
introducing them later without trying to strong arm the feature in
without a proven and available use case now. Thanks,
Alex
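To make the condition under discussion concrete, here is a rough userspace model of the ownership test quoted above. It is a sketch under stated assumptions: struct model_dev and model_check_ownership() are hypothetical names, the two booleans stand in for vfio_dev_in_files() and vfio_dev_in_iommufd_ctx(), and n plays the role of dev_set->device_count.

```c
#include <stdbool.h>
#include <errno.h>

/* Toy model of one device in a dev_set; field names are illustrative. */
struct model_dev {
	unsigned int open_count;
	bool in_files;       /* covered by a user-supplied group/device fd */
	bool in_iommufd_ctx; /* bound to the caller's iommufd context */
};

/*
 * Mirror of the quoted check: an opened device that is proven by neither
 * the fd array nor the iommufd context fails the reset, unless the
 * dev_set is a singleton (the device-count test).
 */
static int model_check_ownership(const struct model_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		if (devs[i].open_count &&
		    !devs[i].in_files &&
		    !devs[i].in_iommufd_ctx &&
		    n > 1)
			return -EINVAL;
	}
	return 0;
}
```

In this model, dropping the `n > 1` term makes a singleton dev_set with an unproven opened device fail as well, which is the side-effect the device-count test exists to avoid.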
* Re: [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in hot reset path
2023-04-01 14:44 ` [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in " Yi Liu
2023-04-04 21:23 ` Alex Williamson
@ 2023-04-05 9:32 ` Eric Auger
1 sibling, 0 replies; 142+ messages in thread
From: Eric Auger @ 2023-04-05 9:32 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
On 4/1/23 16:44, Yi Liu wrote:
> No functional change is intended.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++----------------
> 1 file changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 2a510b71edcb..da6325008872 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -177,10 +177,10 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> }
> }
>
> -struct vfio_pci_group_info;
> +struct vfio_pci_file_info;
> static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups,
> + struct vfio_pci_file_info *info,
> struct iommufd_ctx *iommufd_ctx);
>
> /*
> @@ -800,7 +800,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> return 0;
> }
>
> -struct vfio_pci_group_info {
> +struct vfio_pci_file_info {
> int count;
> struct file **files;
> };
> @@ -1257,14 +1257,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> }
>
> static int
> -vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> - struct vfio_pci_hot_reset *hdr,
> - bool slot,
> - struct vfio_pci_hot_reset __user *arg)
> +vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_hot_reset *hdr,
> + bool slot,
> + struct vfio_pci_hot_reset __user *arg)
> {
> - int32_t *group_fds;
> + int32_t *fds;
> struct file **files;
> - struct vfio_pci_group_info info;
> + struct vfio_pci_file_info info;
> int file_idx, count = 0, ret = 0;
>
> /*
> @@ -1281,17 +1281,17 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> if (hdr->count > count)
> return -EINVAL;
>
> - group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
> + fds = kcalloc(hdr->count, sizeof(*fds), GFP_KERNEL);
> files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
> - if (!group_fds || !files) {
> - kfree(group_fds);
> + if (!fds || !files) {
> + kfree(fds);
> kfree(files);
> return -ENOMEM;
> }
>
> - if (copy_from_user(group_fds, arg->group_fds,
> - hdr->count * sizeof(*group_fds))) {
> - kfree(group_fds);
> + if (copy_from_user(fds, arg->group_fds,
> + hdr->count * sizeof(*fds))) {
> + kfree(fds);
> kfree(files);
> return -EFAULT;
> }
> @@ -1301,7 +1301,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> * the reset
> */
> for (file_idx = 0; file_idx < hdr->count; file_idx++) {
> - struct file *file = fget(group_fds[file_idx]);
> + struct file *file = fget(fds[file_idx]);
>
> if (!file) {
> ret = -EBADF;
> @@ -1318,9 +1318,9 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> files[file_idx] = file;
> }
>
> - kfree(group_fds);
> + kfree(fds);
>
> - /* release reference to groups on error */
> + /* release reference to fds on error */
> if (ret)
> goto hot_reset_release;
>
> @@ -1358,7 +1358,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> return -ENODEV;
>
> if (hdr.count)
> - return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
> + return vfio_pci_ioctl_pci_hot_reset_files(vdev, &hdr, slot, arg);
>
> iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
>
> @@ -2329,16 +2329,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
> };
> EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
>
> -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> - struct vfio_pci_group_info *groups)
> +static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> + struct vfio_pci_file_info *info)
> {
> unsigned int i;
>
> - if (!groups)
> + if (!info)
> return false;
>
> - for (i = 0; i < groups->count; i++)
> - if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> + for (i = 0; i < info->count; i++)
> + if (vfio_file_has_dev(info->files[i], &vdev->vdev))
> return true;
> return false;
> }
> @@ -2429,7 +2429,7 @@ static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> * get each memory_lock.
> */
> static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> - struct vfio_pci_group_info *groups,
> + struct vfio_pci_file_info *info,
> struct iommufd_ctx *iommufd_ctx)
> {
> struct vfio_pci_core_device *cur_mem;
> @@ -2478,7 +2478,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> * the calling device is in a singleton dev_set.
> */
> if (cur_vma->vdev.open_count &&
> - !vfio_dev_in_groups(cur_vma, groups) &&
> + !vfio_dev_in_files(cur_vma, info) &&
> !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
> (dev_set->device_count > 1)) {
> ret = -EINVAL;
* [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (7 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in " Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-05 9:36 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device Yi Liu
` (2 subsequent siblings)
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
The user can now also provide an array of device fds as a third method to
prove ownership for the reset. This is not useful while device fds are
still acquired via group fds, but it becomes necessary with the device
cdev, which lets the user acquire device fds directly, bypassing the
group. In that case this method can serve as a last resort when the
preferred iommufd verification doesn't work, e.g. in noiommu usages.
Clarify this in the uAPI.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 9 +++++----
include/uapi/linux/vfio.h | 3 ++-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index da6325008872..19f5b075d70a 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1289,7 +1289,7 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
return -ENOMEM;
}
- if (copy_from_user(fds, arg->group_fds,
+ if (copy_from_user(fds, arg->fds,
hdr->count * sizeof(*fds))) {
kfree(fds);
kfree(files);
@@ -1297,8 +1297,8 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
}
/*
- * Get the group file for each fd to ensure the group held across
- * the reset
+ * Get the file for each fd to ensure the group/device file
+ * is held across the reset
*/
for (file_idx = 0; file_idx < hdr->count; file_idx++) {
struct file *file = fget(fds[file_idx]);
@@ -2469,7 +2469,8 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
* cannot race being opened by another user simultaneously.
*
* Otherwise all opened devices in the dev_set must be
- * contained by the set of groups provided by the user.
+ * contained by the set of groups/devices provided by
+ * the user.
*
* If user provides a zero-length array, then all the
* opened devices must be bound to a same iommufd_ctx.
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 17aa5d09db41..25432ef213ee 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -681,6 +681,7 @@ struct vfio_pci_hot_reset_info {
*
* The ownership can be proved by:
* - An array of group fds
+ * - An array of device fds
* - A zero-length array
*
* In the last case all affected devices which are opened by this user
@@ -694,7 +695,7 @@ struct vfio_pci_hot_reset {
__u32 argsz;
__u32 flags;
__u32 count;
- __s32 group_fds[];
+ __s32 fds[];
};
#define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
--
2.34.1
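As an illustration of the uAPI layout in the patch above, a userspace caller might build the variable-length argument roughly as follows. This is a sketch, not the real header: struct hot_reset_arg is a local mirror of the fields shown in the diff, and hot_reset_arg_alloc() is a hypothetical helper. The same buffer shape covers all three ownership methods, with count == 0 being the zero-length array case.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the layout shown in the patch; NOT the real uapi header. */
struct hot_reset_arg {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	int32_t fds[]; /* group fds or device fds, per the patch description */
};

/* Allocate and fill the argument; returns NULL on allocation failure. */
static struct hot_reset_arg *hot_reset_arg_alloc(const int32_t *fds,
						 uint32_t count)
{
	size_t sz = sizeof(struct hot_reset_arg) +
		    (size_t)count * sizeof(int32_t);
	struct hot_reset_arg *arg = calloc(1, sz);

	if (!arg)
		return NULL;

	arg->argsz = (uint32_t)sz; /* header plus trailing fd array */
	arg->count = count;
	if (count)
		memcpy(arg->fds, fds, (size_t)count * sizeof(int32_t));
	return arg;
}
```

With the real header, the resulting buffer would be handed to the VFIO_DEVICE_PCI_HOT_RESET ioctl on the device fd and freed afterwards.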
* Re: [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
2023-04-01 14:44 ` [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
@ 2023-04-05 9:36 ` Eric Auger
0 siblings, 0 replies; 142+ messages in thread
From: Eric Auger @ 2023-04-05 9:36 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
On 4/1/23 16:44, Yi Liu wrote:
> Now user can also provide an array of device fds as a 3rd method to verify
> the reset ownership. It's not useful at this point when the device fds are
> acquired via group fds. But it's necessary when moving to device cdev which
> allows the user to directly acquire device fds by skipping group. In that
> case this method can be used as a last resort when the preferred iommufd
> verification doesn't work, e.g. in noiommu usages.
>
> Clarify it in uAPI.
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> drivers/vfio/pci/vfio_pci_core.c | 9 +++++----
> include/uapi/linux/vfio.h | 3 ++-
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index da6325008872..19f5b075d70a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1289,7 +1289,7 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
> return -ENOMEM;
> }
>
> - if (copy_from_user(fds, arg->group_fds,
> + if (copy_from_user(fds, arg->fds,
> hdr->count * sizeof(*fds))) {
> kfree(fds);
> kfree(files);
> @@ -1297,8 +1297,8 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
> }
>
> /*
> - * Get the group file for each fd to ensure the group held across
> - * the reset
> + * Get the file for each fd to ensure the group/device file
> + * is held across the reset
> */
> for (file_idx = 0; file_idx < hdr->count; file_idx++) {
> struct file *file = fget(fds[file_idx]);
> @@ -2469,7 +2469,8 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> * cannot race being opened by another user simultaneously.
> *
> * Otherwise all opened devices in the dev_set must be
> - * contained by the set of groups provided by the user.
> + * contained by the set of groups/devices provided by
> + * the user.
> *
> * If user provides a zero-length array, then all the
> * opened devices must be bound to a same iommufd_ctx.
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 17aa5d09db41..25432ef213ee 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -681,6 +681,7 @@ struct vfio_pci_hot_reset_info {
> *
> * The ownership can be proved by:
> * - An array of group fds
> + * - An array of device fds
> * - A zero-length array
> *
> * In the last case all affected devices which are opened by this user
> @@ -694,7 +695,7 @@ struct vfio_pci_hot_reset {
> __u32 argsz;
> __u32 flags;
> __u32 count;
> - __s32 group_fds[];
> + __s32 fds[];
> };
>
> #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
* [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (8 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-05 11:48 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi Yi Liu
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
11 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
Some users, e.g. vfio-pci, need to check whether a vfio_device is opened
as a cdev. Add a flag to vfio_device that will be set in the cdev path
when the device is opened. It is not used yet; this is preparation for
vfio device cdev support.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
include/linux/vfio.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index f8fb9ab25188..d9a0770e5fc1 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -62,6 +62,7 @@ struct vfio_device {
struct iommufd_device *iommufd_device;
bool iommufd_attached;
#endif
+ bool cdev_opened;
};
/**
@@ -151,6 +152,12 @@ vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
#endif
+static inline bool vfio_device_cdev_opened(struct vfio_device *device)
+{
+ lockdep_assert_held(&device->dev_set->lock);
+ return device->cdev_opened;
+}
+
/**
* @migration_set_state: Optional callback to change the migration state for
* devices that support migration. It's mandatory for
--
2.34.1
* Re: [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device
2023-04-01 14:44 ` [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device Yi Liu
@ 2023-04-05 11:48 ` Eric Auger
2023-04-21 7:06 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 11:48 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
On 4/1/23 16:44, Yi Liu wrote:
> There are users that need to check if vfio_device is opened as cdev.
> e.g. vfio-pci. This adds a flag in vfio_device, it will be set in the
> cdev path when device is opened. This is not used at this moment, but
> a preparation for vfio device cdev support.
Better to squash this patch with the patch that sets cdev_opened, then?
Thanks
Eric
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> include/linux/vfio.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index f8fb9ab25188..d9a0770e5fc1 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -62,6 +62,7 @@ struct vfio_device {
> struct iommufd_device *iommufd_device;
> bool iommufd_attached;
> #endif
> + bool cdev_opened;
> };
>
> /**
> @@ -151,6 +152,12 @@ vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> #endif
>
> +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> +{
> + lockdep_assert_held(&device->dev_set->lock);
> + return device->cdev_opened;
> +}
> +
> /**
> * @migration_set_state: Optional callback to change the migration state for
> * devices that support migration. It's mandatory for
* RE: [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device
2023-04-05 11:48 ` Eric Auger
@ 2023-04-21 7:06 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-21 7:06 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, April 5, 2023 7:48 PM
>
> On 4/1/23 16:44, Yi Liu wrote:
> > There are users that need to check if vfio_device is opened as cdev.
> > e.g. vfio-pci. This adds a flag in vfio_device, it will be set in the
> > cdev path when device is opened. This is not used at this moment, but
> > a preparation for vfio device cdev support.
>
> better to squash this patch with the patch setting cdev_opened then?
But that would be in the cdev series. Maybe only add this helper to
return false and add the cdev_opened in below patch. Will this be
better?
https://lore.kernel.org/kvm/20230401151833.124749-23-yi.l.liu@intel.com/
> Thanks
>
> Eric
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > include/linux/vfio.h | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index f8fb9ab25188..d9a0770e5fc1 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -62,6 +62,7 @@ struct vfio_device {
> > struct iommufd_device *iommufd_device;
> > bool iommufd_attached;
> > #endif
> > + bool cdev_opened;
> > };
> >
> > /**
> > @@ -151,6 +152,12 @@ vfio_iommufd_physical_devid(struct vfio_device *vdev,
> u32 *id)
> > ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> > #endif
> >
> > +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> > +{
> > + lockdep_assert_held(&device->dev_set->lock);
> > + return device->cdev_opened;
> > +}
> > +
> > /**
> > * @migration_set_state: Optional callback to change the migration state for
> > * devices that support migration. It's mandatory for
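The alternative floated in the reply above (keep the helper in this series but have it return false until the cdev series adds the flag) could look roughly like the sketch below. This is only an illustration under assumptions: struct vfio_device is left as an opaque stand-in so the snippet compiles outside the kernel, and the lockdep assertion from the patch is omitted because it needs kernel headers.

```c
#include <stdbool.h>

/* Opaque stand-in for the kernel's struct vfio_device (illustration only). */
struct vfio_device;

/*
 * Placeholder variant of the helper: without a cdev_opened field there is
 * nothing to read, so every device is reported as "not opened as cdev".
 * The cdev series would later replace the body with the real flag check.
 */
static inline bool vfio_device_cdev_opened(struct vfio_device *device)
{
	(void)device;	/* unused until the flag exists */
	return false;
}
```

With this variant, callers in the hot reset path would always take the group_id branch until the cdev series lands.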
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (9 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-04 21:00 ` Alex Williamson
2023-04-05 11:46 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
11 siblings, 2 replies; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
as there are IOMMUFD users that want to know check if an ID generated
by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
needs to check if the ID is valid or not.
IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
starts from 0.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
include/uapi/linux/iommufd.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 98ebba80cfa1..aeae73a93833 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -9,6 +9,9 @@
#define IOMMUFD_TYPE (';')
+/* IDs allocated by IOMMUFD starts from 0 */
+#define IOMMUFD_INVALID_ID 0
+
/**
* DOC: General ioctl format
*
--
2.34.1
^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-01 14:44 ` [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi Yi Liu
@ 2023-04-04 21:00 ` Alex Williamson
2023-04-05 9:31 ` Liu, Yi L
2023-04-05 11:46 ` Eric Auger
1 sibling, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 21:00 UTC (permalink / raw)
To: Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Sat, 1 Apr 2023 07:44:28 -0700
Yi Liu <yi.l.liu@intel.com> wrote:
> as there are IOMMUFD users that want to know check if an ID generated
> by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
> dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> needs to check if the ID is valid or not.
>
> IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> starts from 0.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> include/uapi/linux/iommufd.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 98ebba80cfa1..aeae73a93833 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -9,6 +9,9 @@
>
> #define IOMMUFD_TYPE (';')
>
> +/* IDs allocated by IOMMUFD starts from 0 */
> +#define IOMMUFD_INVALID_ID 0
> +
> /**
> * DOC: General ioctl format
> *
If allocation "starts from 0" then 0 is a valid id, no? Does allocation
start from 1, ie. skip 0? Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-04 21:00 ` Alex Williamson
@ 2023-04-05 9:31 ` Liu, Yi L
2023-04-05 15:13 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 9:31 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, April 5, 2023 5:01 AM
>
> On Sat, 1 Apr 2023 07:44:28 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
>
> > as there are IOMMUFD users that want to know check if an ID generated
> > by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
> > dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> > needs to check if the ID is valid or not.
> >
> > IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> > starts from 0.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > include/uapi/linux/iommufd.h | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > index 98ebba80cfa1..aeae73a93833 100644
> > --- a/include/uapi/linux/iommufd.h
> > +++ b/include/uapi/linux/iommufd.h
> > @@ -9,6 +9,9 @@
> >
> > #define IOMMUFD_TYPE (';')
> >
> > +/* IDs allocated by IOMMUFD starts from 0 */
> > +#define IOMMUFD_INVALID_ID 0
> > +
> > /**
> > * DOC: General ioctl format
> > *
>
> If allocation "starts from 0" then 0 is a valid id, no? Does allocation
> start from 1, ie. skip 0? Thanks,
yes, it starts from 1, that's why we can use 0 as invalid id.
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-05 9:31 ` Liu, Yi L
@ 2023-04-05 15:13 ` Alex Williamson
2023-04-05 15:17 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 15:13 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 09:31:39 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, April 5, 2023 5:01 AM
> >
> > On Sat, 1 Apr 2023 07:44:28 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> > > as there are IOMMUFD users that want to know check if an ID generated
> > > by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
> > > dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> > > needs to check if the ID is valid or not.
> > >
> > > IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> > > starts from 0.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > > include/uapi/linux/iommufd.h | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > > index 98ebba80cfa1..aeae73a93833 100644
> > > --- a/include/uapi/linux/iommufd.h
> > > +++ b/include/uapi/linux/iommufd.h
> > > @@ -9,6 +9,9 @@
> > >
> > > #define IOMMUFD_TYPE (';')
> > >
> > > +/* IDs allocated by IOMMUFD starts from 0 */
> > > +#define IOMMUFD_INVALID_ID 0
> > > +
> > > /**
> > > * DOC: General ioctl format
> > > *
> >
> > If allocation "starts from 0" then 0 is a valid id, no? Does allocation
> > start from 1, ie. skip 0? Thanks,
>
> yes, it starts from 1, that's why we can use 0 as invalid id.
So the comment is wrong, correct?
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-05 15:13 ` Alex Williamson
@ 2023-04-05 15:17 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 15:17 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, April 5, 2023 11:13 PM
>
> On Wed, 5 Apr 2023 09:31:39 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, April 5, 2023 5:01 AM
> > >
> > > On Sat, 1 Apr 2023 07:44:28 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > as there are IOMMUFD users that want to know check if an ID generated
> > > > by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
> > > > dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> > > > needs to check if the ID is valid or not.
> > > >
> > > > IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> > > > starts from 0.
> > > >
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > > include/uapi/linux/iommufd.h | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > > > index 98ebba80cfa1..aeae73a93833 100644
> > > > --- a/include/uapi/linux/iommufd.h
> > > > +++ b/include/uapi/linux/iommufd.h
> > > > @@ -9,6 +9,9 @@
> > > >
> > > > #define IOMMUFD_TYPE (';')
> > > >
> > > > +/* IDs allocated by IOMMUFD starts from 0 */
> > > > +#define IOMMUFD_INVALID_ID 0
> > > > +
> > > > /**
> > > > * DOC: General ioctl format
> > > > *
> > >
> > > If allocation "starts from 0" then 0 is a valid id, no? Does allocation
> > > start from 1, ie. skip 0? Thanks,
> >
> > yes, it starts from 1, that's why we can use 0 as invalid id.
>
> So the comment is wrong, correct?
yes.
Regards
Yi Liu
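The exchange above settles that IOMMUFD hands out IDs starting from 1, which is what leaves 0 free to mean "invalid". A minimal sketch of the corrected comment plus a userspace-side check follows. The helper name iommufd_id_is_valid() is hypothetical, and the macro is defined locally only so the snippet builds without the patched uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copy of the macro proposed in the patch, with the comment fixed as
 * agreed in the thread: allocation starts from 1, so 0 is never a real ID.
 */
#ifndef IOMMUFD_INVALID_ID
/* IDs allocated by IOMMUFD start from 1 */
#define IOMMUFD_INVALID_ID 0
#endif

/* Hypothetical helper: true if the ID may name an iommufd object. */
static inline bool iommufd_id_is_valid(uint32_t id)
{
	return id != IOMMUFD_INVALID_ID;
}
```

A user would apply such a check to each dev_id returned by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO before trying to match it against the IDs it already holds.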
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi
2023-04-01 14:44 ` [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi Yi Liu
2023-04-04 21:00 ` Alex Williamson
@ 2023-04-05 11:46 ` Eric Auger
1 sibling, 0 replies; 142+ messages in thread
From: Eric Auger @ 2023-04-05 11:46 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi
On 4/1/23 16:44, Yi Liu wrote:
> as there are IOMMUFD users that want to know check if an ID generated
s/want to know check/ need to check
which type of ID?
> by IOMMUFD is valid or not. e.g. vfio-pci optionaly returns invalid
optionally
> dev_id to user in the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. User
> needs to check if the ID is valid or not.
so dev id ...
>
> IOMMUFD_INVALID_ID is defined as 0 since the IDs generated by IOMMUFD
> starts from 0.
from 1, same as below
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> include/uapi/linux/iommufd.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 98ebba80cfa1..aeae73a93833 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -9,6 +9,9 @@
>
> #define IOMMUFD_TYPE (';')
>
> +/* IDs allocated by IOMMUFD starts from 0 */
ditto
> +#define IOMMUFD_INVALID_ID 0
> +
> /**
> * DOC: General ioctl format
> *
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
` (10 preceding siblings ...)
2023-04-01 14:44 ` [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi Yi Liu
@ 2023-04-01 14:44 ` Yi Liu
2023-04-03 9:25 ` Liu, Yi L
` (2 more replies)
11 siblings, 3 replies; 142+ messages in thread
From: Yi Liu @ 2023-04-01 14:44 UTC (permalink / raw)
To: alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
for the users that accept device fds passed from management stacks to be
able to figure out the hot reset affected devices among the devices
opened by the user. This is needed as such users do not have BDF (bus,
devfn) knowledge about the devices they have opened, and hence are unable
to use the information reported by the existing
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to figure out the affected devices.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
include/uapi/linux/vfio.h | 24 ++++++++++++-
2 files changed, 74 insertions(+), 8 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 19f5b075d70a..a5a7e148dce1 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -30,6 +30,7 @@
#if IS_ENABLED(CONFIG_EEH)
#include <asm/eeh.h>
#endif
+#include <uapi/linux/iommufd.h>
#include "vfio_pci_priv.h"
@@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
return 0;
}
+static struct vfio_device *
+vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
+ struct pci_dev *pdev)
+{
+ struct vfio_device *cur;
+
+ lockdep_assert_held(&dev_set->lock);
+
+ list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
+ if (cur->dev == &pdev->dev)
+ return cur;
+ return NULL;
+}
+
static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
{
(*(int *)data)++;
@@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
struct vfio_pci_fill_info {
int max;
int cur;
+ bool require_devid;
+ struct iommufd_ctx *iommufd;
+ struct vfio_device_set *dev_set;
struct vfio_pci_dependent_device *devices;
};
static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
{
struct vfio_pci_fill_info *fill = data;
+ struct vfio_device_set *dev_set = fill->dev_set;
struct iommu_group *iommu_group;
+ struct vfio_device *vdev;
+
+ lockdep_assert_held(&dev_set->lock);
if (fill->cur == fill->max)
return -EAGAIN; /* Something changed, try again */
@@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
if (!iommu_group)
return -EPERM; /* Cannot reset non-isolated devices */
- fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+ if (fill->require_devid) {
+ /*
+ * Report dev_id of the devices that are opened as cdev
+ * and have the same iommufd with the fill->iommufd.
+ * Otherwise, just fill IOMMUFD_INVALID_ID.
+ */
+ vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
+ if (vdev && vfio_device_cdev_opened(vdev) &&
+ fill->iommufd == vfio_iommufd_physical_ictx(vdev))
+ vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
+ else
+ fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
+ } else {
+ fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+ }
fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
fill->devices[fill->cur].bus = pdev->bus->number;
fill->devices[fill->cur].devfn = pdev->devfn;
@@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
return -ENOMEM;
fill.devices = devices;
+ fill.dev_set = vdev->vdev.dev_set;
+ mutex_lock(&vdev->vdev.dev_set->lock);
+ if (vfio_device_cdev_opened(&vdev->vdev)) {
+ fill.require_devid = true;
+ fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+ }
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
&fill, slot);
+ mutex_unlock(&vdev->vdev.dev_set->lock);
/*
* If a device was removed between counting and filling, we may come up
* short of fill.max. If a device was added, we'll have a return of
* -EAGAIN above.
*/
- if (!ret)
+ if (!ret) {
hdr.count = fill.cur;
+ if (fill.require_devid)
+ hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
+ }
reset_info_exit:
if (copy_to_user(arg, &hdr, minsz))
@@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
{
struct vfio_device_set *dev_set = data;
- struct vfio_device *cur;
- list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
- if (cur->dev == &pdev->dev)
- return 0;
- return -EBUSY;
+ lockdep_assert_held(&dev_set->lock);
+
+ return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
}
/*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 25432ef213ee..5a34364e3b94 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -650,11 +650,32 @@ enum {
* VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
* struct vfio_pci_hot_reset_info)
*
+ * This command is used to query the affected devices in the hot reset for
+ * a given device. User could use the information reported by this command
+ * to figure out the affected devices among the devices it has opened.
+ * This command always reports the segment, bus and devfn information for
+ * each affected device, and selectively report the group_id or the dev_id
+ * per the way how the device being queried is opened.
+ * - If the device is opened via the traditional group/container manner,
+ * this command reports the group_id for each affected device.
+ *
+ * - If the device is opened as a cdev, this command needs to report
+ * dev_id for each affected device and set the
+ * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
+ * devices that are not opened as cdev or bound to different iommufds
+ * with the device that is queried, report an invalid dev_id to avoid
+ * potential dev_id conflict as dev_id is local to iommufd. For such
+ * affected devices, user shall fall back to use the segment, bus and
+ * devfn info to map it to opened device.
+ *
* Return: 0 on success, -errno on failure:
* -enospc = insufficient buffer, -enodev = unsupported for device.
*/
struct vfio_pci_dependent_device {
- __u32 group_id;
+ union {
+ __u32 group_id;
+ __u32 dev_id;
+ };
__u16 segment;
__u8 bus;
__u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
@@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
struct vfio_pci_hot_reset_info {
__u32 argsz;
__u32 flags;
+#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
__u32 count;
struct vfio_pci_dependent_device devices[];
};
--
2.34.1
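To make the uAPI description in the patch above concrete, here is a rough userspace-side sketch of how one reported entry might be mapped back to a device the user has opened. Everything in it is an assumption for illustration: the structures are local mirrors of the v3 layout (which may still change), and struct opened_dev, match_opened() and the HOT_RESET_FLAG_DEV_ID / INVALID_DEV_ID names are hypothetical stand-ins, not part of any header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the values proposed in this series (illustration only). */
#define HOT_RESET_FLAG_DEV_ID	(1u << 0)	/* VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID */
#define INVALID_DEV_ID		0u		/* IOMMUFD_INVALID_ID */

/* Mirrors struct vfio_pci_dependent_device as changed by the patch. */
struct dep_device {
	union {
		uint32_t group_id;
		uint32_t dev_id;
	};
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Hypothetical per-device record kept by the user for each opened fd. */
struct opened_dev {
	uint32_t dev_id;	/* ID obtained when binding to the iommufd */
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
	bool bdf_known;		/* false if the fd was passed in without BDF */
};

/*
 * Return the index of the opened device matching the reported entry, or -1.
 * A valid dev_id is matched directly; otherwise fall back to segment, bus
 * and devfn, as the uAPI comment in the patch suggests.
 */
static int match_opened(uint32_t flags, const struct dep_device *dep,
			const struct opened_dev *opened, size_t n)
{
	bool by_devid = (flags & HOT_RESET_FLAG_DEV_ID) &&
			dep->dev_id != INVALID_DEV_ID;

	for (size_t i = 0; i < n; i++) {
		if (by_devid) {
			if (opened[i].dev_id == dep->dev_id)
				return (int)i;
		} else if (opened[i].bdf_known &&
			   opened[i].segment == dep->segment &&
			   opened[i].bus == dep->bus &&
			   opened[i].devfn == dep->devfn) {
			return (int)i;
		}
	}
	return -1;
}
```

A caller would loop over hdr.count entries and treat any entry that returns -1 as an affected device it does not own, which is the case where a hot reset cannot be claimed by this user alone.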
^ permalink raw reply related [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
@ 2023-04-03 9:25 ` Liu, Yi L
2023-04-03 15:01 ` Alex Williamson
2023-04-04 22:20 ` Alex Williamson
2023-04-05 12:19 ` Eric Auger
2 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-03 9:25 UTC (permalink / raw)
To: alex.williamson@redhat.com, jgg@nvidia.com, Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
eric.auger@redhat.com, nicolinc@nvidia.com, kvm@vger.kernel.org,
mjrosato@linux.ibm.com, chao.p.peng@linux.intel.com,
yi.y.sun@linux.intel.com, peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Saturday, April 1, 2023 10:44 PM
> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> if (!iommu_group)
> return -EPERM; /* Cannot reset non-isolated devices */
Hi Alex,
Is disabling iommu a sane way to test vfio noiommu mode? If no, just skip
the below contents. 😊 If yes, then may need to check if below is expected.
I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
iommufd==-1 can succeed, but failed to get hot reset info due to the above
group check. Reason is that this happens to have some affected devices, and
these devices have no valid iommu_group (because they are not bound to vfio-pci
hence nobody allocates noiommu group for them). So when hot reset info loops
such devices, it failed with -EPERM. Is this expected?
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-03 9:25 ` Liu, Yi L
@ 2023-04-03 15:01 ` Alex Williamson
2023-04-03 15:22 ` Liu, Yi L
2023-04-07 10:09 ` Liu, Yi L
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-03 15:01 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Mon, 3 Apr 2023 09:25:06 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Saturday, April 1, 2023 10:44 PM
>
> > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > if (!iommu_group)
> > return -EPERM; /* Cannot reset non-isolated devices */
>
> Hi Alex,
>
> Is disabling iommu a sane way to test vfio noiommu mode?
Yes
> I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> iommufd==-1 can succeed, but failed to get hot reset info due to the above
> group check. Reason is that this happens to have some affected devices, and
> these devices have no valid iommu_group (because they are not bound to vfio-pci
> hence nobody allocates noiommu group for them). So when hot reset info loops
> such devices, it failed with -EPERM. Is this expected?
Hmm, I didn't recall that we put in such a limitation, but given the
minimally intrusive approach to no-iommu and the fact that we never
defined an invalid group ID to return to the user, it makes sense that
we just blocked the ioctl for no-iommu use. I guess we can do the same
for no-iommu cdev.
BTW, what does this series apply on? I'm assuming[1], but I don't see
a branch from Jason yet. Thanks,
Alex
[1]https://lore.kernel.org/all/20230327093351.44505-1-yi.l.liu@intel.com/
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-03 15:01 ` Alex Williamson
@ 2023-04-03 15:22 ` Liu, Yi L
2023-04-03 15:32 ` Alex Williamson
2023-04-07 10:09 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-03 15:22 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Monday, April 3, 2023 11:02 PM
>
> On Mon, 3 Apr 2023 09:25:06 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Saturday, April 1, 2023 10:44 PM
> >
> > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void
> *data)
> > > if (!iommu_group)
> > > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > Hi Alex,
> >
> > Is disabling iommu a sane way to test vfio noiommu mode?
>
> Yes
>
> > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > group check. Reason is that this happens to have some affected devices, and
> > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > hence nobody allocates noiommu group for them). So when hot reset info loops
> > such devices, it failed with -EPERM. Is this expected?
>
> Hmm, I didn't recall that we put in such a limitation, but given the
> minimally intrusive approach to no-iommu and the fact that we never
> defined an invalid group ID to return to the user, it makes sense that
> we just blocked the ioctl for no-iommu use. I guess we can do the same
> for no-iommu cdev.
sure.
>
> BTW, what does this series apply on? I'm assuming[1], but I don't see
> a branch from Jason yet. Thanks,
yes, this series is applied on [1]. I put the [1], this series and cdev series
in https://github.com/yiliu1765/iommufd/commits/vfio_device_cdev_v9.
Jason has taken [1] in the below branch. It is based on rc1. So I hesitated
to apply this series and cdev series on top of it. Maybe I should have done
it to make life easier. 😊
https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
> Alex
>
> [1]https://lore.kernel.org/all/20230327093351.44505-1-yi.l.liu@intel.com/
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-03 15:22 ` Liu, Yi L
@ 2023-04-03 15:32 ` Alex Williamson
2023-04-03 16:12 ` Jason Gunthorpe
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-03 15:32 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Mon, 3 Apr 2023 15:22:03 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Monday, April 3, 2023 11:02 PM
> >
> > On Mon, 3 Apr 2023 09:25:06 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Saturday, April 1, 2023 10:44 PM
> > >
> > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void
> > *data)
> > > > if (!iommu_group)
> > > > return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > Hi Alex,
> > >
> > > Is disabling iommu a sane way to test vfio noiommu mode?
> >
> > Yes
> >
> > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > group check. Reason is that this happens to have some affected devices, and
> > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > such devices, it failed with -EPERM. Is this expected?
> >
> > Hmm, I didn't recall that we put in such a limitation, but given the
> > minimally intrusive approach to no-iommu and the fact that we never
> > defined an invalid group ID to return to the user, it makes sense that
> > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > for no-iommu cdev.
>
> sure.
>
> >
> > BTW, what does this series apply on? I'm assuming[1], but I don't see
> > a branch from Jason yet. Thanks,
>
> yes, this series is applied on [1]. I put the [1], this series and cdev series
> in https://github.com/yiliu1765/iommufd/commits/vfio_device_cdev_v9.
>
> Jason has taken [1] in the below branch. It is based on rc1. So I hesitated
> to apply this series and cdev series on top of it. Maybe I should have done
> it to make life easier. 😊
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
Seems like it must be in the vfio_mdev_ops branch which has not been
pushed aside from the merge back to for-next. Jason? Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-03 15:32 ` Alex Williamson
@ 2023-04-03 16:12 ` Jason Gunthorpe
0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-03 16:12 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, Tian, Kevin, joro@8bytes.org, robin.murphy@arm.com,
cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Mon, Apr 03, 2023 at 09:32:18AM -0600, Alex Williamson wrote:
> > yes, this series is applied on [1]. I put the [1], this series and cdev series
> > in https://github.com/yiliu1765/iommufd/commits/vfio_device_cdev_v9.
> >
> > Jason has taken [1] in the below branch. It is based on rc1. So I hesitated
> > to apply this series and cdev series on top of it. Maybe I should have done
> > it to make life easier. 😊
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>
> Seems like it must be in the vfio_mdev_ops branch which has not been
> pushed aside from the merge back to for-next. Jason? Thanks,
Yeah, I didn't think we'd need it until we got to the cdev series, let
me do the steps..
Jason
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-03 15:01 ` Alex Williamson
2023-04-03 15:22 ` Liu, Yi L
@ 2023-04-07 10:09 ` Liu, Yi L
2023-04-07 12:03 ` Alex Williamson
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-07 10:09 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Alex,
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Monday, April 3, 2023 11:02 PM
>
> On Mon, 3 Apr 2023 09:25:06 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Saturday, April 1, 2023 10:44 PM
> >
> > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > if (!iommu_group)
> > > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > Hi Alex,
> >
> > Is disabling iommu a sane way to test vfio noiommu mode?
>
> Yes
>
> > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > group check. Reason is that this happens to have some affected devices, and
> > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > hence nobody allocates noiommu group for them). So when hot reset info loops
> > such devices, it failed with -EPERM. Is this expected?
>
> Hmm, I didn't recall that we put in such a limitation, but given the
> minimally intrusive approach to no-iommu and the fact that we never
> defined an invalid group ID to return to the user, it makes sense that
> we just blocked the ioctl for no-iommu use. I guess we can do the same
> for no-iommu cdev.
I just realized a further issue related to this limitation. Remember that we
may eventually compile out the vfio group infrastructure. Say I want to test
noiommu, so I boot such a kernel with the iommu disabled. I think the _INFO
ioctl would fail as there is no iommu_group. Does it mean we will not support
hot reset for noiommu in the future if the vfio group infrastructure is
compiled out?
As discussed in another thread, we are going to add a new bdf/group capability
to DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
bdf/group capability, or add a flag in the capability to mark the group_id
as invalid?
Regards,
Yi Liu
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 10:09 ` Liu, Yi L
@ 2023-04-07 12:03 ` Alex Williamson
2023-04-07 13:24 ` Liu, Yi L
2023-04-11 6:16 ` Liu, Yi L
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-07 12:03 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Fri, 7 Apr 2023 10:09:58 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> Hi Alex,
>
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Monday, April 3, 2023 11:02 PM
> >
> > On Mon, 3 Apr 2023 09:25:06 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Saturday, April 1, 2023 10:44 PM
> > >
> > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > if (!iommu_group)
> > > > return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > Hi Alex,
> > >
> > > Is disabling iommu a sane way to test vfio noiommu mode?
> >
> > Yes
> >
> > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > group check. Reason is that this happens to have some affected devices, and
> > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > such devices, it failed with -EPERM. Is this expected?
> >
> > Hmm, I didn't recall that we put in such a limitation, but given the
> > minimally intrusive approach to no-iommu and the fact that we never
> > defined an invalid group ID to return to the user, it makes sense that
> > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > for no-iommu cdev.
>
> I just realize a further issue related to this limitation. Remember that we
> may finally compile out the vfio group infrastructure in the future. Say I
> want to test noiommu, I may boot such a kernel with iommu disabled. I think
> the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> not support hot reset for noiommu in future if vfio group infrastructure is
> compiled out?
We're talking about IOMMU groups here. IOMMU groups are always present
regardless of whether we expose a vfio group interface to userspace.
Remember, we create IOMMU groups even in the no-iommu case. Even with
pure cdev, there are underlying IOMMU groups that maintain the DMA
ownership.
> As another thread, we are going to add a new bdf/group capability to
> DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
> bdf/group capability or add a flag in the capability to mark the group_id
> is invalid?
As above, there's always an IOMMU group, it's never invalid. Thanks,
Alex
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 12:03 ` Alex Williamson
@ 2023-04-07 13:24 ` Liu, Yi L
2023-04-07 13:51 ` Alex Williamson
2023-04-11 6:16 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-07 13:24 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, April 7, 2023 8:04 PM
>
> > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > if (!iommu_group)
> > > > > return -EPERM; /* Cannot reset non-isolated devices */
[1]
> > > >
> > > > Hi Alex,
> > > >
> > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > >
> > > Yes
> > >
> > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > group check. Reason is that this happens to have some affected devices, and
> > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > such devices, it failed with -EPERM. Is this expected?
> > >
> > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > minimally intrusive approach to no-iommu and the fact that we never
> > > defined an invalid group ID to return to the user, it makes sense that
> > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > for no-iommu cdev.
> >
> > I just realize a further issue related to this limitation. Remember that we
> > may finally compile out the vfio group infrastructure in the future. Say I
> > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > not support hot reset for noiommu in future if vfio group infrastructure is
> > compiled out?
>
> We're talking about IOMMU groups, IOMMU groups are always present
> regardless of whether we expose a vfio group interface to userspace.
> Remember, we create IOMMU groups even in the no-iommu case. Even with
> pure cdev, there are underlying IOMMU groups that maintain the DMA
> ownership.
Hmm. As [1] shows, when the iommu is disabled there is no iommu_group for a
given device unless it is registered to VFIO, in which case a fake group is
created. That's why I hit the limitation at [1]. When vfio_group is compiled
out, even the fake group goes away.
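To make the failure concrete, here is a minimal userspace model of the check
at [1]. It is only a sketch, not the kernel code: struct model_dev,
model_dev_has_group() and model_fill_devs() are made-up names, and the
"has a group" rule simply encodes what I described above for the case where
the vfio group code is still built in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one device affected by the hot reset. */
struct model_dev {
	int iommu_enabled;	/* a real IOMMU driver assigned a group */
	int bound_to_vfio;	/* vfio registered it and made a noiommu fake group */
	int group_id;		/* only meaningful if the device has a group */
};

/* A device has a group if the IOMMU provides one or vfio faked one. */
static int model_dev_has_group(const struct model_dev *dev)
{
	return dev->iommu_enabled || dev->bound_to_vfio;
}

/*
 * Model of the walk over affected devices: the first device without a
 * group aborts the whole query with -EPERM, as in the check at [1].
 */
static int model_fill_devs(const struct model_dev *devs, size_t count,
			   int *group_ids)
{
	size_t i;

	assert(devs && group_ids);
	for (i = 0; i < count; i++) {
		if (!model_dev_has_group(&devs[i]))
			return -EPERM;	/* Cannot reset non-isolated devices */
		group_ids[i] = devs[i].group_id;
	}
	return 0;
}
```

With the iommu disabled, one affected device that is not bound to vfio is
enough to make the whole walk return -EPERM, which matches what I observed.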
>
> > As another thread, we are going to add a new bdf/group capability to
> > DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
> > bdf/group capability or add a flag in the capability to mark the group_id
> > is invalid?
>
> As above, there's always an IOMMU group, it's never invalid. Thanks,
Regards,
Yi Liu
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 13:24 ` Liu, Yi L
@ 2023-04-07 13:51 ` Alex Williamson
2023-04-07 14:04 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-07 13:51 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Fri, 7 Apr 2023 13:24:25 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, April 7, 2023 8:04 PM
> >
> > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > > if (!iommu_group)
> > > > > > return -EPERM; /* Cannot reset non-isolated devices */
>
> [1]
>
> > > > >
> > > > > Hi Alex,
> > > > >
> > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > >
> > > > Yes
> > > >
> > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > such devices, it failed with -EPERM. Is this expected?
> > > >
> > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > defined an invalid group ID to return to the user, it makes sense that
> > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > for no-iommu cdev.
> > >
> > > I just realize a further issue related to this limitation. Remember that we
> > > may finally compile out the vfio group infrastructure in the future. Say I
> > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > compiled out?
> >
> > We're talking about IOMMU groups, IOMMU groups are always present
> > regardless of whether we expose a vfio group interface to userspace.
> > Remember, we create IOMMU groups even in the no-iommu case. Even with
> > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > ownership.
>
> hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> given device unless it is registered to VFIO, which a fake group is created.
> That's why I hit the limitation [1]. When vfio_group is compiled out, then
> even fake group goes away.
In the vfio group case, [1] can be hit with no-iommu only when there
are affected devices which are not bound to vfio. Why are we not
allocating an IOMMU group to no-iommu devices when vfio group is
disabled? Thanks,
Alex
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 13:51 ` Alex Williamson
@ 2023-04-07 14:04 ` Liu, Yi L
2023-04-07 15:14 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-07 14:04 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, April 7, 2023 9:52 PM
>
> On Fri, 7 Apr 2023 13:24:25 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, April 7, 2023 8:04 PM
> > >
> > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > > > if (!iommu_group)
> > > > > > > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > [1]
> >
> > > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > >
> > > > > Yes
> > > > >
> > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > >
> > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > for no-iommu cdev.
> > > >
> > > > I just realize a further issue related to this limitation. Remember that we
> > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > compiled out?
> > >
> > > We're talking about IOMMU groups, IOMMU groups are always present
> > > regardless of whether we expose a vfio group interface to userspace.
> > > Remember, we create IOMMU groups even in the no-iommu case. Even with
> > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > ownership.
> >
> > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > given device unless it is registered to VFIO, which a fake group is created.
> > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > even fake group goes away.
>
> In the vfio group case, [1] can be hit with no-iommu only when there
> are affected devices which are not bound to vfio.
Yes, because vfio allocates a fake group when a device is registered to
it.
> Why are we not
> allocating an IOMMU group to no-iommu devices when vfio group is
> disabled? Thanks,
Hmm. When the vfio group code is configured out (CONFIG_VFIO_GROUP=n),
vfio_device_set_group() just returns 0 once the patch below is
applied. So when there is no vfio group, the fake group also goes
away.
https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
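Roughly, the effect is the usual "stub when configured out" pattern. The
sketch below is a standalone model of that pattern, not the code from the
patch: struct model_vfio_device, MODEL_CONFIG_VFIO_GROUP, MODEL_NO_GROUP and
model_device_set_group() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NO_GROUP (-1)

/* Hypothetical stand-in for a vfio device; not the kernel's struct. */
struct model_vfio_device {
	int group_id;	/* MODEL_NO_GROUP until a group is assigned */
};

#ifdef MODEL_CONFIG_VFIO_GROUP
/* Group support built in: a noiommu device gets a fake group here. */
static int model_device_set_group(struct model_vfio_device *dev)
{
	assert(dev);
	dev->group_id = 0;
	return 0;
}
#else
/* Group support compiled out: the stub reports success, assigns nothing. */
static inline int model_device_set_group(struct model_vfio_device *dev)
{
	assert(dev);
	return 0;
}
#endif
```

Built without MODEL_CONFIG_VFIO_GROUP, the call succeeds but the device is
left without any group, which is the situation I am asking about.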
Regards,
Yi Liu
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 14:04 ` Liu, Yi L
@ 2023-04-07 15:14 ` Alex Williamson
2023-04-07 15:47 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-07 15:14 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Fri, 7 Apr 2023 14:04:02 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, April 7, 2023 9:52 PM
> >
> > On Fri, 7 Apr 2023 13:24:25 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, April 7, 2023 8:04 PM
> > > >
> > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > > > > if (!iommu_group)
> > > > > > > > return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > [1]
> > >
> > > > > > >
> > > > > > > Hi Alex,
> > > > > > >
> > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > >
> > > > > > Yes
> > > > > >
> > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > >
> > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > > for no-iommu cdev.
> > > > >
> > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > compiled out?
> > > >
> > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > regardless of whether we expose a vfio group interface to userspace.
> > > > Remember, we create IOMMU groups even in the no-iommu case. Even with
> > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > ownership.
> > >
> > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > given device unless it is registered to VFIO, which a fake group is created.
> > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > even fake group goes away.
> >
> > In the vfio group case, [1] can be hit with no-iommu only when there
> > are affected devices which are not bound to vfio.
>
> yes. because vfio would allocate fake group when device is registered to
> it.
>
> > Why are we not
> > allocating an IOMMU group to no-iommu devices when vfio group is
> > disabled? Thanks,
>
> hmmm. when the vfio group code is configured out. The
> vfio_device_set_group() just returns 0 after below patch is
> applied and CONFIG_VFIO_GROUP=n. So when there is no
> vfio group, the fake group also goes away.
>
> https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
Is this a fundamental issue or just a problem with the current
implementation proposal? It seems like the latter. FWIW, I also don't
see a taint happening in the cdev path for no-iommu use. Thanks,
Alex
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 15:14 ` Alex Williamson
@ 2023-04-07 15:47 ` Liu, Yi L
2023-04-07 21:07 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-07 15:47 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, April 7, 2023 11:14 PM
>
> On Fri, 7 Apr 2023 14:04:02 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, April 7, 2023 9:52 PM
> > >
> > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > >
> > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > > > > > if (!iommu_group)
> > > > > > > > > return -EPERM; /* Cannot reset non-isolated devices */
> > > >
> > > > [1]
> > > >
> > > > > > > >
> > > > > > > > Hi Alex,
> > > > > > > >
> > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > >
> > > > > > > Yes
> > > > > > >
> > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > >
> > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > > > for no-iommu cdev.
> > > > > >
> > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > compiled out?
> > > > >
> > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > > regardless of whether we expose a vfio group interface to userspace.
> > > > > Remember, we create IOMMU groups even in the no-iommu case. Even with
> > > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > > ownership.
> > > >
> > > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > > given device unless it is registered to VFIO, which a fake group is created.
> > > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > > even fake group goes away.
> > >
> > > In the vfio group case, [1] can be hit with no-iommu only when there
> > > are affected devices which are not bound to vfio.
> >
> > yes. because vfio would allocate fake group when device is registered to
> > it.
> >
> > > Why are we not
> > > allocating an IOMMU group to no-iommu devices when vfio group is
> > > disabled? Thanks,
> >
> > hmmm. when the vfio group code is configured out. The
> > vfio_device_set_group() just returns 0 after below patch is
> > applied and CONFIG_VFIO_GROUP=n. So when there is no
> > vfio group, the fake group also goes away.
> >
> > https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
>
> Is this a fundamental issue or just a problem with the current
> implementation proposal? It seems like the latter. FWIW, I also don't
> see a taint happening in the cdev path for no-iommu use. Thanks,
Yes, the latter. The reason I raised it here is to confirm the policy for
the new group/bdf capability in DEVICE_GET_INFO. If there is no iommu
group, perhaps I only need to exclude the new group/bdf capability from
the cap chain of DEVICE_GET_INFO. Is that right?
Regards,
Yi Liu
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 15:47 ` Liu, Yi L
@ 2023-04-07 21:07 ` Alex Williamson
2023-04-08 5:07 ` Liu, Yi L
2023-04-11 13:33 ` Jason Gunthorpe
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-07 21:07 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Fri, 7 Apr 2023 15:47:10 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, April 7, 2023 11:14 PM
> >
> > On Fri, 7 Apr 2023 14:04:02 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, April 7, 2023 9:52 PM
> > > >
> > > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > > >
> > > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > > > > > > > > if (!iommu_group)
> > > > > > > > > > return -EPERM; /* Cannot reset non-isolated devices */
> > > > >
> > > > > [1]
> > > > >
> > > > > > > > >
> > > > > > > > > Hi Alex,
> > > > > > > > >
> > > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > > >
> > > > > > > > Yes
> > > > > > > >
> > > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > > >
> > > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > > > > for no-iommu cdev.
> > > > > > >
> > > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > > compiled out?
> > > > > >
> > > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > > > regardless of whether we expose a vfio group interface to userspace.
> > > > > > Remember, we create IOMMU groups even in the no-iommu case. Even with
> > > > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > > > ownership.
> > > > >
> > > > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > > > given device unless it is registered to VFIO, which a fake group is created.
> > > > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > > > even fake group goes away.
> > > >
> > > > In the vfio group case, [1] can be hit with no-iommu only when there
> > > > are affected devices which are not bound to vfio.
> > >
> > > yes. because vfio would allocate fake group when device is registered to
> > > it.
> > >
> > > > Why are we not
> > > > allocating an IOMMU group to no-iommu devices when vfio group is
> > > > disabled? Thanks,
> > >
> > > hmmm. when the vfio group code is configured out. The
> > > vfio_device_set_group() just returns 0 after below patch is
> > > applied and CONFIG_VFIO_GROUP=n. So when there is no
> > > vfio group, the fake group also goes away.
> > >
> > > https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> >
> > Is this a fundamental issue or just a problem with the current
> > implementation proposal? It seems like the latter. FWIW, I also don't
> > see a taint happening in the cdev path for no-iommu use. Thanks,
>
> yes. the latter case. The reason I raised it here is to confirm the
> policy on the new group/bdf capability in the DEVICE_GET_INFO. If
> there is no iommu group, perhaps I only need to exclude the new
> group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
I think we need to revisit the question of why allocating an IOMMU
group for a no-iommu device is exclusive to the vfio group support.
We've already been down the path of trying to report a field that only
exists for devices with certain properties with dev-id. It doesn't
work well. I think we've said all along that while the cdev interface
is device based, there are still going to be underlying IOMMU groups
for the user to be aware of, they're just not as much a fundamental
part of the interface. There should not be a case where a device
doesn't have a group to report. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 21:07 ` Alex Williamson
@ 2023-04-08 5:07 ` Liu, Yi L
2023-04-08 14:20 ` Alex Williamson
2023-04-11 13:33 ` Jason Gunthorpe
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-08 5:07 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, April 8, 2023 5:07 AM
>
> On Fri, 7 Apr 2023 15:47:10 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Friday, April 7, 2023 11:14 PM
> > >
> > > On Fri, 7 Apr 2023 14:04:02 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Friday, April 7, 2023 9:52 PM
> > > > >
> > > > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > > >
> > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > > > >
> > > > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev
> > > *pdev,
> > > > > void
> > > > > > > > > *data)
> > > > > > > > > > > if (!iommu_group)
> > > > > > > > > > > return -EPERM; /* Cannot reset non-isolated devices
> */
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Alex,
> > > > > > > > > >
> > > > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > > > >
> > > > > > > > > Yes
> > > > > > > > >
> > > > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to
> vfio-
> > > pci.
> > > > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-
> vfio0.
> > > > > Bind
> > > > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the
> > > above
> > > > > > > > > > group check. Reason is that this happens to have some affected
> devices,
> > > and
> > > > > > > > > > these devices have no valid iommu_group (because they are not
> bound to
> > > > > vfio-
> > > > > > > pci
> > > > > > > > > > hence nobody allocates noiommu group for them). So when hot reset
> info
> > > > > loops
> > > > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > > > >
> > > > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > > > > > for no-iommu cdev.
> > > > > > > >
> > > > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I
> think
> > > > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we
> will
> > > > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > > > compiled out?
> > > > > > >
> > > > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > > > > regardless of whether we expose a vfio group interface to userspace.
> > > > > > > Remember, we create IOMMU groups even in the no-iommu case. Even
> with
> > > > > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > > > > ownership.
> > > > > >
> > > > > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > > > > given device unless it is registered to VFIO, which a fake group is created.
> > > > > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > > > > even fake group goes away.
> > > > >
> > > > > In the vfio group case, [1] can be hit with no-iommu only when there
> > > > > are affected devices which are not bound to vfio.
> > > >
> > > > yes. because vfio would allocate fake group when device is registered to
> > > > it.
> > > >
> > > > > Why are we not
> > > > > allocating an IOMMU group to no-iommu devices when vfio group is
> > > > > disabled? Thanks,
> > > >
> > > > hmmm. when the vfio group code is configured out. The
> > > > vfio_device_set_group() just returns 0 after below patch is
> > > > applied and CONFIG_VFIO_GROUP=n. So when there is no
> > > > vfio group, the fake group also goes away.
> > > >
> > > > https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> > >
> > > Is this a fundamental issue or just a problem with the current
> > > implementation proposal? It seems like the latter. FWIW, I also don't
> > > see a taint happening in the cdev path for no-iommu use. Thanks,
> >
> > yes. the latter case. The reason I raised it here is to confirm the
> > policy on the new group/bdf capability in the DEVICE_GET_INFO. If
> > there is no iommu group, perhaps I only need to exclude the new
> > group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
>
> I think we need to revisit the question of why allocating an IOMMU
> group for a no-iommu device is exclusive to the vfio group support.
For a no-iommu device, the iommu group is a fake group allocated by vfio,
isn't it? And the fake group allocation is part of the vfio group code:
it is done by vfio_device_set_group() in group.c. If the vfio group code is
not compiled in, vfio does not allocate fake groups. Details of this
compile-out can be found in link [1].
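To make the failure path concrete, here is a minimal userspace toy model of
it. It is only a sketch: the model_* names and types are hypothetical
stand-ins, not kernel code, and a runtime flag takes the place of the
CONFIG_VFIO_GROUP build option so both cases can be exercised in one build.

```c
#include <errno.h>

/* Toy stand-ins for illustration only; these are NOT the kernel's types. */
struct model_iommu_group {
	int id;
};

struct model_device {
	struct model_iommu_group *group;
};

static struct model_iommu_group model_fake_group = { .id = 0 };

/*
 * Models vfio_device_set_group() as described in this mail: with the vfio
 * group code compiled out it is a stub that returns 0 and allocates nothing.
 * The vfio_group_enabled flag is an assumption standing in for
 * CONFIG_VFIO_GROUP.
 */
static int model_set_group(struct model_device *dev, int vfio_group_enabled)
{
	if (!vfio_group_enabled)
		return 0;	/* stub: no fake group is created */

	dev->group = &model_fake_group;
	return 0;
}

/* Models the check quoted as [1] from vfio_pci_fill_devs(). */
static int model_fill_dev(const struct model_device *dev)
{
	if (!dev->group)
		return -EPERM;	/* no group to report */

	return 0;
}
```

With the flag set, model_fill_dev() succeeds after model_set_group(); with
the flag clear, registration still "succeeds" but the later check returns
-EPERM, which is the case I am asking about.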
> We've already been down the path of trying to report a field that only
> exists for devices with certain properties with dev-id. It doesn't
> work well. I think we've said all along that while the cdev interface
> is device based, there are still going to be underlying IOMMU groups
> for the user to be aware of, they're just not as much a fundamental
> part of the interface. There should not be a case where a device
> doesn't have a group to report. Thanks,
The patch in link [1] makes the vfio group optional, so if a kernel is
compiled with CONFIG_VFIO_GROUP=n and booted with the iommu disabled, there
is no group to report. Perhaps this is not a typical usage, but it is still
a sane usage for noiommu mode, as I confirmed with you in this thread. So
when that happens, we need to consider what to report for the group field.

Perhaps I confused the discussion by referring to a patch that is part of
another series, but I think it should be considered when talking about the
group to be reported.
[1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-08 5:07 ` Liu, Yi L
@ 2023-04-08 14:20 ` Alex Williamson
2023-04-09 11:58 ` Yi Liu
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-08 14:20 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Sat, 8 Apr 2023 05:07:16 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, April 8, 2023 5:07 AM
> >
> > On Fri, 7 Apr 2023 15:47:10 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, April 7, 2023 11:14 PM
> > > >
> > > > On Fri, 7 Apr 2023 14:04:02 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Friday, April 7, 2023 9:52 PM
> > > > > >
> > > > > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > > > >
> > > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > > > > >
> > > > > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev
> > > > *pdev,
> > > > > > void
> > > > > > > > > > *data)
> > > > > > > > > > > > if (!iommu_group)
> > > > > > > > > > > > return -EPERM; /* Cannot reset non-isolated devices
> > */
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Alex,
> > > > > > > > > > >
> > > > > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > > > > >
> > > > > > > > > > Yes
> > > > > > > > > >
> > > > > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to
> > vfio-
> > > > pci.
> > > > > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-
> > vfio0.
> > > > > > Bind
> > > > > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the
> > > > above
> > > > > > > > > > > group check. Reason is that this happens to have some affected
> > devices,
> > > > and
> > > > > > > > > > > these devices have no valid iommu_group (because they are not
> > bound to
> > > > > > vfio-
> > > > > > > > pci
> > > > > > > > > > > hence nobody allocates noiommu group for them). So when hot reset
> > info
> > > > > > loops
> > > > > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > > > > >
> > > > > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > > > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > > > > > > > > for no-iommu cdev.
> > > > > > > > >
> > > > > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I
> > think
> > > > > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we
> > will
> > > > > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > > > > compiled out?
> > > > > > > >
> > > > > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > > > > > regardless of whether we expose a vfio group interface to userspace.
> > > > > > > > Remember, we create IOMMU groups even in the no-iommu case. Even
> > with
> > > > > > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > > > > > ownership.
> > > > > > >
> > > > > > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > > > > > given device unless it is registered to VFIO, which a fake group is created.
> > > > > > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > > > > > even fake group goes away.
> > > > > >
> > > > > > In the vfio group case, [1] can be hit with no-iommu only when there
> > > > > > are affected devices which are not bound to vfio.
> > > > >
> > > > > yes. because vfio would allocate fake group when device is registered to
> > > > > it.
> > > > >
> > > > > > Why are we not
> > > > > > allocating an IOMMU group to no-iommu devices when vfio group is
> > > > > > disabled? Thanks,
> > > > >
> > > > > hmmm. when the vfio group code is configured out. The
> > > > > vfio_device_set_group() just returns 0 after below patch is
> > > > > applied and CONFIG_VFIO_GROUP=n. So when there is no
> > > > > vfio group, the fake group also goes away.
> > > > >
> > > > > https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> > > >
> > > > Is this a fundamental issue or just a problem with the current
> > > > implementation proposal? It seems like the latter. FWIW, I also don't
> > > > see a taint happening in the cdev path for no-iommu use. Thanks,
> > >
> > > yes. the latter case. The reason I raised it here is to confirm the
> > > policy on the new group/bdf capability in the DEVICE_GET_INFO. If
> > > there is no iommu group, perhaps I only need to exclude the new
> > > group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
> >
> > I think we need to revisit the question of why allocating an IOMMU
> > group for a no-iommu device is exclusive to the vfio group support.
>
> For no-iommu device, the iommu group is a fake group allocated by vfio.
> is it? And the fake group allocation is part of the vfio group code.
> It is the vfio_device_set_group() in group.c. If vfio group code is not
> compiled in, vfio does not allocate fake groups. Detail for this compiling
> can be found in link [1].
>
> > We've already been down the path of trying to report a field that only
> > exists for devices with certain properties with dev-id. It doesn't
> > work well. I think we've said all along that while the cdev interface
> > is device based, there are still going to be underlying IOMMU groups
> > for the user to be aware of, they're just not as much a fundamental
> > part of the interface. There should not be a case where a device
> > doesn't have a group to report. Thanks,
>
> As the patch in link [1] makes vfio group optional, so if compile a kernel
> with CONFIG_VFIO_GROUP=n, and boot it with iommu disabled, then there is no
> group to report. Perhaps this is not a typical usage but still a sane usage
> for noiommu mode as I confirmed with you in this thread. So when it comes,
> needs to consider what to report for the group field.
>
> Perhaps I messed up the discussion by referring to a patch that is part of
> another series. But I think it should be considered when talking about the
> group to be reported.
The question is whether having the group.c code handle both the vfio
group AND the creation of the IOMMU group in such cases is the correct
split. I'm not disputing that, as the code is currently laid out, the
fake IOMMU group for no-iommu devices is created in vfio group specific
code, but we have a common interface that makes use of IOMMU group
information for which we don't have an equivalent alternative data
field to report.
We've shown that dev-id doesn't work here because dev-ids only exist
for devices within the user's IOMMU context. Also reporting an invalid
ID of any sort fails to indicate the potential implied ownership.
Therefore I recognize that if this interface is to report an IOMMU
group, then the creation of fake IOMMU groups existing only in vfio
group code would need to be refactored. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-08 14:20 ` Alex Williamson
@ 2023-04-09 11:58 ` Yi Liu
2023-04-09 13:29 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Yi Liu @ 2023-04-09 11:58 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On 2023/4/8 22:20, Alex Williamson wrote:
> On Sat, 8 Apr 2023 05:07:16 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
>>> From: Alex Williamson <alex.williamson@redhat.com>
>>> Sent: Saturday, April 8, 2023 5:07 AM
>>>
>>> On Fri, 7 Apr 2023 15:47:10 +0000
>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>
>>>>> From: Alex Williamson <alex.williamson@redhat.com>
>>>>> Sent: Friday, April 7, 2023 11:14 PM
>>>>>
>>>>> On Fri, 7 Apr 2023 14:04:02 +0000
>>>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>>>
>>>>>>> From: Alex Williamson <alex.williamson@redhat.com>
>>>>>>> Sent: Friday, April 7, 2023 9:52 PM
>>>>>>>
>>>>>>> On Fri, 7 Apr 2023 13:24:25 +0000
>>>>>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>>>>>>>
>>>>>>>>> From: Alex Williamson <alex.williamson@redhat.com>
>>>>>>>>> Sent: Friday, April 7, 2023 8:04 PM
>>>>>>>>>
>>>>>>>>>>>>> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev
>>>>> *pdev,
>>>>>>> void
>>>>>>>>>>> *data)
>>>>>>>>>>>>> if (!iommu_group)
>>>>>>>>>>>>> return -EPERM; /* Cannot reset non-isolated devices
>>> */
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>
>>>>>>>>>>>> Is disabling iommu a sane way to test vfio noiommu mode?
>>>>>>>>>>>
>>>>>>>>>>> Yes
>>>>>>>>>>>
>>>>>>>>>>>> I added intel_iommu=off to disable intel iommu and bind a device to
>>> vfio-
>>>>> pci.
>>>>>>>>>>>> I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-
>>> vfio0.
>>>>>>> Bind
>>>>>>>>>>>> iommufd==-1 can succeed, but failed to get hot reset info due to the
>>>>> above
>>>>>>>>>>>> group check. Reason is that this happens to have some affected
>>> devices,
>>>>> and
>>>>>>>>>>>> these devices have no valid iommu_group (because they are not
>>> bound to
>>>>>>> vfio-
>>>>>>>>> pci
>>>>>>>>>>>> hence nobody allocates noiommu group for them). So when hot reset
>>> info
>>>>>>> loops
>>>>>>>>>>>> such devices, it failed with -EPERM. Is this expected?
>>>>>>>>>>>
>>>>>>>>>>> Hmm, I didn't recall that we put in such a limitation, but given the
>>>>>>>>>>> minimally intrusive approach to no-iommu and the fact that we never
>>>>>>>>>>> defined an invalid group ID to return to the user, it makes sense that
>>>>>>>>>>> we just blocked the ioctl for no-iommu use. I guess we can do the same
>>>>>>>>>>> for no-iommu cdev.
>>>>>>>>>>
>>>>>>>>>> I just realize a further issue related to this limitation. Remember that we
>>>>>>>>>> may finally compile out the vfio group infrastructure in the future. Say I
>>>>>>>>>> want to test noiommu, I may boot such a kernel with iommu disabled. I
>>> think
>>>>>>>>>> the _INFO ioctl would fail as there is no iommu_group. Does it mean we
>>> will
>>>>>>>>>> not support hot reset for noiommu in future if vfio group infrastructure is
>>>>>>>>>> compiled out?
>>>>>>>>>
>>>>>>>>> We're talking about IOMMU groups, IOMMU groups are always present
>>>>>>>>> regardless of whether we expose a vfio group interface to userspace.
>>>>>>>>> Remember, we create IOMMU groups even in the no-iommu case. Even
>>> with
>>>>>>>>> pure cdev, there are underlying IOMMU groups that maintain the DMA
>>>>>>>>> ownership.
>>>>>>>>
>>>>>>>> hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
>>>>>>>> given device unless it is registered to VFIO, which a fake group is created.
>>>>>>>> That's why I hit the limitation [1]. When vfio_group is compiled out, then
>>>>>>>> even fake group goes away.
>>>>>>>
>>>>>>> In the vfio group case, [1] can be hit with no-iommu only when there
>>>>>>> are affected devices which are not bound to vfio.
>>>>>>
>>>>>> yes. because vfio would allocate fake group when device is registered to
>>>>>> it.
>>>>>>
>>>>>>> Why are we not
>>>>>>> allocating an IOMMU group to no-iommu devices when vfio group is
>>>>>>> disabled? Thanks,
>>>>>>
>>>>>> hmmm. when the vfio group code is configured out. The
>>>>>> vfio_device_set_group() just returns 0 after below patch is
>>>>>> applied and CONFIG_VFIO_GROUP=n. So when there is no
>>>>>> vfio group, the fake group also goes away.
>>>>>>
>>>>>> https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
>>>>>
>>>>> Is this a fundamental issue or just a problem with the current
>>>>> implementation proposal? It seems like the latter. FWIW, I also don't
>>>>> see a taint happening in the cdev path for no-iommu use. Thanks,
>>>>
>>>> yes. the latter case. The reason I raised it here is to confirm the
>>>> policy on the new group/bdf capability in the DEVICE_GET_INFO. If
>>>> there is no iommu group, perhaps I only need to exclude the new
>>>> group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
>>>
>>> I think we need to revisit the question of why allocating an IOMMU
>>> group for a no-iommu device is exclusive to the vfio group support.
>>
>> For no-iommu device, the iommu group is a fake group allocated by vfio.
>> is it? And the fake group allocation is part of the vfio group code.
>> It is the vfio_device_set_group() in group.c. If vfio group code is not
>> compiled in, vfio does not allocate fake groups. Detail for this compiling
>> can be found in link [1].
>>
>>> We've already been down the path of trying to report a field that only
>>> exists for devices with certain properties with dev-id. It doesn't
>>> work well. I think we've said all along that while the cdev interface
>>> is device based, there are still going to be underlying IOMMU groups
>>> for the user to be aware of, they're just not as much a fundamental
>>> part of the interface. There should not be a case where a device
>>> doesn't have a group to report. Thanks,
>>
>> As the patch in link [1] makes vfio group optional, so if compile a kernel
>> with CONFIG_VFIO_GROUP=n, and boot it with iommu disabled, then there is no
>> group to report. Perhaps this is not a typical usage but still a sane usage
>> for noiommu mode as I confirmed with you in this thread. So when it comes,
>> needs to consider what to report for the group field.
>>
>> Perhaps I messed up the discussion by referring to a patch that is part of
>> another series. But I think it should be considered when talking about the
>> group to be reported.
>
> The question is whether the split that group.c code handles both the
> vfio group AND creation of the IOMMU group in such cases is the correct
> split. I'm not arguing that the way the code is currently laid out has
> the fake IOMMU group for no-iommu devices created in vfio group
> specific code, but we have a common interface that makes use of IOMMU
> group information for which we don't have an equivalent alternative
> data field to report.
Yes. It is needed to make _HOT_RESET_INFO work for noiommu devices.
> We've shown that dev-id doesn't work here because dev-ids only exist
> for devices within the user's IOMMU context. Also reporting an invalid
> ID of any sort fails to indicate the potential implied ownership.
> Therefore I recognize that if this interface is to report an IOMMU
> group, then the creation of fake IOMMU groups existing only in vfio
> group code would need to be refactored. Thanks,
Yeah, the iommu group creation needs to move back to vfio_main.c. This
would be a prerequisite for [1].
[1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
I'll also try out your suggestion to add a capability like the one below
and link it into the vfio_device_info cap chain.

#define VFIO_DEVICE_INFO_CAP_PCI_BDF	5

struct vfio_device_info_cap_pci_bdf {
	struct vfio_info_cap_header	header;
	__u32	group_id;
	__u16	segment;
	__u8	bus;
	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
};
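
For reference, a rough sketch of how userspace might walk the cap chain
returned by VFIO_DEVICE_GET_INFO to find such a capability. This is only an
illustration under assumptions: the capability is a proposal, not merged
uAPI; the cap_header and cap_pci_bdf types are local mirrors so the snippet
builds without kernel headers; and find_pci_bdf_cap(), bdf_slot() and
bdf_func() are hypothetical helper names. The chain semantics (the next
field is an offset from the start of the info buffer, 0 terminates) follow
struct vfio_info_cap_header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_info_cap_header from <linux/vfio.h>. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;	/* offset from start of info buffer, 0 = end of chain */
};

/* Hypothetical: ID and layout as proposed in this mail, not merged uAPI. */
#define PROPOSED_CAP_PCI_BDF	5

struct cap_pci_bdf {
	struct cap_header header;
	uint32_t group_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/*
 * Walk the capability chain inside an info buffer of 'size' bytes, starting
 * at 'cap_offset'. Returns 0 and fills *out if the proposed capability is
 * found, -1 if it is absent or the chain is malformed.
 */
static int find_pci_bdf_cap(const uint8_t *info, size_t size,
			    uint32_t cap_offset, struct cap_pci_bdf *out)
{
	size_t hops = 0;

	assert(info != NULL && out != NULL);

	while (cap_offset != 0) {
		struct cap_header hdr;

		/* Reject offsets that leave no room for a header. */
		if (cap_offset > size || size - cap_offset < sizeof(hdr))
			return -1;
		/* Guard against a chain that loops back on itself. */
		if (hops++ > size / sizeof(hdr))
			return -1;

		memcpy(&hdr, info + cap_offset, sizeof(hdr));
		if (hdr.id == PROPOSED_CAP_PCI_BDF) {
			if (size - cap_offset < sizeof(*out))
				return -1;
			memcpy(out, info + cap_offset, sizeof(*out));
			return 0;
		}
		cap_offset = hdr.next;
	}
	return -1;
}

/* Same arithmetic as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
static unsigned int bdf_slot(uint8_t devfn)
{
	return (devfn >> 3) & 0x1f;
}

static unsigned int bdf_func(uint8_t devfn)
{
	return devfn & 0x07;
}
```

The caller would pass the buffer filled in by the ioctl together with its
cap_offset field, then compare group_id and segment/bus/devfn against the
entries reported by _HOT_RESET_INFO.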
--
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-09 11:58 ` Yi Liu
@ 2023-04-09 13:29 ` Alex Williamson
2023-04-10 8:48 ` Liu, Yi L
2023-04-11 13:34 ` Jason Gunthorpe
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-09 13:29 UTC (permalink / raw)
To: Yi Liu
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Sun, 9 Apr 2023 19:58:47 +0800
Yi Liu <yi.l.liu@intel.com> wrote:
> On 2023/4/8 22:20, Alex Williamson wrote:
> > On Sat, 8 Apr 2023 05:07:16 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> >>> From: Alex Williamson <alex.williamson@redhat.com>
> >>> Sent: Saturday, April 8, 2023 5:07 AM
> >>>
> >>> On Fri, 7 Apr 2023 15:47:10 +0000
> >>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >>>
> >>>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>>> Sent: Friday, April 7, 2023 11:14 PM
> >>>>>
> >>>>> On Fri, 7 Apr 2023 14:04:02 +0000
> >>>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >>>>>
> >>>>>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>>>>> Sent: Friday, April 7, 2023 9:52 PM
> >>>>>>>
> >>>>>>> On Fri, 7 Apr 2023 13:24:25 +0000
> >>>>>>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >>>>>>>
> >>>>>>>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>>>>>>> Sent: Friday, April 7, 2023 8:04 PM
> >>>>>>>>>
> >>>>>>>>>>>>> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev
> >>>>> *pdev,
> >>>>>>> void
> >>>>>>>>>>> *data)
> >>>>>>>>>>>>> if (!iommu_group)
> >>>>>>>>>>>>> return -EPERM; /* Cannot reset non-isolated devices
> >>> */
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is disabling iommu a sane way to test vfio noiommu mode?
> >>>>>>>>>>>
> >>>>>>>>>>> Yes
> >>>>>>>>>>>
> >>>>>>>>>>>> I added intel_iommu=off to disable intel iommu and bind a device to
> >>> vfio-
> >>>>> pci.
> >>>>>>>>>>>> I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-
> >>> vfio0.
> >>>>>>> Bind
> >>>>>>>>>>>> iommufd==-1 can succeed, but failed to get hot reset info due to the
> >>>>> above
> >>>>>>>>>>>> group check. Reason is that this happens to have some affected
> >>> devices,
> >>>>> and
> >>>>>>>>>>>> these devices have no valid iommu_group (because they are not
> >>> bound to
> >>>>>>> vfio-
> >>>>>>>>> pci
> >>>>>>>>>>>> hence nobody allocates noiommu group for them). So when hot reset
> >>> info
> >>>>>>> loops
> >>>>>>>>>>>> such devices, it failed with -EPERM. Is this expected?
> >>>>>>>>>>>
> >>>>>>>>>>> Hmm, I didn't recall that we put in such a limitation, but given the
> >>>>>>>>>>> minimally intrusive approach to no-iommu and the fact that we never
> >>>>>>>>>>> defined an invalid group ID to return to the user, it makes sense that
> >>>>>>>>>>> we just blocked the ioctl for no-iommu use. I guess we can do the same
> >>>>>>>>>>> for no-iommu cdev.
> >>>>>>>>>>
> >>>>>>>>>> I just realize a further issue related to this limitation. Remember that we
> >>>>>>>>>> may finally compile out the vfio group infrastructure in the future. Say I
> >>>>>>>>>> want to test noiommu, I may boot such a kernel with iommu disabled. I
> >>> think
> >>>>>>>>>> the _INFO ioctl would fail as there is no iommu_group. Does it mean we
> >>> will
> >>>>>>>>>> not support hot reset for noiommu in future if vfio group infrastructure is
> >>>>>>>>>> compiled out?
> >>>>>>>>>
> >>>>>>>>> We're talking about IOMMU groups, IOMMU groups are always present
> >>>>>>>>> regardless of whether we expose a vfio group interface to userspace.
> >>>>>>>>> Remember, we create IOMMU groups even in the no-iommu case. Even
> >>> with
> >>>>>>>>> pure cdev, there are underlying IOMMU groups that maintain the DMA
> >>>>>>>>> ownership.
> >>>>>>>>
> >>>>>>>> hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> >>>>>>>> given device unless it is registered to VFIO, which a fake group is created.
> >>>>>>>> That's why I hit the limitation [1]. When vfio_group is compiled out, then
> >>>>>>>> even fake group goes away.
> >>>>>>>
> >>>>>>> In the vfio group case, [1] can be hit with no-iommu only when there
> >>>>>>> are affected devices which are not bound to vfio.
> >>>>>>
> >>>>>> yes. because vfio would allocate fake group when device is registered to
> >>>>>> it.
> >>>>>>
> >>>>>>> Why are we not
> >>>>>>> allocating an IOMMU group to no-iommu devices when vfio group is
> >>>>>>> disabled? Thanks,
> >>>>>>
> >>>>>> hmmm. when the vfio group code is configured out. The
> >>>>>> vfio_device_set_group() just returns 0 after below patch is
> >>>>>> applied and CONFIG_VFIO_GROUP=n. So when there is no
> >>>>>> vfio group, the fake group also goes away.
> >>>>>>
> >>>>>> https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> >>>>>
> >>>>> Is this a fundamental issue or just a problem with the current
> >>>>> implementation proposal? It seems like the latter. FWIW, I also don't
> >>>>> see a taint happening in the cdev path for no-iommu use. Thanks,
> >>>>
> >>>> yes. the latter case. The reason I raised it here is to confirm the
> >>>> policy on the new group/bdf capability in the DEVICE_GET_INFO. If
> >>>> there is no iommu group, perhaps I only need to exclude the new
> >>>> group/bdf capability from the cap chain of DEVICE_GET_INFO. is it?
> >>>
> >>> I think we need to revisit the question of why allocating an IOMMU
> >>> group for a no-iommu device is exclusive to the vfio group support.
> >>
> >> For no-iommu device, the iommu group is a fake group allocated by vfio.
> >> is it? And the fake group allocation is part of the vfio group code.
> >> It is the vfio_device_set_group() in group.c. If vfio group code is not
> >> compiled in, vfio does not allocate fake groups. Detail for this compiling
> >> can be found in link [1].
> >>
> >>> We've already been down the path of trying to report a field that only
> >>> exists for devices with certain properties with dev-id. It doesn't
> >>> work well. I think we've said all along that while the cdev interface
> >>> is device based, there are still going to be underlying IOMMU groups
> >>> for the user to be aware of, they're just not as much a fundamental
> >>> part of the interface. There should not be a case where a device
> >>> doesn't have a group to report. Thanks,
> >>
> >> As the patch in link [1] makes vfio group optional, so if compile a kernel
> >> with CONFIG_VFIO_GROUP=n, and boot it with iommu disabled, then there is no
> >> group to report. Perhaps this is not a typical usage but still a sane usage
> >> for noiommu mode as I confirmed with you in this thread. So when it comes,
> >> needs to consider what to report for the group field.
> >>
> >> Perhaps I messed up the discussion by referring to a patch that is part of
> >> another series. But I think it should be considered when talking about the
> >> group to be reported.
> >
> > The question is whether the split that group.c code handles both the
> > vfio group AND creation of the IOMMU group in such cases is the correct
> > split. I'm not arguing that the way the code is currently laid out has
> > the fake IOMMU group for no-iommu devices created in vfio group
> > specific code, but we have a common interface that makes use of IOMMU
> > group information for which we don't have an equivalent alternative
> > data field to report.
>
> yes. It is needed to ensure _HOT_RESET_INFO workable for noiommu devices.
>
> > We've shown that dev-id doesn't work here because dev-ids only exist
> > for devices within the user's IOMMU context. Also reporting an invalid
> > ID of any sort fails to indicate the potential implied ownership.
> > Therefore I recognize that if this interface is to report an IOMMU
> > group, then the creation of fake IOMMU groups existing only in vfio
> > group code would need to be refactored. Thanks,
>
> yeah, needs to move the iommu group creation back to vfio_main.c. This
> would be a prerequisite for [1]
>
> [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
>
> I'll also try out your suggestion to add a capability like below and link
> it in the vfio_device_info cap chain.
>
> #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
>
> struct vfio_device_info_cap_pci_bdf {
> struct vfio_info_cap_header header;
> __u32 group_id;
> __u16 segment;
> __u8 bus;
> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> };
>
Group-id and bdf should be separate capabilities; all devices should
report a group-id capability and only PCI devices a bdf capability.
Thanks,
Alex
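
For illustration, the split suggested above could look roughly like the
sketch below. This is only a userspace mock under stated assumptions:
the struct names, the helper names and both capability ID values are
hypothetical (the thread only proposes the value 5 for the combined
capability), and cap_header is a local mirror of the uapi
vfio_info_cap_header layout.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the uapi capability header: id, version, next offset. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

/* Hypothetical capability IDs, chosen only for this sketch. */
#define CAP_GROUP_ID 5
#define CAP_PCI_BDF  6

/* Would be reported by every vfio device. */
struct cap_group_id {
	struct cap_header header;
	uint32_t group_id;
};

/* Would be reported by PCI devices only. */
struct cap_pci_bdf {
	struct cap_header header;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;	/* decode with PCI_SLOT/PCI_FUNC */
};

/* Build a group-id capability; header.next is left for the chain code. */
static struct cap_group_id make_group_cap(uint32_t group_id)
{
	struct cap_group_id cap = {
		.header = { .id = CAP_GROUP_ID, .version = 1 },
		.group_id = group_id,
	};
	return cap;
}

/* Build a PCI bdf capability; header.next is left for the chain code. */
static struct cap_pci_bdf make_bdf_cap(uint16_t segment, uint8_t bus,
				       uint8_t devfn)
{
	struct cap_pci_bdf cap = {
		.header = { .id = CAP_PCI_BDF, .version = 1 },
		.segment = segment,
		.bus = bus,
		.devfn = devfn,
	};
	return cap;
}
```

With the two pieces separated, a non-PCI device would simply not chain
the bdf capability, and no field has to carry an invalid placeholder.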
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-09 13:29 ` Alex Williamson
@ 2023-04-10 8:48 ` Liu, Yi L
2023-04-10 14:41 ` Alex Williamson
2023-04-11 13:34 ` Jason Gunthorpe
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-10 8:48 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Sunday, April 9, 2023 9:30 PM
[...]
> > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > would be a prerequisite for [1]
> >
> > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> >
> > I'll also try out your suggestion to add a capability like below and link
> > it in the vfio_device_info cap chain.
> >
> > #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
> >
> > struct vfio_device_info_cap_pci_bdf {
> > struct vfio_info_cap_header header;
> > __u32 group_id;
> > __u16 segment;
> > __u8 bus;
> > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > };
> >
>
> Group-id and bdf should be separate capabilities, all device should
> report a group-id capability and only PCI devices a bdf capability.
OK. Since this is to support the device fd passing usage, we need all
the vfio device drivers to report the group-id capability, is that
right? If so, we may have a helper like the one below in vfio_main.c.
How about the sample drivers? It seems unnecessary for them, right?
int vfio_pci_info_add_group_cap(struct device *dev,
				struct vfio_info_cap *caps)
{
	struct vfio_pci_device_info_cap_group cap = {
		.header.id = VFIO_DEVICE_INFO_CAP_GROUP_ID,
		.header.version = 1,
	};
	struct iommu_group *iommu_group;

	/* Look up the group of the @dev argument. */
	iommu_group = iommu_group_get(dev);
	if (!iommu_group) {
		kfree(caps->buf);
		return -EPERM;
	}

	cap.group_id = iommu_group_id(iommu_group);

	iommu_group_put(iommu_group);

	return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
}
Regards,
Yi Liu
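
On the consuming side, userspace would find such a capability by
walking the cap chain of the info buffer, where each header's next field
is an offset from the start of the buffer and 0 terminates the chain. A
minimal sketch of that walk follows. It assumes a local cap_header
mirror of vfio_info_cap_header; find_cap() is a hypothetical helper name
and not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi capability header: id, version, next offset. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

/*
 * Walk a capability chain stored in buf (len bytes), starting at offset
 * first. Offsets are relative to the start of buf. Returns the offset of
 * the first capability whose id matches, or 0 if none is found or the
 * chain runs outside the buffer.
 */
static uint32_t find_cap(const uint8_t *buf, size_t len, uint32_t first,
			 uint16_t id)
{
	uint32_t off = first;
	size_t steps = 0;

	/* Bound the number of hops so a looping chain cannot spin forever. */
	while (off != 0 && steps++ < len) {
		struct cap_header hdr;

		if (off > len || len - off < sizeof(hdr))
			return 0;
		/* memcpy avoids unaligned access into the byte buffer. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

A caller would pass the cap_offset reported in the info header as first
and then read the capability body at the returned offset.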
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-10 8:48 ` Liu, Yi L
@ 2023-04-10 14:41 ` Alex Williamson
2023-04-10 15:18 ` Liu, Yi L
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-10 14:41 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Mon, 10 Apr 2023 08:48:54 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Sunday, April 9, 2023 9:30 PM
> [...]
> > > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > > would be a prerequisite for [1]
> > >
> > > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> > >
> > > I'll also try out your suggestion to add a capability like below and link
> > > it in the vfio_device_info cap chain.
> > >
> > > #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
> > >
> > > struct vfio_device_info_cap_pci_bdf {
> > > struct vfio_info_cap_header header;
> > > __u32 group_id;
> > > __u16 segment;
> > > __u8 bus;
> > > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > };
> > >
> >
> > Group-id and bdf should be separate capabilities, all device should
> > report a group-id capability and only PCI devices a bdf capability.
>
> ok. Since this is to support the device fd passing usage, so we need to
> let all the vfio device drivers report group-id capability. is it? So may
> have a below helper in vfio_main.c. How about the sample drivers?
> seems not necessary for them. right?
The more common we can make it, the better, but if it ends up that the
individual drivers need to initialize the capability then it would
probably be limited to those drivers with a need to expose the group:
sample drivers for the purpose of illustrating the interface and of
course anything based on vfio-pci-core which exposes hot-reset. Thanks
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-10 14:41 ` Alex Williamson
@ 2023-04-10 15:18 ` Liu, Yi L
2023-04-10 15:23 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-10 15:18 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Monday, April 10, 2023 10:41 PM
>
> On Mon, 10 Apr 2023 08:48:54 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Sunday, April 9, 2023 9:30 PM
> > [...]
> > > > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > > > would be a prerequisite for [1]
> > > >
> > > > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> > > >
> > > > I'll also try out your suggestion to add a capability like below and link
> > > > it in the vfio_device_info cap chain.
> > > >
> > > > #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
> > > >
> > > > struct vfio_device_info_cap_pci_bdf {
> > > > struct vfio_info_cap_header header;
> > > > __u32 group_id;
> > > > __u16 segment;
> > > > __u8 bus;
> > > > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > > };
> > > >
> > >
> > > Group-id and bdf should be separate capabilities, all device should
> > > report a group-id capability and only PCI devices a bdf capability.
> >
> > ok. Since this is to support the device fd passing usage, so we need to
> > let all the vfio device drivers report group-id capability. is it? So may
> > have a below helper in vfio_main.c. How about the sample drivers?
> > seems not necessary for them. right?
>
> The more common we can make it, the better, but if it ends up that the
> individual drivers need to initialize the capability then it would
> probably be limited to those driver with a need to expose the group.
This looks to be such a case. vfio_device_info is assembled by the
individual drivers. If we want to report the group_id capability as a
common behavior, all of them need to change. I had a quick draft for it
in the commit below:
https://github.com/yiliu1765/iommufd/commit/ff4b8bee90761961041126305183a9a7e0f0542d
https://github.com/yiliu1765/iommufd/commits/report_group_id
> Sample drivers for the purpose of illustrating the interface and of
> course anything based on vfio-pci-core which exposes hot-reset. Thanks
Do you see any sample drivers that need to report the group_id cap?
IMHO, it seems not.
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-10 15:18 ` Liu, Yi L
@ 2023-04-10 15:23 ` Alex Williamson
0 siblings, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-10 15:23 UTC (permalink / raw)
To: Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Mon, 10 Apr 2023 15:18:27 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Monday, April 10, 2023 10:41 PM
> >
> > On Mon, 10 Apr 2023 08:48:54 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Sunday, April 9, 2023 9:30 PM
> > > [...]
> > > > > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > > > > would be a prerequisite for [1]
> > > > >
> > > > > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l.liu@intel.com/
> > > > >
> > > > > I'll also try out your suggestion to add a capability like below and link
> > > > > it in the vfio_device_info cap chain.
> > > > >
> > > > > #define VFIO_DEVICE_INFO_CAP_PCI_BDF 5
> > > > >
> > > > > struct vfio_device_info_cap_pci_bdf {
> > > > > struct vfio_info_cap_header header;
> > > > > __u32 group_id;
> > > > > __u16 segment;
> > > > > __u8 bus;
> > > > > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > > > };
> > > > >
> > > >
> > > > Group-id and bdf should be separate capabilities, all device should
> > > > report a group-id capability and only PCI devices a bdf capability.
> > >
> > > ok. Since this is to support the device fd passing usage, so we need to
> > > let all the vfio device drivers report group-id capability. is it? So may
> > > have a below helper in vfio_main.c. How about the sample drivers?
> > > seems not necessary for them. right?
> >
> > The more common we can make it, the better, but if it ends up that the
> > individual drivers need to initialize the capability then it would
> > probably be limited to those driver with a need to expose the group.
>
> looks to be such a case. vfio_device_info is assembled by the individual
> drivers. If want to report group_id capability as a common behavior, needs
> to change all of them. Had a quick draft for it as below commit:
>
> https://github.com/yiliu1765/iommufd/commit/ff4b8bee90761961041126305183a9a7e0f0542d
>
> https://github.com/yiliu1765/iommufd/commits/report_group_id
>
> > Sample drivers for the purpose of illustrating the interface and of
> > course anything based on vfio-pci-core which exposes hot-reset. Thanks
>
> do you see any sample drivers need to report group_id cap? IMHO, seems
> no.
As in the quoted text, part of the purpose of the sample drivers is to
act both as a proof-of-concept and illustration of the API, therefore
gratuitous exposure of such capabilities should be encouraged. They
would also provide a proof point of an mdev device, ie. emulated IOMMU
device, exposing the capability. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-09 13:29 ` Alex Williamson
2023-04-10 8:48 ` Liu, Yi L
@ 2023-04-11 13:34 ` Jason Gunthorpe
1 sibling, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-11 13:34 UTC (permalink / raw)
To: Alex Williamson
Cc: Yi Liu, Tian, Kevin, joro@8bytes.org, robin.murphy@arm.com,
cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Sun, Apr 09, 2023 at 07:29:51AM -0600, Alex Williamson wrote:
> > struct vfio_device_info_cap_pci_bdf {
> > struct vfio_info_cap_header header;
> > __u32 group_id;
> > __u16 segment;
> > __u8 bus;
> > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > };
> >
>
> Group-id and bdf should be separate capabilities, all device should
> report a group-id capability and only PCI devices a bdf capability.
Group should be reported by iommufd using a generic ioctl, and not be
part of VFIO.
This should report BDF only and only work for PCI.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 21:07 ` Alex Williamson
2023-04-08 5:07 ` Liu, Yi L
@ 2023-04-11 13:33 ` Jason Gunthorpe
1 sibling, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-11 13:33 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, Tian, Kevin, joro@8bytes.org, robin.murphy@arm.com,
cohuck@redhat.com, eric.auger@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Fri, Apr 07, 2023 at 03:07:21PM -0600, Alex Williamson wrote:
> I think we need to revisit the question of why allocating an IOMMU
> group for a no-iommu device is exclusive to the vfio group support.
One of the points of this effort is to remove the co-mingling of iommu
and VFIO so much. We should not create the fake iommu groups for
no-iommu.
The _INFO API reporting the group is not a good reason to wreck this
clean separation.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-07 12:03 ` Alex Williamson
2023-04-07 13:24 ` Liu, Yi L
@ 2023-04-11 6:16 ` Liu, Yi L
1 sibling, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-11 6:16 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, eric.auger@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Alex,
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, April 7, 2023 8:04 PM
>
> On Fri, 7 Apr 2023 10:09:58 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > Hi Alex,
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Monday, April 3, 2023 11:02 PM
> > >
> > > On Mon, 3 Apr 2023 09:25:06 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Saturday, April 1, 2023 10:44 PM
> > > >
> > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void
> > > *data)
> > > > > if (!iommu_group)
> > > > > return -EPERM; /* Cannot reset non-isolated devices */
> > > >
> > > > Hi Alex,
> > > >
> > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > >
> > > Yes
> > >
> > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > group check. Reason is that this happens to have some affected devices, and
> > > > these devices have no valid iommu_group (because they are not bound to vfio-
> pci
> > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > such devices, it failed with -EPERM. Is this expected?
> > >
> > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > minimally intrusive approach to no-iommu and the fact that we never
> > > defined an invalid group ID to return to the user, it makes sense that
> > > we just blocked the ioctl for no-iommu use. I guess we can do the same
> > > for no-iommu cdev.
> >
> > I just realize a further issue related to this limitation. Remember that we
> > may finally compile out the vfio group infrastructure in the future. Say I
> > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > not support hot reset for noiommu in future if vfio group infrastructure is
> > compiled out?
>
> We're talking about IOMMU groups, IOMMU groups are always present
> regardless of whether we expose a vfio group interface to userspace.
> Remember, we create IOMMU groups even in the no-iommu case. Even with
> pure cdev, there are underlying IOMMU groups that maintain the DMA
> ownership.
I just realized that there is one case that does not have an iommu
group, although it is not implemented yet. There was a discussion on
SIOV support. IIRC, it was agreed that there is no need to allocate an
iommu_group for the SIOV case. Kevin or Jason can keep me honest here.
I failed to find the link to this discussion.
> > As another thread, we are going to add a new bdf/group capability to
> > DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
> > bdf/group capability or add a flag in the capability to mark the group_id
> > is invalid?
>
> As above, there's always an IOMMU group, it's never invalid. Thanks,
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
2023-04-03 9:25 ` Liu, Yi L
@ 2023-04-04 22:20 ` Alex Williamson
2023-04-05 12:19 ` Eric Auger
2 siblings, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-04 22:20 UTC (permalink / raw)
To: Yi Liu
Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger, nicolinc,
kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
terrence.xu, yanting.jiang
On Sat, 1 Apr 2023 07:44:29 -0700
Yi Liu <yi.l.liu@intel.com> wrote:
> for the users that accept device fds passed from management stacks to be
> able to figure out the host reset affected devices among the devices
> opened by the user. This is needed as such users do not have BDF (bus,
> devfn) knowledge about the devices it has opened, hence unable to use
> the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> to figure out the affected devices.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
> include/uapi/linux/vfio.h | 24 ++++++++++++-
> 2 files changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 19f5b075d70a..a5a7e148dce1 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -30,6 +30,7 @@
> #if IS_ENABLED(CONFIG_EEH)
> #include <asm/eeh.h>
> #endif
> +#include <uapi/linux/iommufd.h>
>
> #include "vfio_pci_priv.h"
>
> @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> return 0;
> }
>
> +static struct vfio_device *
> +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> + struct pci_dev *pdev)
> +{
> + struct vfio_device *cur;
> +
> + lockdep_assert_held(&dev_set->lock);
> +
> + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> + if (cur->dev == &pdev->dev)
> + return cur;
> + return NULL;
> +}
> +
> static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> {
> (*(int *)data)++;
> @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> struct vfio_pci_fill_info {
> int max;
> int cur;
> + bool require_devid;
> + struct iommufd_ctx *iommufd;
> + struct vfio_device_set *dev_set;
> struct vfio_pci_dependent_device *devices;
Poor structure packing, move the bool to the end.
Nit, maybe just name it @devid.
> };
>
> static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> {
> struct vfio_pci_fill_info *fill = data;
> + struct vfio_device_set *dev_set = fill->dev_set;
> struct iommu_group *iommu_group;
> + struct vfio_device *vdev;
> +
> + lockdep_assert_held(&dev_set->lock);
>
> if (fill->cur == fill->max)
> return -EAGAIN; /* Something changed, try again */
> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> if (!iommu_group)
> return -EPERM; /* Cannot reset non-isolated devices */
>
> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + if (fill->require_devid) {
Nit, @vdev could be scoped here.
> + /*
> + * Report dev_id of the devices that are opened as cdev
> + * and have the same iommufd with the fill->iommufd.
> + * Otherwise, just fill IOMMUFD_INVALID_ID.
> + */
> + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
I wish I had a better solution to this, but I don't.
> + if (vdev && vfio_device_cdev_opened(vdev) &&
> + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
Long line, maybe a pointer to &fill->devices[fill->cur] would help.
> + else
> + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> + } else {
> + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + }
> fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> fill->devices[fill->cur].bus = pdev->bus->number;
> fill->devices[fill->cur].devfn = pdev->devfn;
> @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> return -ENOMEM;
>
> fill.devices = devices;
> + fill.dev_set = vdev->vdev.dev_set;
>
> + mutex_lock(&vdev->vdev.dev_set->lock);
> + if (vfio_device_cdev_opened(&vdev->vdev)) {
> + fill.require_devid = true;
> + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> + }
We can do this unconditionally:
fill.devid = vfio_device_cdev_opened(&vdev->vdev);
fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
Thanks,
Alex
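
Taken together, the review comments above (bool at the end of the
struct, a local pointer to the current entry, unconditional setup of the
flag and the iommufd) might shape the fill step roughly as in the
userspace mock below. Everything here is a stand-in for illustration:
mock_dev, fill_info and fill_one() are hypothetical, the value of
INVALID_ID is assumed, and the real code looks devices up in the dev_set
under its lock rather than being handed a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID 0u	/* stand-in for IOMMUFD_INVALID_ID; value assumed */

/* Mirror of the per-device entry with the proposed group_id/dev_id union. */
struct dep_device {
	union {
		uint32_t group_id;
		uint32_t dev_id;
	};
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Mock of the device state the fill step needs to consult. */
struct mock_dev {
	bool cdev_opened;
	const void *ictx;	/* iommufd context the device is bound to */
	uint32_t dev_id;
	uint32_t group_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

struct fill_info {
	int max;
	int cur;
	const void *iommufd;
	struct dep_device *devices;
	bool devid;		/* bool placed last, as suggested */
};

/* Fill one entry; returns 0 on success, -1 where the patch uses -EAGAIN. */
static int fill_one(struct fill_info *fill, const struct mock_dev *dev)
{
	struct dep_device *info;

	if (fill->cur == fill->max)
		return -1;

	/* One pointer to the current entry keeps the lines below short. */
	info = &fill->devices[fill->cur];

	if (fill->devid) {
		if (dev->cdev_opened && dev->ictx == fill->iommufd)
			info->dev_id = dev->dev_id;
		else
			info->dev_id = INVALID_ID;
	} else {
		info->group_id = dev->group_id;
	}
	info->segment = dev->segment;
	info->bus = dev->bus;
	info->devfn = dev->devfn;
	fill->cur++;
	return 0;
}
```

The caller would set fill.devid and fill.iommufd once before the walk,
without wrapping the two assignments in a condition.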
> ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> &fill, slot);
> + mutex_unlock(&vdev->vdev.dev_set->lock);
>
> /*
> * If a device was removed between counting and filling, we may come up
> * short of fill.max. If a device was added, we'll have a return of
> * -EAGAIN above.
> */
> - if (!ret)
> + if (!ret) {
> hdr.count = fill.cur;
> + if (fill.require_devid)
> + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> + }
>
> reset_info_exit:
> if (copy_to_user(arg, &hdr, minsz))
> @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> {
> struct vfio_device_set *dev_set = data;
> - struct vfio_device *cur;
>
> - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> - if (cur->dev == &pdev->dev)
> - return 0;
> - return -EBUSY;
> + lockdep_assert_held(&dev_set->lock);
> +
> + return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
> }
>
> /*
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 25432ef213ee..5a34364e3b94 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,32 @@ enum {
> * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> * struct vfio_pci_hot_reset_info)
> *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device. User could use the information reported by this command
> + * to figure out the affected devices among the devices it has opened.
> + * This command always reports the segment, bus and devfn information for
> + * each affected device, and selectively report the group_id or the dev_id
> + * per the way how the device being queried is opened.
> + * - If the device is opened via the traditional group/container manner,
> + * this command reports the group_id for each affected device.
> + *
> + * - If the device is opened as a cdev, this command needs to report
> + * dev_id for each affected device and set the
> + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
> + * devices that are not opened as cdev or bound to different iommufds
> + * with the device that is queried, report an invalid dev_id to avoid
> + * potential dev_id conflict as dev_id is local to iommufd. For such
> + * affected devices, user shall fall back to use the segment, bus and
> + * devfn info to map it to opened device.
> + *
> * Return: 0 on success, -errno on failure:
> * -enospc = insufficient buffer, -enodev = unsupported for device.
> */
> struct vfio_pci_dependent_device {
> - __u32 group_id;
> + union {
> + __u32 group_id;
> + __u32 dev_id;
> + };
> __u16 segment;
> __u8 bus;
> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
> struct vfio_pci_hot_reset_info {
> __u32 argsz;
> __u32 flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
> __u32 count;
> struct vfio_pci_dependent_device devices[];
> };
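
From the user's side, the uapi comment in this patch implies a lookup
along the lines of the sketch below: match on dev_id when the flag is
set and the dev_id is valid, otherwise fall back to segment/bus/devfn.
This is a hedged mock and not code from the series. The struct and
function names are hypothetical, the invalid dev_id value is assumed to
be 0, group-mode matching on group_id is left out, and the fallback
presumes the user knows the BDF of its opened devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID from the patch. */
#define HOT_RESET_FLAG_DEV_ID (1u << 0)
#define INVALID_DEV_ID 0u	/* assumed value of IOMMUFD_INVALID_ID */

/* Same bit layout as the kernel's PCI_SLOT/PCI_FUNC macros. */
#define BDF_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define BDF_FUNC(devfn) ((devfn) & 0x07)

/* Mirror of the reported per-device entry. */
struct dep_device {
	union {
		uint32_t group_id;
		uint32_t dev_id;
	};
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* What the user tracks for each device it has opened (hypothetical). */
struct opened_dev {
	uint32_t dev_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Return the index of the opened device matching dep, or -1 if none. */
static int match_opened(uint32_t flags, const struct dep_device *dep,
			const struct opened_dev *opened, size_t n)
{
	bool by_id = (flags & HOT_RESET_FLAG_DEV_ID) &&
		     dep->dev_id != INVALID_DEV_ID;
	size_t i;

	for (i = 0; i < n; i++) {
		if (by_id) {
			if (opened[i].dev_id == dep->dev_id)
				return (int)i;
		} else if (opened[i].segment == dep->segment &&
			   opened[i].bus == dep->bus &&
			   opened[i].devfn == dep->devfn) {
			return (int)i;
		}
	}
	return -1;
}
```

An affected device that matches nothing in the opened list is one the
user does not own, which is the case the ownership check cares about.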
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
2023-04-03 9:25 ` Liu, Yi L
2023-04-04 22:20 ` Alex Williamson
@ 2023-04-05 12:19 ` Eric Auger
2023-04-05 14:04 ` Liu, Yi L
2 siblings, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 12:19 UTC (permalink / raw)
To: Yi Liu, alex.williamson, jgg, kevin.tian
Cc: joro, robin.murphy, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang
Hi Yi,
On 4/1/23 16:44, Yi Liu wrote:
> for the users that accept device fds passed from management stacks to be
> able to figure out the host reset affected devices among the devices
> opened by the user. This is needed as such users do not have BDF (bus,
> devfn) knowledge about the devices it has opened, hence unable to use
> the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> to figure out the affected devices.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
> include/uapi/linux/vfio.h | 24 ++++++++++++-
> 2 files changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 19f5b075d70a..a5a7e148dce1 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -30,6 +30,7 @@
> #if IS_ENABLED(CONFIG_EEH)
> #include <asm/eeh.h>
> #endif
> +#include <uapi/linux/iommufd.h>
>
> #include "vfio_pci_priv.h"
>
> @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> return 0;
> }
>
> +static struct vfio_device *
> +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> + struct pci_dev *pdev)
> +{
> + struct vfio_device *cur;
> +
> + lockdep_assert_held(&dev_set->lock);
> +
> + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> + if (cur->dev == &pdev->dev)
> + return cur;
> + return NULL;
> +}
> +
> static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> {
> (*(int *)data)++;
> @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> struct vfio_pci_fill_info {
> int max;
> int cur;
> + bool require_devid;
> + struct iommufd_ctx *iommufd;
> + struct vfio_device_set *dev_set;
> struct vfio_pci_dependent_device *devices;
> };
>
> static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> {
> struct vfio_pci_fill_info *fill = data;
> + struct vfio_device_set *dev_set = fill->dev_set;
> struct iommu_group *iommu_group;
> + struct vfio_device *vdev;
> +
> + lockdep_assert_held(&dev_set->lock);
>
> if (fill->cur == fill->max)
> return -EAGAIN; /* Something changed, try again */
> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> if (!iommu_group)
> return -EPERM; /* Cannot reset non-isolated devices */
>
> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + if (fill->require_devid) {
> + /*
> + * Report dev_id of the devices that are opened as cdev
> + * and have the same iommufd with the fill->iommufd.
> + * Otherwise, just fill IOMMUFD_INVALID_ID.
> + */
> + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> + if (vdev && vfio_device_cdev_opened(vdev) &&
> + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> + else
> + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> + } else {
> + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + }
> fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> fill->devices[fill->cur].bus = pdev->bus->number;
> fill->devices[fill->cur].devfn = pdev->devfn;
> @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> return -ENOMEM;
>
> fill.devices = devices;
> + fill.dev_set = vdev->vdev.dev_set;
>
> + mutex_lock(&vdev->vdev.dev_set->lock);
> + if (vfio_device_cdev_opened(&vdev->vdev)) {
> + fill.require_devid = true;
> + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> + }
> ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> &fill, slot);
> + mutex_unlock(&vdev->vdev.dev_set->lock);
>
> /*
> * If a device was removed between counting and filling, we may come up
> * short of fill.max. If a device was added, we'll have a return of
> * -EAGAIN above.
> */
> - if (!ret)
> + if (!ret) {
> hdr.count = fill.cur;
> + if (fill.require_devid)
> + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> + }
>
> reset_info_exit:
> if (copy_to_user(arg, &hdr, minsz))
> @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> {
> struct vfio_device_set *dev_set = data;
> - struct vfio_device *cur;
>
> - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> - if (cur->dev == &pdev->dev)
> - return 0;
> - return -EBUSY;
> + lockdep_assert_held(&dev_set->lock);
> +
> + return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
> }
>
> /*
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 25432ef213ee..5a34364e3b94 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,32 @@ enum {
> * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> * struct vfio_pci_hot_reset_info)
> *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device. User could use the information reported by this command
> + * to figure out the affected devices among the devices it has opened.
> + * This command always reports the segment, bus and devfn information for
> + * each affected device, and selectively report the group_id or the dev_id
> + * per the way how the device being queried is opened.
> + * - If the device is opened via the traditional group/container manner,
> + * this command reports the group_id for each affected device.
> + *
> + * - If the device is opened as a cdev, this command needs to report
s/needs to report/reports
> + * dev_id for each affected device and set the
> + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
> + * devices that are not opened as cdev or bound to different iommufds
> + * with the device that is queried, report an invalid dev_id to avoid
s/bound to different iommufds with the device that is queried/bound to
iommufds different from the reset device one?
> + * potential dev_id conflict as dev_id is local to iommufd. For such
> + * affected devices, user shall fall back to use the segment, bus and
> + * devfn info to map it to opened device.
> + *
> * Return: 0 on success, -errno on failure:
> * -enospc = insufficient buffer, -enodev = unsupported for device.
> */
> struct vfio_pci_dependent_device {
> - __u32 group_id;
> + union {
> + __u32 group_id;
> + __u32 dev_id;
> + };
> __u16 segment;
> __u8 bus;
> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
> struct vfio_pci_hot_reset_info {
> __u32 argsz;
> __u32 flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
> __u32 count;
> struct vfio_pci_dependent_device devices[];
> };
Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 12:19 ` Eric Auger
@ 2023-04-05 14:04 ` Liu, Yi L
2023-04-05 16:25 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-05 14:04 UTC (permalink / raw)
To: eric.auger@redhat.com, alex.williamson@redhat.com, jgg@nvidia.com,
Tian, Kevin
Cc: joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Eric,
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, April 5, 2023 8:20 PM
>
> Hi Yi,
> On 4/1/23 16:44, Yi Liu wrote:
> > for the users that accept device fds passed from management stacks to be
> > able to figure out the hot reset affected devices among the devices
> > opened by the user. This is needed as such users do not have BDF (bus,
> > devfn) knowledge about the devices it has opened, hence unable to use
> > the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > to figure out the affected devices.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> > drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
> > include/uapi/linux/vfio.h | 24 ++++++++++++-
> > 2 files changed, 74 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 19f5b075d70a..a5a7e148dce1 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -30,6 +30,7 @@
> > #if IS_ENABLED(CONFIG_EEH)
> > #include <asm/eeh.h>
> > #endif
> > +#include <uapi/linux/iommufd.h>
> >
> > #include "vfio_pci_priv.h"
> >
> > @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> > return 0;
> > }
> >
> > +static struct vfio_device *
> > +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> > + struct pci_dev *pdev)
> > +{
> > + struct vfio_device *cur;
> > +
> > + lockdep_assert_held(&dev_set->lock);
> > +
> > + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > + if (cur->dev == &pdev->dev)
> > + return cur;
> > + return NULL;
> > +}
> > +
> > static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > {
> > (*(int *)data)++;
> > @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > struct vfio_pci_fill_info {
> > int max;
> > int cur;
> > + bool require_devid;
> > + struct iommufd_ctx *iommufd;
> > + struct vfio_device_set *dev_set;
> > struct vfio_pci_dependent_device *devices;
> > };
> >
> > static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > {
> > struct vfio_pci_fill_info *fill = data;
> > + struct vfio_device_set *dev_set = fill->dev_set;
> > struct iommu_group *iommu_group;
> > + struct vfio_device *vdev;
> > +
> > + lockdep_assert_held(&dev_set->lock);
> >
> > if (fill->cur == fill->max)
> > return -EAGAIN; /* Something changed, try again */
> > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > if (!iommu_group)
> > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > + if (fill->require_devid) {
> > + /*
> > + * Report dev_id of the devices that are opened as cdev
> > + * and have the same iommufd with the fill->iommufd.
> > + * Otherwise, just fill IOMMUFD_INVALID_ID.
> > + */
> > + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> > + if (vdev && vfio_device_cdev_opened(vdev) &&
> > + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> > + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> > + else
> > + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> > + } else {
> > + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > + }
> > fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > fill->devices[fill->cur].bus = pdev->bus->number;
> > fill->devices[fill->cur].devfn = pdev->devfn;
> > @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > return -ENOMEM;
> >
> > fill.devices = devices;
> > + fill.dev_set = vdev->vdev.dev_set;
> >
> > + mutex_lock(&vdev->vdev.dev_set->lock);
> > + if (vfio_device_cdev_opened(&vdev->vdev)) {
> > + fill.require_devid = true;
> > + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > + }
> > ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> > &fill, slot);
> > + mutex_unlock(&vdev->vdev.dev_set->lock);
> >
> > /*
> > * If a device was removed between counting and filling, we may come up
> > * short of fill.max. If a device was added, we'll have a return of
> > * -EAGAIN above.
> > */
> > - if (!ret)
> > + if (!ret) {
> > hdr.count = fill.cur;
> > + if (fill.require_devid)
> > + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> > + }
> >
> > reset_info_exit:
> > if (copy_to_user(arg, &hdr, minsz))
> > @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> > static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> > {
> > struct vfio_device_set *dev_set = data;
> > - struct vfio_device *cur;
> >
> > - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > - if (cur->dev == &pdev->dev)
> > - return 0;
> > - return -EBUSY;
> > + lockdep_assert_held(&dev_set->lock);
> > +
> > + return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
> > }
> >
> > /*
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 25432ef213ee..5a34364e3b94 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -650,11 +650,32 @@ enum {
> > * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > * struct vfio_pci_hot_reset_info)
> > *
> > + * This command is used to query the affected devices in the hot reset for
> > + * a given device. User could use the information reported by this command
> > + * to figure out the affected devices among the devices it has opened.
> > + * This command always reports the segment, bus and devfn information for
> > + * each affected device, and selectively report the group_id or the dev_id
> > + * per the way how the device being queried is opened.
> > + * - If the device is opened via the traditional group/container manner,
> > + * this command reports the group_id for each affected device.
> > + *
> > + * - If the device is opened as a cdev, this command needs to report
> s/needs to report/reports
got it.
> > + * dev_id for each affected device and set the
> > + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
> > + * devices that are not opened as cdev or bound to different iommufds
> > + * with the device that is queried, report an invalid dev_id to avoid
> s/bound to different iommufds with the device that is queried/bound to
> iommufds different from the reset device one?
Hmm, I'm not a native speaker. This _INFO ioctl queries which devices
would be affected if the user hot resets a given device, so "the queried
device" seems the better fit to me. I admit, though, that "the queried
device" is also "the reset device". Maybe Alex can help pick one. 😊
Regards,
Yi Liu
> > + * potential dev_id conflict as dev_id is local to iommufd. For such
> > + * affected devices, user shall fall back to use the segment, bus and
> > + * devfn info to map it to opened device.
> > + *
> > * Return: 0 on success, -errno on failure:
> > * -enospc = insufficient buffer, -enodev = unsupported for device.
> > */
> > struct vfio_pci_dependent_device {
> > - __u32 group_id;
> > + union {
> > + __u32 group_id;
> > + __u32 dev_id;
> > + };
> > __u16 segment;
> > __u8 bus;
> > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
> > struct vfio_pci_hot_reset_info {
> > __u32 argsz;
> > __u32 flags;
> > +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
> > __u32 count;
> > struct vfio_pci_dependent_device devices[];
> > };
> Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 14:04 ` Liu, Yi L
@ 2023-04-05 16:25 ` Alex Williamson
2023-04-05 16:37 ` Jason Gunthorpe
2023-04-05 17:58 ` Eric Auger
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 16:25 UTC (permalink / raw)
To: Liu, Yi L
Cc: eric.auger@redhat.com, jgg@nvidia.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 14:04:51 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> Hi Eric,
>
> > From: Eric Auger <eric.auger@redhat.com>
> > Sent: Wednesday, April 5, 2023 8:20 PM
> >
> > Hi Yi,
> > On 4/1/23 16:44, Yi Liu wrote:
> > > for the users that accept device fds passed from management stacks to be
> > > able to figure out the hot reset affected devices among the devices
> > > opened by the user. This is needed as such users do not have BDF (bus,
> > > devfn) knowledge about the devices it has opened, hence unable to use
> > > the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > to figure out the affected devices.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > > drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
> > > include/uapi/linux/vfio.h | 24 ++++++++++++-
> > > 2 files changed, 74 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > > index 19f5b075d70a..a5a7e148dce1 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -30,6 +30,7 @@
> > > #if IS_ENABLED(CONFIG_EEH)
> > > #include <asm/eeh.h>
> > > #endif
> > > +#include <uapi/linux/iommufd.h>
> > >
> > > #include "vfio_pci_priv.h"
> > >
> > > @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> > > return 0;
> > > }
> > >
> > > +static struct vfio_device *
> > > +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
> > > + struct pci_dev *pdev)
> > > +{
> > > + struct vfio_device *cur;
> > > +
> > > + lockdep_assert_held(&dev_set->lock);
> > > +
> > > + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > > + if (cur->dev == &pdev->dev)
> > > + return cur;
> > > + return NULL;
> > > +}
> > > +
> > > static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > > {
> > > (*(int *)data)++;
> > > @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
> > > struct vfio_pci_fill_info {
> > > int max;
> > > int cur;
> > > + bool require_devid;
> > > + struct iommufd_ctx *iommufd;
> > > + struct vfio_device_set *dev_set;
> > > struct vfio_pci_dependent_device *devices;
> > > };
> > >
> > > static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > {
> > > struct vfio_pci_fill_info *fill = data;
> > > + struct vfio_device_set *dev_set = fill->dev_set;
> > > struct iommu_group *iommu_group;
> > > + struct vfio_device *vdev;
> > > +
> > > + lockdep_assert_held(&dev_set->lock);
> > >
> > > if (fill->cur == fill->max)
> > > return -EAGAIN; /* Something changed, try again */
> > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > > if (!iommu_group)
> > > return -EPERM; /* Cannot reset non-isolated devices */
> > >
> > > - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > + if (fill->require_devid) {
> > > + /*
> > > + * Report dev_id of the devices that are opened as cdev
> > > + * and have the same iommufd with the fill->iommufd.
> > > + * Otherwise, just fill IOMMUFD_INVALID_ID.
> > > + */
> > > + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> > > + if (vdev && vfio_device_cdev_opened(vdev) &&
> > > + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> > > + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> > > + else
> > > + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> > > + } else {
> > > + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > > + }
> > > fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > > fill->devices[fill->cur].bus = pdev->bus->number;
> > > fill->devices[fill->cur].devfn = pdev->devfn;
> > > @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > > return -ENOMEM;
> > >
> > > fill.devices = devices;
> > > + fill.dev_set = vdev->vdev.dev_set;
> > >
> > > + mutex_lock(&vdev->vdev.dev_set->lock);
> > > + if (vfio_device_cdev_opened(&vdev->vdev)) {
> > > + fill.require_devid = true;
> > > + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > > + }
> > > ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> > > &fill, slot);
> > > + mutex_unlock(&vdev->vdev.dev_set->lock);
> > >
> > > /*
> > > * If a device was removed between counting and filling, we may come up
> > > * short of fill.max. If a device was added, we'll have a return of
> > > * -EAGAIN above.
> > > */
> > > - if (!ret)
> > > + if (!ret) {
> > > hdr.count = fill.cur;
> > > + if (fill.require_devid)
> > > + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
> > > + }
> > >
> > > reset_info_exit:
> > > if (copy_to_user(arg, &hdr, minsz))
> > > @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
> > > static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> > > {
> > > struct vfio_device_set *dev_set = data;
> > > - struct vfio_device *cur;
> > >
> > > - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > > - if (cur->dev == &pdev->dev)
> > > - return 0;
> > > - return -EBUSY;
> > > + lockdep_assert_held(&dev_set->lock);
> > > +
> > > + return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
> > > }
> > >
> > > /*
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 25432ef213ee..5a34364e3b94 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -650,11 +650,32 @@ enum {
> > > * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > > * struct vfio_pci_hot_reset_info)
> > > *
> > > + * This command is used to query the affected devices in the hot reset for
> > > + * a given device. User could use the information reported by this command
> > > + * to figure out the affected devices among the devices it has opened.
> > > + * This command always reports the segment, bus and devfn information for
> > > + * each affected device, and selectively report the group_id or the dev_id
> > > + * per the way how the device being queried is opened.
> > > + * - If the device is opened via the traditional group/container manner,
> > > + * this command reports the group_id for each affected device.
> > > + *
> > > + * - If the device is opened as a cdev, this command needs to report
> > s/needs to report/reports
>
> got it.
>
> > > + * dev_id for each affected device and set the
> > > + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
> > > + * devices that are not opened as cdev or bound to different iommufds
> > > + * with the device that is queried, report an invalid dev_id to avoid
> > s/bound to different iommufds with the device that is queried/bound to
> > iommufds different from the reset device one?
>
> hmmm, I'm not a native speaker here. This _INFO is to query if want
> hot reset a given device, what devices would be affected. So it appears
> the queried device is better. But I'd admit "the queried device" is also
> "the reset device". may Alex help pick one. 😊
- If the calling device is opened directly via cdev rather than
accessed through the vfio group, the returned
vfio_pci_dependent_device structure reports the dev_id
rather than the group_id, which is indicated by the
VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag in
vfio_pci_hot_reset_info. If the reset affects devices that
are not opened within the same iommufd context as the calling
device, IOMMUFD_INVALID_ID will be provided as the dev_id.
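
For illustration only, a minimal userspace sketch of how a caller might
consume the result described by the wording above. It is not code from
the series. The sketch_* names are hypothetical local mirrors of the
proposed vfio_pci_dependent_device layout and flag, and the numeric
value used for the invalid ID is an assumed placeholder; real code would
take both from the uAPI headers of a kernel carrying the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the proposed uAPI, for illustration only. */
#define SKETCH_FLAG_IOMMUFD_DEV_ID (1u << 0) /* stands in for VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID */
#define SKETCH_INVALID_ID 0xffffffffu        /* assumed placeholder for IOMMUFD_INVALID_ID */

struct sketch_dependent_device {
	union {
		uint32_t group_id; /* meaningful when the flag is clear */
		uint32_t dev_id;   /* meaningful when the flag is set */
	};
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/*
 * Count the affected devices that carry no usable dev_id, i.e. those that
 * are unopened or opened in a different iommufd context than the caller.
 * Returns -1 when the flag is clear, because the entries then hold
 * group_ids and dev_id matching does not apply.
 */
static int sketch_count_unidentified(uint32_t flags,
				     const struct sketch_dependent_device *devs,
				     uint32_t count)
{
	int unidentified = 0;

	assert(devs != NULL || count == 0);

	if (!(flags & SKETCH_FLAG_IOMMUFD_DEV_ID))
		return -1;

	for (uint32_t i = 0; i < count; i++)
		if (devs[i].dev_id == SKETCH_INVALID_ID)
			unidentified++;

	return unidentified;
}
```

A result of zero means every affected device is visible to the caller
through its own iommufd context. Any other positive result is the
ambiguous case: the caller cannot tell from the info ioctl alone whether
those devices are unopened or owned by someone else.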
But that kind of brings to light the question of what does the user do
when they encounter this situation. If the device is not opened, the
reset can complete. If the device is opened by a different user, the
reset is blocked. The only logical conclusion is that the user should
try the reset regardless of the result of the info ioctl, which the
null-array approach further solidifies as the direction of the API.
I'm not liking this. Thanks,
Alex
> > > + * potential dev_id conflict as dev_id is local to iommufd. For such
> > > + * affected devices, user shall fall back to use the segment, bus and
> > > + * devfn info to map it to opened device.
> > > + *
> > > * Return: 0 on success, -errno on failure:
> > > * -enospc = insufficient buffer, -enodev = unsupported for device.
> > > */
> > > struct vfio_pci_dependent_device {
> > > - __u32 group_id;
> > > + union {
> > > + __u32 group_id;
> > > + __u32 dev_id;
> > > + };
> > > __u16 segment;
> > > __u8 bus;
> > > __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
> > > struct vfio_pci_hot_reset_info {
> > > __u32 argsz;
> > > __u32 flags;
> > > +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
> > > __u32 count;
> > > struct vfio_pci_dependent_device devices[];
> > > };
> > Eric
>
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 16:25 ` Alex Williamson
@ 2023-04-05 16:37 ` Jason Gunthorpe
2023-04-05 16:52 ` Alex Williamson
2023-04-05 17:58 ` Eric Auger
1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 16:37 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> But that kind of brings to light the question of what does the user do
> when they encounter this situation.
What does it do now when it encounters a group_id it doesn't
understand? Userspace already doesn't know if the foreign group is
open or not, right?
> reset can complete. If the device is opened by a different user, the
> reset is blocked. The only logical conclusion is that the user should
> try the reset regardless of the result of the info ioctl, which the
IMHO my suggested version is still the overall saner uAPI.
An info that basically returns success/fail if reset is security
authorized and information about the reset groupings.
Actual reset follows the returned groupings automatically.
Easy for qemu. Call the info at startup to confirm reset can be
emulated, use the returned information to propagate the reset groups
to the guest. Trigger the reset with no fuss when the guest asks for
it.
Less weird corner cases.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 16:37 ` Jason Gunthorpe
@ 2023-04-05 16:52 ` Alex Williamson
2023-04-05 17:23 ` Jason Gunthorpe
0 siblings, 1 reply; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 16:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 13:37:05 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
>
> > But that kind of brings to light the question of what does the user do
> > when they encounter this situation.
>
> What does it do now when it encounters a group_id it doesn't
> understand? Userspace already doesn't know if the foreign group is
> open or not, right?
It's simple, there is currently no screwiness around opened devices.
If the caller doesn't own all the groups mapping to the affected
devices, hot-reset is not available.
> > reset can complete. If the device is opened by a different user, the
> > reset is blocked. The only logical conclusion is that the user should
> > try the reset regardless of the result of the info ioctl, which the
>
> IMHO my suggested version is still the overall saner uAPI.
>
> An info that basically returns success/fail if reset is security
> authorized and information about the reset groupings.
>
> Actual reset follows the returned groupings automatically.
>
> Easy for qemu. Call the info at startup to confirm reset can be
> emulated, use the returned information to propagate the reset groups
> to the guest. Trigger the reset with no fuss when the guest asks for
> it.
>
> Less weird corner cases.
This leads to scenarios where the info ioctl indicates a hot-reset is
initially available, perhaps only because one of the affected devices
was not opened at the time, and now it fails when QEMU actually tries
to use it. In the group model, QEMU can know the set of affected
devices and the required groups, confirm it owns those, and for all
practical purposes guarantee that a hot-reset is available (yes, there
might be some exceptionally rare topology changes).
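
For illustration only, the group-model check described above can be
sketched in userspace roughly as follows. This is not QEMU code: the
sketch_* names are hypothetical, the struct mirrors the existing
group_id layout of vfio_pci_dependent_device, and the list of owned
group IDs is assumed to be tracked by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the group_id form of the info entry. */
struct sketch_group_dev {
	uint32_t group_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Linear search of the caller's owned-group list. */
static bool sketch_group_is_owned(uint32_t group_id,
				  const uint32_t *owned, size_t n_owned)
{
	for (size_t i = 0; i < n_owned; i++)
		if (owned[i] == group_id)
			return true;
	return false;
}

/*
 * True only if every affected device reported by the info ioctl belongs
 * to a group the caller owns. In the group model this is the condition
 * under which the caller can expect a later hot reset to be permitted.
 */
static bool sketch_owns_all_affected_groups(const struct sketch_group_dev *devs,
					    size_t count,
					    const uint32_t *owned,
					    size_t n_owned)
{
	for (size_t i = 0; i < count; i++)
		if (!sketch_group_is_owned(devs[i].group_id, owned, n_owned))
			return false;
	return true;
}
```

The point of the sketch is that the answer depends only on state the
caller controls (which groups it has opened), not on whether some other
affected device happens to be open at the time of the query.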
This goofiness around unopened devices and null-arrays is killing this
API. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 16:52 ` Alex Williamson
@ 2023-04-05 17:23 ` Jason Gunthorpe
2023-04-05 18:56 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 17:23 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> On Wed, 5 Apr 2023 13:37:05 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> >
> > > But that kind of brings to light the question of what does the user do
> > > when they encounter this situation.
> >
> > What does it do now when it encounters a group_id it doesn't
> > understand? Userspace already doesn't know if the foreign group is
> > open or not, right?
>
> It's simple, there is currently no screwiness around opened devices.
> If the caller doesn't own all the groups mapping to the affected
> devices, hot-reset is not available.
That still has nasty edge cases. If the reset group spans beyond a
single iommu group you end up with qemu being unable to operate reset
at all, and it is unfixable from an API perspective as we can't pass
in groups that VFIO isn't going to use.
I think you are right, the fact we'd have to return -1 dev_ids to this
modified API is pretty damaging, it doesn't seem like a good
direction.
> This leads to scenarios where the info ioctl indicates a hot-reset is
> initially available, perhaps only because one of the affected devices
> was not opened at the time, and now it fails when QEMU actually tries
> to use it.
I would like it if the APIs toward the kernel were only about the
kernel's security apparatus. It makes it easier to reason about the
kernel side and gives nice simple well defined APIs.
This is a good point that qemu needs to make a policy decision if it
is happy about the VFIO configuration - but that is a policy decision
that should not become entangled with the kernel's security checks.
Today qemu can make this policy choice the same way it does right now
- call _INFO and check the group_ids. It gets the exact same outcome
as today. We already discussed that we need to expose the group ID
through an ioctl someplace.
If this is too awkward we could add a query to the kernel if the cdev
is "reset exclusive" - eg the iommufd covers all the groups that span
the reset set.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 17:23 ` Jason Gunthorpe
@ 2023-04-05 18:56 ` Alex Williamson
2023-04-05 19:18 ` Alex Williamson
2023-04-05 19:21 ` Jason Gunthorpe
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 18:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 14:23:43 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> > On Wed, 5 Apr 2023 13:37:05 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > >
> > > > But that kind of brings to light the question of what does the user do
> > > > when they encounter this situation.
> > >
> > > What does it do now when it encounters a group_id it doesn't
> > > understand? Userspace already doesn't know if the foreign group is
> > > open or not, right?
> >
> > It's simple, there is currently no screwiness around opened devices.
> > If the caller doesn't own all the groups mapping to the affected
> > devices, hot-reset is not available.
>
> That still has nasty edge cases. If the reset group spans beyond a
> single iommu group you end up with qemu being unable to operate reset
> at all, and it is unfixable from an API perspective as we can't pass
> in groups that VFIO isn't going to use.
Hmm, s/nasty/niche/? Yes, QEMU currently has no way to own a group
without assigning a device from the group, but technically that could
be fixed within QEMU. If QEMU doesn't own that affected group, then it
can't very well count on that group to not be used in some other way
when it comes time to actually do a hot-reset.
> I think you are right, the fact we'd have to return -1 dev_ids to this
> modified API is pretty damaging, it doesn't seem like a good
> direction.
>
> > This leads to scenarios where the info ioctl indicates a hot-reset is
> > initially available, perhaps only because one of the affected devices
> > was not opened at the time, and now it fails when QEMU actually tries
> > to use it.
>
> I would like it if the APIs toward the kernel were only about the
> kernel's security apparatus. It makes it easier to reason about the
> kernel side and gives nice simple well defined APIs.
Usability needs to be a consideration as well. An interface where the
result is effectively arbitrary from a user perspective because the
kernel is solely focused on whether the operation is allowed,
evaluating constraints that the user is unaware of and cannot control,
is unusable.
> This is a good point that qemu needs to make a policy decision if it
> is happy about the VFIO configuration - but that is a policy decision
> that should not become entangled with the kernel's security checks.
>
> Today qemu can make this policy choice the same way it does right now
> - call _INFO and check the group_ids. It gets the exact same outcome
> as today. We already discussed that we need to expose the group ID
> through an ioctl someplace.
QEMU can make a policy decision today because the kernel provides a
sufficiently reliable interface, ie. based on the set of owned groups, a
hot-reset is all but guaranteed to work. If we focus only on whether a
given reset is allowed from a kernel perspective and ignore that
userspace needs some predictability of the kernel behavior, then QEMU
cannot reasonably make that policy decision.
> If this is too awkward we could add a query to the kernel if the cdev
> is "reset exclusive" - eg the iommufd covers all the groups that span
> the reset set.
That's essentially what we have if there are valid dev-ids for each
affected device in the info ioctl. I don't think it helps the user
experience to create loopholes where the hot-reset ioctl can still work
in spite of those missing devices. The group interface uses the fact
that ownership of the group implies ownership of all devices within the
group such that the user only needs to prove group ownership.
But we still have underlying groups even with the cdev model, with the
same ownership principles, so don't we just need to prove group
ownership based on a device fd rather than a group fd?
For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
capability chains, we could add a capability that reports the group ID
for the device. The hot-reset info ioctl remains as it is today,
reporting group-ids and bdfs. The hot-reset ioctl itself is modified to
transparently support either group fds or device fds. The user can now
map cdevs to group-ids and therefore follow the same rules as groups,
providing at least one representative device fd for each group. We've
essentially already enabled this by allowing the limit of user provided
fds equal to the number of affected devices.
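A minimal sketch of the "one representative device fd per affected group" rule described above, written as plain userspace bookkeeping. Everything here is an assumption for illustration: `struct opened_dev`, `pick_representative_fds()`, and the idea that `group_id` was learned from the proposed (not yet existing) DEVICE_GET_INFO capability are hypothetical, and no real ioctl is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one cdev this process has opened. */
struct opened_dev {
	int fd;        /* device fd held by the process */
	int group_id;  /* assumed to come from the proposed capability */
};

/*
 * affected_groups[] holds the group ID of each affected device as the
 * hot-reset info ioctl would report it, so IDs may repeat.
 * Writes one fd per distinct group into out_fds (room for n_affected
 * entries) and returns the number written, or -1 if some affected group
 * has no opened device, meaning ownership cannot be proven.
 */
int pick_representative_fds(const int *affected_groups, size_t n_affected,
			    const struct opened_dev *devs, size_t n_devs,
			    int *out_fds)
{
	int count = 0;

	assert(out_fds != NULL);

	for (size_t i = 0; i < n_affected; i++) {
		bool seen = false;
		bool found = false;

		/* Skip groups already handled via an earlier affected device. */
		for (size_t j = 0; j < i; j++) {
			if (affected_groups[j] == affected_groups[i]) {
				seen = true;
				break;
			}
		}
		if (seen)
			continue;

		/* Any opened device in the group can stand in for the group. */
		for (size_t d = 0; d < n_devs; d++) {
			if (devs[d].group_id == affected_groups[i]) {
				out_fds[count++] = devs[d].fd;
				found = true;
				break;
			}
		}
		if (!found)
			return -1;
	}
	return count;
}
```

The returned fd list is what such a user would hand to the hot-reset ioctl in place of group fds; a -1 result corresponds to the "hot-reset is not available" case of the group model.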
Does that work? Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 18:56 ` Alex Williamson
@ 2023-04-05 19:18 ` Alex Williamson
2023-04-05 19:21 ` Jason Gunthorpe
1 sibling, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 19:18 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 12:56:21 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:
> On Wed, 5 Apr 2023 14:23:43 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Wed, Apr 05, 2023 at 10:52:15AM -0600, Alex Williamson wrote:
> > > On Wed, 5 Apr 2023 13:37:05 -0300
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > > On Wed, Apr 05, 2023 at 10:25:45AM -0600, Alex Williamson wrote:
> > > >
> > > > > But that kind of brings to light the question of what does the user do
> > > > > when they encounter this situation.
> > > >
> > > > What does it do now when it encounters a group_id it doesn't
> > > > understand? Userspace already doesn't know if the foreign group is
> > > > open or not, right?
> > >
> > > It's simple, there is currently no screwiness around opened devices.
> > > If the caller doesn't own all the groups mapping to the affected
> > > devices, hot-reset is not available.
> >
> > That still has nasty edge cases. If the reset group spans beyond a
> > single iommu group you end up with qemu being unable to operate reset
> > at all, and it is unfixable from an API perspective as we can't pass
> > in groups that VFIO isn't going to use.
>
> Hmm, s/nasty/niche/? Yes, QEMU currently has no way to own a group
> without assigning a device from the group, but technically that could
> be fixed within QEMU. If QEMU doesn't own that affected group, then it
> can't very well count on that group to not be used in some other way
> when it comes time to actually do a hot-reset.
>
> > I think you are right, the fact we'd have to return -1 dev_ids to this
> > modified API is pretty damaging, it doesn't seem like a good
> > direction.
> >
> > > This leads to scenarios where the info ioctl indicates a hot-reset is
> > > initially available, perhaps only because one of the affected devices
> > > was not opened at the time, and now it fails when QEMU actually tries
> > > to use it.
> >
> > I would like it if the APIs toward the kernel were only about the
> > kernel's security apparatus. It makes it easier to reason about the
> > kernel side and gives nice simple well defined APIs.
>
> Usability needs to be a consideration as well. An interface where the
> result is effectively arbitrary from a user perspective because the
> kernel is solely focused on whether the operation is allowed,
> evaluating constraints that the user is unaware of and cannot control,
> is unusable.
>
> > This is a good point that qemu needs to make a policy decision if it
> > is happy about the VFIO configuration - but that is a policy decision
> > that should not become entangled with the kernel's security checks.
> >
> > Today qemu can make this policy choice the same way it does right now
> > - call _INFO and check the group_ids. It gets the exact same outcome
> > as today. We already discussed that we need to expose the group ID
> > through an ioctl someplace.
>
> QEMU can make a policy decision today because the kernel provides a
> sufficiently reliable interface, ie. based on the set of owned groups, a
> hot-reset is all but guaranteed to work. If we focus only on whether a
> given reset is allowed from a kernel perspective and ignore that
> userspace needs some predictability of the kernel behavior, then QEMU
> cannot reasonably make that policy decision.
>
> > If this is too awkward we could add a query to the kernel if the cdev
> > is "reset exclusive" - eg the iommufd covers all the groups that span
> > the reset set.
>
> That's essentially what we have if there are valid dev-ids for each
> affected device in the info ioctl. I don't think it helps the user
> experience to create loopholes where the hot-reset ioctl can still work
> in spite of those missing devices. The group interface uses the fact
> that ownership of the group implies ownership of all devices within the
> group such that the user only needs to prove group ownership.
>
> But we still have underlying groups even with the cdev model, with the
> same ownership principles, so don't we just need to prove group
> ownership based on a device fd rather than a group fd?
>
> For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> capability chains, we could add a capability that reports the group ID
> for the device. The hot-reset info ioctl remains as it is today,
> reporting group-ids and bdfs. The hot-reset ioctl itself is modified to
> transparently support either group fds or device fds. The user can now
> map cdevs to group-ids and therefore follow the same rules as groups,
> providing at least one representative device fd for each group. We've
> essentially already enabled this by allowing the limit of user provided
> fds equal to the number of affected devices.
If I'm not mistaken, I think this resolves cdev no-iommu to work
equivalently to groups as well. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 18:56 ` Alex Williamson
2023-04-05 19:18 ` Alex Williamson
@ 2023-04-05 19:21 ` Jason Gunthorpe
2023-04-05 19:49 ` Alex Williamson
1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 19:21 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> Usability needs to be a consideration as well. An interface where the
> result is effectively arbitrary from a user perspective because the
> kernel is solely focused on whether the operation is allowed,
> evaluating constraints that the user is unaware of and cannot control,
> is unusable.
Considering this API is only invoked by qemu we might be overdoing
this usability and 'no shoot in foot' view.
> > This is a good point that qemu needs to make a policy decision if it
> > is happy about the VFIO configuration - but that is a policy decision
> > that should not become entangled with the kernel's security checks.
> >
> > Today qemu can make this policy choice the same way it does right now
> > - call _INFO and check the group_ids. It gets the exact same outcome
> > as today. We already discussed that we need to expose the group ID
> > through an ioctl someplace.
>
> QEMU can make a policy decision today because the kernel provides a
> sufficiently reliable interface, ie. based on the set of owned groups, a
> hot-reset is all but guaranteed to work.
And we don't change that with cdev. If qemu wants to make the policy
decision it keeps using the exact same _INFO interface to make that
decision the same way it has always made it.
We weaken the actual reset action to only consider the security side.
Applications that want this exclusive reset group policy simply must
check it on their own. It is a reasonable API design.
> > If this is too awkward we could add a query to the kernel if the cdev
> > is "reset exclusive" - eg the iommufd covers all the groups that span
> > the reset set.
>
> That's essentially what we have if there are valid dev-ids for each
> affected device in the info ioctl.
If you have dev-ids for everything, yes. If you don't, then you can't
make the same policy choice using a dev-id interface.
> I don't think it helps the user experience to create loopholes where
> the hot-reset ioctl can still work in spite of those missing
> devices.
I disagree. The easy straightforward design is that the reset ioctl
works if the process has security permissions. Mixing a policy check
into the kernel on this path is creating complexity we don't really
need.
I don't view it as a loophole, it is flexibility to use the API in a
way that is different from what qemu wants - eg an app like dpdk may
be willing to tolerate a reset group that becomes unavailable after
startup. Who knows, why should we force this in the kernel?
> For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> capability chains, we could add a capability that reports the group ID
> for the device.
I was going to put that in an iommufd ioctl so it works with VDPA too,
but sure, let's assume we can get the group ID from a cdev fd.
> The hot-reset info ioctl remains as it is today, reporting group-ids
> and bdfs.
Sure, but userspace still needs to know how to map the reset sets into
dev-ids. Remember the reason we started doing this is because we don't
have easy access to the BDF anymore.
I like leaving this ioctl alone, let's go back to a dedicated ioctl to
return the dev_ids.
> The hot-reset ioctl itself is modified to transparently
> support either group fds or device fds. The user can now map cdevs
> to group-ids and therefore follow the same rules as groups,
> providing at least one representative device fd for each group.
This looks like a very complex uapi compared to the empty list option,
but it seems like it would work.
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 19:21 ` Jason Gunthorpe
@ 2023-04-05 19:49 ` Alex Williamson
2023-04-05 23:22 ` Jason Gunthorpe
2023-04-06 6:34 ` Liu, Yi L
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-05 19:49 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, 5 Apr 2023 16:21:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> > Usability needs to be a consideration as well. An interface where the
> > result is effectively arbitrary from a user perspective because the
> > kernel is solely focused on whether the operation is allowed,
> > evaluating constraints that the user is unaware of and cannot control,
> > is unusable.
>
> Considering this API is only invoked by qemu we might be overdoing
> this usability and 'no shoot in foot' view.
Ok, I'm not sure why we're diminishing the de facto vfio userspace...
> > > This is a good point that qemu needs to make a policy decision if it
> > > is happy about the VFIO configuration - but that is a policy decision
> > > that should not become entangled with the kernel's security checks.
> > >
> > > Today qemu can make this policy choice the same way it does right now
> > > - call _INFO and check the group_ids. It gets the exact same outcome
> > > as today. We already discussed that we need to expose the group ID
> > > through an ioctl someplace.
> >
> > QEMU can make a policy decision today because the kernel provides a
> > sufficiently reliable interface, ie. based on the set of owned groups, a
> > hot-reset is all but guaranteed to work.
>
> And we don't change that with cdev. If qemu wants to make the policy
> decision it keeps using the exact same _INFO interface to make that
> decision same it has always made.
>
> We weaken the actual reset action to only consider the security side.
>
> Applications that want this exclusive reset group policy simply must
> check it on their own. It is a reasonable API design.
I disagree, as I've argued before, the info ioctl becomes so weak and
effectively arbitrary from a user perspective at being able to predict
whether the hot-reset ioctl works that it becomes useless, diminishing
the entire hot-reset info/execute API.
> > > If this is too awkward we could add a query to the kernel if the cdev
> > > is "reset exclusive" - eg the iommufd covers all the groups that span
> > > the reset set.
> >
> > That's essentially what we have if there are valid dev-ids for each
> > affected device in the info ioctl.
>
> If you have dev-ids for everything, yes. If you don't, then you can't
> make the same policy choice using a dev-id interface.
Exactly, you can't make any policy choice because the success or
failure of the hot-reset ioctl can't be known.
> > I don't think it helps the user experience to create loopholes where
> > the hot-reset ioctl can still work in spite of those missing
> > devices.
>
> I disagree. The easy straightforward design is that the reset ioctl
> works if the process has security permissions. Mixing a policy check
> into the kernel on this path is creating complexity we don't really
> need.
>
> I don't view it as a loophole, it is flexibility to use the API in a
> way that is different from what qemu wants - eg an app like dpdk may
> be willing to tolerate a reset group that becomes unavailable after
> startup. Who knows, why should we force this in the kernel?
Because look at all the problems it's causing to try to introduce these
loopholes without also introducing subtle bugs. There's an argument
that we're overly strict, which is better than the alternative, which
seems to be what we're dabbling with. It is a straightforward
interface for the hot-reset ioctl to mirror the information provided
via the hot-reset info ioctl.
> > For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> > capability chains, we could add a capability that reports the group ID
> > for the device.
>
> I was going to put that in an iommufd ioctl so it works with VDPA too,
> but sure, lets assume we can get the group ID from a cdev fd.
>
> > The hot-reset info ioctl remains as it is today, reporting group-ids
> > and bdfs.
>
> Sure, but userspace still needs to know how to map the reset sets into
> dev-ids.
No, it doesn't.
> Remember the reason we started doing this is because we don't
> have easy access to the BDF anymore.
We don't need it, the info ioctl provides the groups, the group
association can be learned from the DEVICE_GET_INFO ioctl, the
hot-reset ioctl only requires a single representative fd per affected
group. dev-ids not required.
> I like leaving this ioctl alone, lets go back to a dedicated ioctl to
> return the dev_ids.
I don't see any justification for this. We could add another PCI
specific DEVICE_GET_INFO capability to report the bdf if we really need
it, but reporting the group seems sufficient for this use case.
> > The hot-reset ioctl itself is modified to transparently
> > support either group fds or device fds. The user can now map cdevs
> > to group-ids and therefore follow the same rules as groups,
> > providing at least one representative device fd for each group.
>
> This looks like a very complex uapi compared to the empty list option,
> but it seems like it would work.
It's the same API that we have now. What's complex is trying to figure
out all the subtle side-effects from the loopholes that are being
proposed in this series. Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 19:49 ` Alex Williamson
@ 2023-04-05 23:22 ` Jason Gunthorpe
2023-04-06 10:02 ` Liu, Yi L
2023-04-06 6:34 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-05 23:22 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:
> > > QEMU can make a policy decision today because the kernel provides a
> > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > hot-reset is all but guaranteed to work.
> >
> > And we don't change that with cdev. If qemu wants to make the policy
> > decision it keeps using the exact same _INFO interface to make that
> > decision same it has always made.
> >
> > We weaken the actual reset action to only consider the security side.
> >
> > Applications that want this exclusive reset group policy simply must
> > check it on their own. It is a reasonable API design.
>
> I disagree, as I've argued before, the info ioctl becomes so weak and
> effectively arbitrary from a user perspective at being able to predict
> whether the hot-reset ioctl works that it becomes useless, diminishing
> the entire hot-reset info/execute API.
reset should be strictly more permissive than INFO. If INFO predicts
reset is permitted then reset should succeed.
We don't change INFO so it cannot "becomes so weak" ??
We don't care about the cases where INFO says it will not succeed but
reset does (temporarily) succeed.
I don't get what argument you are trying to make or what you think is
diminished..
Again, userspace calls INFO, if info says yes then reset *always
works*, exactly just like today.
Userspace will call reset with a 0 length FD list and it uses a
security only check that is strictly more permissive than what
get_info will return. So the new check is simple in the kernel and
always works in the cases we need it to work.
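The "strictly more permissive" relationship can be sketched as two predicates over the devices in a reset set. This is a simplified model, not kernel code: the `dev_state` values and both function names are hypothetical, and "owned by the caller" stands in for whatever ownership proof (group or iommufd) applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state within one reset set. */
enum dev_state {
	DEV_UNOPENED,    /* nobody has the device open */
	DEV_OWNED_SELF,  /* covered by the caller's ownership */
	DEV_OWNED_OTHER, /* opened by some other user */
};

/* Strict policy a VMM may want: the caller owns every affected device. */
bool policy_reset_exclusive(const enum dev_state *set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] != DEV_OWNED_SELF)
			return false;
	return true;
}

/* Security-only check: no opened device belongs to someone else. */
bool security_reset_allowed(const enum dev_state *set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == DEV_OWNED_OTHER)
			return false;
	return true;
}
```

Whenever the strict predicate holds, the security-only one holds too. The disputed case is a set containing an unopened device: the strict policy says no, while the security-only check still lets the reset through for as long as that device stays unopened.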
What is getting things into trouble is insisting that RESET have
additional restrictions beyond the minimum checks required for
security.
> > I don't view it as a loophole, it is flexibility to use the API in a
> > way that is different from what qemu wants - eg an app like dpdk may
> > be willing to tolerate a reset group that becomes unavailable after
> > startup. Who knows, why should we force this in the kernel?
>
> Because look at all the problems it's causing to try to introduce these
> loopholes without also introducing subtle bugs.
These problems are coming from trying to do this integrated version,
not from my approach!
AFAICT there was nothing wrong with my original plan of using the
empty fd list for reset. What Yi has here is some mashup of what you
and I both suggested.
> > Remember the reason we started doing this is because we don't
> > have easy access to the BDF anymore.
>
> We don't need it, the info ioctl provides the groups, the group
> association can be learned from the DEVICE_GET_INFO ioctl, the
> hot-reset ioctl only requires a single representative fd per affected
> group. dev-ids not required.
I'm not talking about triggering the ioctl.
I'm talking about whatever else qemu needs to do so that the VM is
aware of the reset groups device-by-device on its side so nested VFIO
in the VM reflects the same data as the hypervisor. Maybe it doesn't
do this right now, but the kernel API should continue to provide the
data.
> > I like leaving this ioctl alone, lets go back to a dedicated ioctl to
> > return the dev_ids.
>
> I don't see any justification for this. We could add another PCI
> specific DEVICE_GET_INFO capability to report the bdf if we really need
> it, but reporting the group seems sufficient for this use case.
What I imagine is a single new ioctl 'get reset group 2' or something.
It returns a list of dev_ids in the reset group. It has an output flag
if the reset is reliable. This is the only ioctl user space needs to
call.
The reliable test is done by simply calling the ioctl and throwing
away the dev ids. The mapping of the VM's reset groups is done by
processing the dev_ids to vRIDs and flowing that into the VM somehow.
We don't expose group_ids, and we don't expose BDF. It is much simpler
and cleaner to use.
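A rough sketch of what userspace handling of such a 'get reset group 2' result might look like. No such ioctl exists in this thread's patches; the struct layout, the `HYP_` names, the fixed array size, and the dev_id-to-vRID table are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_RESET_GROUP_RELIABLE (1u << 0) /* hypothetical output flag */
#define HYP_MAX_DEVS 8                     /* arbitrary bound for the sketch */

/* Hypothetical result of the imagined ioctl. */
struct hyp_reset_group {
	uint32_t flags;
	uint32_t count;
	uint32_t dev_ids[HYP_MAX_DEVS];
};

/* Hypothetical VMM-side table mapping a dev_id to the guest-visible RID. */
struct hyp_vrid_entry {
	uint32_t dev_id;
	uint16_t vrid;
};

/* The "reliable test": look at the flag and ignore the dev_ids. */
bool hyp_reset_is_reliable(const struct hyp_reset_group *rg)
{
	return (rg->flags & HYP_RESET_GROUP_RELIABLE) != 0;
}

/*
 * Translate every dev_id in the reset group to a vRID.
 * Returns the number of vRIDs written to out, or -1 if a dev_id is unknown.
 */
int hyp_reset_group_to_vrids(const struct hyp_reset_group *rg,
			     const struct hyp_vrid_entry *map, size_t n_map,
			     uint16_t *out)
{
	assert(rg->count <= HYP_MAX_DEVS);

	for (uint32_t i = 0; i < rg->count; i++) {
		bool found = false;

		for (size_t m = 0; m < n_map; m++) {
			if (map[m].dev_id == rg->dev_ids[i]) {
				out[i] = map[m].vrid;
				found = true;
				break;
			}
		}
		if (!found)
			return -1;
	}
	return (int)rg->count;
}
```

In this shape the caller tracks a single identifier (dev_id) rather than group IDs and BDFs.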
A BDF DEVICE_GET_INFO and the existing reset INFO will encode the same
data too, it is just not as elegant and requires userspace to do a lot
more work to keep track of the 3 different identifiers.
> > This looks like a very complex uapi compared to the empty list option,
> > but it seems like it would work.
>
> It's the same API that we have now. What's complex is trying to figure
> out all the subtle side-effects from the loopholes that are being
> proposed in this series. Thanks,
I might agree with you if we weren't now going backwards -
ideas didn't work out and Yi has to throw stuff away. :(
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 23:22 ` Jason Gunthorpe
@ 2023-04-06 10:02 ` Liu, Yi L
2023-04-06 17:53 ` Alex Williamson
0 siblings, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-06 10:02 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson
Cc: eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
Duan, Zhenzhong
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, April 6, 2023 7:23 AM
>
> On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:
>
> > > > QEMU can make a policy decision today because the kernel provides a
> > > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > > hot-reset is all but guaranteed to work.
> > >
> > > And we don't change that with cdev. If qemu wants to make the policy
> > > decision it keeps using the exact same _INFO interface to make that
> > > decision same it has always made.
> > >
> > > We weaken the actual reset action to only consider the security side.
> > >
> > > Applications that want this exclusive reset group policy simply must
> > > check it on their own. It is a reasonable API design.
> >
> > I disagree, as I've argued before, the info ioctl becomes so weak and
> > effectively arbitrary from a user perspective at being able to predict
> > whether the hot-reset ioctl works that it becomes useless, diminishing
> > the entire hot-reset info/execute API.
>
> reset should be strictly more permissive than INFO. If INFO predicts
> reset is permitted then reset should succeed.
>
> We don't change INFO so it cannot "becomes so weak" ??
>
> We don't care about the cases where INFO says it will not succeed but
> reset does (temporarily) succeed.
>
> I don't get what argument you are trying to make or what you think is
> diminished..
>
> Again, userspace calls INFO, if info says yes then reset *always
> works*, exactly just like today.
>
> Userspace will call reset with a 0 length FD list and it uses a
> security only check that is strictly more permissive than what
> get_info will return. So the new check is simple in the kernel and
> always works in the cases we need it to work.
>
> What is getting things into trouble is insisting that RESET have
> additional restrictions beyond the minimum checks required for
> security.
>
> > > I don't view it as a loophole, it is flexibility to use the API in a
> > > way that is different from what qemu wants - eg an app like dpdk may
> > > be willing to tolerate a reset group that becomes unavailable after
> > > startup. Who knows, why should we force this in the kernel?
> >
> > Because look at all the problems it's causing to try to introduce these
> > loopholes without also introducing subtle bugs.
>
> These problems are coming from trying to do this integrated version,
> not from my approach!
>
> AFAICT there was nothing wrong with my original plan of using the
> empty fd list for reset. What Yi has here is some mashup of what you
> and I both suggested.
Hi Alex, Jason,
That could be the reason. So let me try to summarize what this series
changes and the impact of each change, as far as I know.
1) Only check the ownership of opened devices in the dev_set
in the HOT_RESET ioctl.
- Impact: it changes the relationship between _INFO and HOT_RESET.
Per "Each group must have IOMMU protection established for the
ioctl to succeed." in [1], the existing design actually means userspace
should own all the affected groups before attempting HOT_RESET.
With the change here, the user does not need to ensure all affected
groups are opened, and hot-reset can succeed as long as the
devices in those affected groups are simply unopened and can be reset.
[1] https://patchwork.kernel.org/project/linux-pci/patch/20130814200845.21923.64284.stgit@bling.home/
2) Allow passing a zero-length fd array to do hot reset
- Impact: this uses the iommufd as the ownership check on the kernel side.
It is only supposed to be used by users that open cdevs, not by
users that open groups. The drawback is that it cannot cover noiommu
devices, as noiommu does not use iommufd at all. But it works well for
most cases.
3) Allow hot reset to succeed when the dev_set is a singleton
- Impact: this makes sense, but it seems to blur the boundary between
the group path and the cdev path w.r.t. the zero-length fd approach.
The group path can succeed in doing a hot reset even when passing an
empty fd array, if the dev_set happens to be a singleton.
4) Allow passing device fds to do hot reset
- Impact: this is a new way to do hot reset. It should have no impact.
5) Extend _INFO to report devid
- Impact: this changes the way the user decodes the info reported back.
devid or groupid is returned according to how the queried device is opened.
It was suggested to support the scenario in which some devices
are opened via cdev while others are opened via group. This makes
us return invalid_devid for a device that is opened via group if
it is affected by the hot reset of a device that is opened via cdev.
This was proposed to support the future device fd passing usage, which is
only available in the cdev path.
To me the major confusion comes from 1) and 3). 1) changes the meaning of
_INFO and HOT_RESET, while 3) blurs the boundary.
Here are my thoughts:
For 1), it was proposed for the reason in [2]. We'd like a scenario
that works in the group path to work in the cdev path as well. But IMHO, we
may just accept that the cdev path cannot support such a scenario, to avoid
a subtle change to the uapi. Otherwise, we need another HOT_RESET ioctl or a
hint in the HOT_RESET ioctl to tell the kernel whether a relaxed ownership
check is expected. Maybe this is awkward. But if we want to keep it, we
should do it with the user's awareness.
[2] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
For 3), it was proposed when discussing hot reset for noiommu[3]. But
it does not make hot reset always work for noiommu in cdev, only when
the dev_set is a singleton. So it is more of a general optimization that
lets the kernel skip the ownership check. But to make use of it, we may
need to test it before sanitizing the group fds from the user or doing the
iommufd check. Maybe the dev_set singleton test in this series is not well
placed. If so, I can modify it further.
[3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
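The ordering question raised for 3) can be sketched as a small dispatch function. This is only one reading of the summary above, not the code in the series; the enum and `select_ownership_check()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the ownership checks discussed above. */
enum ownership_check {
	CHECK_SKIPPED_SINGLETON, /* dev_set has one device: nothing else affected */
	CHECK_IOMMUFD,           /* zero-length fd array: iommufd-based check */
	CHECK_FDS,               /* user-supplied group or device fds */
};

/*
 * Test the singleton case first, before looking at the fd array at all,
 * so the result does not depend on whether the caller used the group
 * path or the cdev path.
 */
enum ownership_check select_ownership_check(size_t dev_set_size,
					    size_t fd_count)
{
	assert(dev_set_size >= 1);

	if (dev_set_size == 1)
		return CHECK_SKIPPED_SINGLETON;
	if (fd_count == 0)
		return CHECK_IOMMUFD;
	return CHECK_FDS;
}
```

With this ordering, a singleton dev_set takes the same branch whether or not fds are passed, which is the placement concern described for 3).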
Regards,
Yi Liu
>
> > > Remember the reason we started doing this is because we don't
> > > have easy access to the BDF anymore.
> >
> > We don't need it, the info ioctl provides the groups, the group
> > association can be learned from the DEVICE_GET_INFO ioctl, the
> > hot-reset ioctl only requires a single representative fd per affected
> > group. dev-ids not required.
>
> I'm not talking about triggering the ioctl.
>
> I'm talking about whatever else qemu needs to do so that the VM is
> aware of the reset groups device-by-device on its side so nested VFIO
> in the VM reflects the same data as the hypervisor. Maybe it doesn't
> do this right now, but the kernel API should continue to provide the
> data.
>
> > > I like leaving this ioctl alone, lets go back to a dedicated ioctl to
> > > return the dev_ids.
> >
> > I don't see any justification for this. We could add another PCI
> > specific DEVICE_GET_INFO capability to report the bdf if we really need
> > it, but reporting the group seems sufficient for this use case.
>
> What I imagine is a single new ioctl 'get reset group 2' or something.
> It returns a list of dev_ids in the reset group. It has an output flag
> if the reset is reliable. This is the only ioctl user space needs to
> call.
>
> The reliable test is done by simply calling the ioctl and throwing
> away the dev ids. The mapping of the VM's reset groups is done by
> processing the dev_ids to vRIDs and flowing that into the VM somehow.
>
> We don't expose group_ids, and we don't expose BDF. It is much simpler
> and cleaner to use.
>
> A BDF DEVICE_GET_INFO and the existing reset INFO will encode the same
> data too, it is just not as elegant and requires userspace to do a lot
> more work to keep track of the 3 different identifiers.
>
> > > This looks like a very complex uapi compared to the empty list option,
> > > but it seems like it would work.
> >
> > It's the same API that we have now. What's complex is trying to figure
> > out all the subtle side-effects from the loopholes that are being
> > proposed in this series. Thanks,
>
> I might agree with you if we weren't now going backwards -
> ideas didn't work out and Yi has to throw stuff away. :(
>
> Jason
^ permalink raw reply [flat|nested] 142+ messages in thread* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-06 10:02 ` Liu, Yi L
@ 2023-04-06 17:53 ` Alex Williamson
2023-04-07 10:09 ` Liu, Yi L
2023-04-11 13:24 ` Jason Gunthorpe
0 siblings, 2 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-06 17:53 UTC (permalink / raw)
To: Liu, Yi L
Cc: Jason Gunthorpe, eric.auger@redhat.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
Duan, Zhenzhong
On Thu, 6 Apr 2023 10:02:10 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, April 6, 2023 7:23 AM
> >
> > On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:
> >
> > > > > QEMU can make a policy decision today because the kernel provides a
> > > > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > > > hot-reset is all but guaranteed to work.
> > > >
> > > > And we don't change that with cdev. If qemu wants to make the policy
> > > > decision it keeps using the exact same _INFO interface to make that
> > > > decision same it has always made.
> > > >
> > > > We weaken the actual reset action to only consider the security side.
> > > >
> > > > Applications that want this exclusive reset group policy simply must
> > > > check it on their own. It is a reasonable API design.
> > >
> > > I disagree, as I've argued before, the info ioctl becomes so weak and
> > > effectively arbitrary from a user perspective at being able to predict
> > > whether the hot-reset ioctl works that it becomes useless, diminishing
> > > the entire hot-reset info/execute API.
> >
> > reset should be strictly more permissive than INFO. If INFO predicts
> > reset is permitted then reset should succeed.
> >
> > We don't change INFO so it cannot "becomes so weak" ??
> >
> > We don't care about the cases where INFO says it will not succeed but
> > reset does (temporarily) succeed.
> >
> > I don't get what argument you are trying to make or what you think is
> > diminished..
> >
> > Again, userspace calls INFO, if info says yes then reset *always
> > works*, exactly just like today.
> >
> > Userspace will call reset with a 0 length FD list and it uses a
> > security only check that is strictly more permissive than what
> > get_info will return. So the new check is simple in the kernel and
> > always works in the cases we need it to work.
> >
> > What is getting things into trouble is insisting that RESET have
> > additional restrictions beyond the minimum checks required for
> > security.
> >
> > > > I don't view it as a loophole, it is flexibility to use the API in a
> > > > way that is different from what qemu wants - eg an app like dpdk may
> > > > be willing to tolerate a reset group that becomes unavailable after
> > > > startup. Who knows, why should we force this in the kernel?
> > >
> > > Because look at all the problems it's causing to try to introduce these
> > > loopholes without also introducing subtle bugs.
> >
> > These problems are coming from trying to do this integrated version,
> > not from my approach!
> >
> > AFAICT there was nothing wrong with my original plan of using the
> > empty fd list for reset. What Yi has here is some mashup of what you
> > and I both suggested.
>
> Hi Alex, Jason,
>
> Could be this reason. So let me try to gather the changes this series
> makes and their impact as far as I know.
>
> 1) only check the ownership of opened devices in the dev_set
> in HOT_RESET ioctl.
> - Impact: it changes the relationship between _INFO and HOT_RESET.
> As " Each group must have IOMMU protection established for the
> ioctl to succeed." in [1], existing design actually means userspace
> should own all the affected groups before heading to do HOT_RESET.
> With the change here, the user does not need to ensure all affected
> groups are opened and it can do hot-reset successfully as long as the
> devices in the affected group are just un-opened and can be reset.
>
> [1] https://patchwork.kernel.org/project/linux-pci/patch/20130814200845.21923.64284.stgit@bling.home/
Where whether a device is opened is subject to change outside of the
user's control. This essentially allows the user to perform hot-resets
of devices outside of their ownership so long as the device is not
used elsewhere, versus the current requirement that the user own all the
affected groups, which implies device ownership. It's not been
justified why this feature needs to exist, imo.
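To make the contrast above concrete, here is a rough sketch of the two
ownership rules being compared. It is only an illustration: the struct and
function names are hypothetical and this is not the kernel's code. The
strict rule models the current requirement (every affected group must be
owned); the relaxed rule models change 1), where only opened devices are
checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device affected by the reset. */
struct sketch_affected_dev {
	int group_id;	/* IOMMU group the device belongs to */
	bool opened;	/* true if some user currently has it open */
};

/* Is group_id among the groups the caller has proven ownership of? */
static bool sketch_group_owned(int group_id, const int *owned, size_t n_owned)
{
	for (size_t i = 0; i < n_owned; i++)
		if (owned[i] == group_id)
			return true;
	return false;
}

/* Current rule: the caller must own the group of every affected device. */
bool sketch_strict_check(const struct sketch_affected_dev *devs, size_t n,
			 const int *owned, size_t n_owned)
{
	for (size_t i = 0; i < n; i++)
		if (!sketch_group_owned(devs[i].group_id, owned, n_owned))
			return false;
	return true;
}

/* Relaxed rule of change 1): unopened devices are skipped entirely. */
bool sketch_relaxed_check(const struct sketch_affected_dev *devs, size_t n,
			  const int *owned, size_t n_owned)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].opened)
			continue;
		if (!sketch_group_owned(devs[i].group_id, owned, n_owned))
			return false;
	}
	return true;
}
```

With one owned group and a second, unowned group whose device happens to be
closed, the relaxed rule passes while the strict rule fails, and the relaxed
result flips as soon as somebody else opens that device.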
> 2) Allow passing zero-length fd array to do hot reset
> - Impact: this uses the iommufd as ownership check in the kernel side.
> It is only supposed to be used by the users that open cdev instead of
> users that open group. The drawback is that it cannot cover the noiommu
> devices as noiommu does not use iommufd at all. But it works well for
> most cases.
The "only supposed to be used" is problematic here, we're extending all
the interfaces to transparently accept group and device fds, but here
we need to make a distinction because the ioctl needs to perform one
way for groups and another way for devices, which it currently doesn't
do. As above, I've not seen sufficient justification for this other
than references to reducing complexity, but the only userspace expected
to make use of this interface already has equivalent complexity.
> 3) Allow hot reset be successful when the dev_set is singleton
> - Impact: this makes sense but it seems to mess up the boundary between
> the group path and cdev path w.r.t. the usage of zero-length fd approach.
> The group path can succeed to do hot reset even if it is passing an empty
> fd array if the dev_set happens to be singleton.
Again, what is the justification for requiring this, it seems to be
only a hack towards no-iommu support with cdev, which we can achieve by
other means. Why have we not needed this in the group model? It
introduces subtle loopholes, so while maybe we could, I don't see why we
should, therefore I cannot agree with "this makes sense".
> 4) Allow passing device fd to do hot reset
> - Impact: this is a new way for hot reset. should have no impact.
>
> 5) Extend the _INFO to report devid
> - Impact: this changes the way user to decode the info reported back.
> devid and groupid are returned per the way the queried device is opened.
> Since it was suggested to support the scenario in which some devices
> are opened via cdev while some devices are opened via group. This makes
> us to return invalid_devid for the device that is opened via group if
> it is affected by the hot reset of a device that is opened via cdev.
>
> This was proposed to support the future device fd passing usage which is
> only available in cdev path.
I think this is fundamentally flawed because of the scope of the
dev-id. We can only provide dev-ids for devices which belong to the
same iommufd of the calling device, thus there are multiple instances
where no dev-id can be provided. The group-id and bdf are static
properties of the devices, regardless of their ownership. The bdf
provides the specific device level association while the group-id
indicates implied, static ownership.
> To me the major confusion is from 1) and 3). 1) changes the meaning of
> _INFO and HOT_RESET, while 3) messes up the boundary.
As above, I think 2) is also an issue.
> Here is my thought:
>
> For 1), it was proposed due to below reason[2]. We'd like to make a scenario
> that works in the group path be workable in cdev path as well. But IMHO, we
> may just accept that cdev path cannot work for such scenario to avoid subtle
> change to uapi. Otherwise, we need to have another HOT_RESET ioctl or a
> hint in HOT_RESET ioctl to tell the kernel whether relaxed ownership check
> is expected. Maybe this is awkward. But if we want to keep it, we'd do it
> with the awareness by user.
>
> [2] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
The group association is that relaxed ownership test. Yes, there are
corner cases where we have a dual function card with separate IOMMU
groups, where a user owning function 0 could do a bus reset because
function 1 is temporarily unused, but so what, what good is that, have
we ever had an issue raised because of this? The user can't rely on
the unopened state of the other function. It's an entirely
opportunistic optimization.
The much more typical scenario is that a multi-function device does not
provide isolation, all the functions are in the same group and because
of the association of the group the user has implied ownership of the
other devices for the purpose of a reset.
> For 3), it was proposed when discussing the hot reset for noiommu[3]. But
> it does not make hot reset always workable for noiommu in cdev, just in
> case dev_set is singleton. So it is more of a general optimization that can
> make the kernel skip the ownership check. But to make use of it, we may
> need to test it before sanitizing the group fds from user or the iommufd
> check. Maybe the dev_set singleton test in this series is not well placed.
> If so, I can further modify it.
>
> [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
As above, this seems to be some optimization related to no-iommu for
cdev because we don't have an iommufd association for the device in
no-iommu mode. Note however that the current group interface doesn't
care about the IOMMU context of the devices. We only need proof that
the user owns the affected groups. So why are we bringing iommufd
context anywhere into this interface, here or the null-array interface?
It seems like the minor difference with cdev is that a) we're passing
device fds rather than group fds, and b) those device fds need to be
validated as having device access to complete the proof of ownership
relative to the group. Otherwise we add capabilities to
DEVICE_GET_INFO to support the device fd passing model where the user
doesn't know the device group or bdf and allow the reset ioctl itself
to accept device fds (extracting the group relationship for those which
the user has configured for access). Thanks,
Alex
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-06 17:53 ` Alex Williamson
@ 2023-04-07 10:09 ` Liu, Yi L
2023-04-11 13:24 ` Jason Gunthorpe
1 sibling, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-07 10:09 UTC (permalink / raw)
To: Alex Williamson
Cc: Jason Gunthorpe, eric.auger@redhat.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
Duan, Zhenzhong
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, April 7, 2023 1:54 AM
>
> On Thu, 6 Apr 2023 10:02:10 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, April 6, 2023 7:23 AM
> > >
> > > On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:
> > >
> > > > > > QEMU can make a policy decision today because the kernel provides a
> > > > > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > > > > hot-reset is all but guaranteed to work.
> > > > >
> > > > > And we don't change that with cdev. If qemu wants to make the policy
> > > > > decision it keeps using the exact same _INFO interface to make that
> > > > > decision same it has always made.
> > > > >
> > > > > We weaken the actual reset action to only consider the security side.
> > > > >
> > > > > Applications that want this exclusive reset group policy simply must
> > > > > check it on their own. It is a reasonable API design.
> > > >
> > > > I disagree, as I've argued before, the info ioctl becomes so weak and
> > > > effectively arbitrary from a user perspective at being able to predict
> > > > whether the hot-reset ioctl works that it becomes useless, diminishing
> > > > the entire hot-reset info/execute API.
> > >
> > > reset should be strictly more permissive than INFO. If INFO predicts
> > > reset is permitted then reset should succeed.
> > >
> > > We don't change INFO so it cannot "becomes so weak" ??
> > >
> > > We don't care about the cases where INFO says it will not succeed but
> > > reset does (temporarily) succeed.
> > >
> > > I don't get what argument you are trying to make or what you think is
> > > diminished..
> > >
> > > Again, userspace calls INFO, if info says yes then reset *always
> > > works*, exactly just like today.
> > >
> > > Userspace will call reset with a 0 length FD list and it uses a
> > > security only check that is strictly more permissive than what
> > > get_info will return. So the new check is simple in the kernel and
> > > always works in the cases we need it to work.
> > >
> > > What is getting things into trouble is insisting that RESET have
> > > additional restrictions beyond the minimum checks required for
> > > security.
> > >
> > > > > I don't view it as a loophole, it is flexibility to use the API in a
> > > > > way that is different from what qemu wants - eg an app like dpdk may
> > > > > be willing to tolerate a reset group that becomes unavailable after
> > > > > startup. Who knows, why should we force this in the kernel?
> > > >
> > > > Because look at all the problems it's causing to try to introduce these
> > > > loopholes without also introducing subtle bugs.
> > >
> > > These problems are coming from trying to do this integrated version,
> > > not from my approach!
> > >
> > > AFAICT there was nothing wrong with my original plan of using the
> > > empty fd list for reset. What Yi has here is some mashup of what you
> > > and I both suggested.
> >
> > Hi Alex, Jason,
> >
> > Could be this reason. So let me try to gather the changes this series
> > makes and their impact as far as I know.
> >
> > 1) only check the ownership of opened devices in the dev_set
> > in HOT_RESET ioctl.
> > - Impact: it changes the relationship between _INFO and HOT_RESET.
> > As " Each group must have IOMMU protection established for the
> > ioctl to succeed." in [1], existing design actually means userspace
> > should own all the affected groups before heading to do HOT_RESET.
> > With the change here, the user does not need to ensure all affected
> > groups are opened and it can do hot-reset successfully as long as the
> > devices in the affected group are just un-opened and can be reset.
> >
> > [1] https://patchwork.kernel.org/project/linux-pci/patch/20130814200845.21923.64284.stgit@bling.home/
>
> Where whether a device is opened is subject to change outside of the
> user's control. This essentially allows the user to perform hot-resets
> of devices outside of their ownership so long as the device is not
> used elsewhere, versus the current requirement that the user own all the
> affected groups, which implies device ownership. It's not been
> justified why this feature needs to exist, imo.
>
> > 2) Allow passing zero-length fd array to do hot reset
> > - Impact: this uses the iommufd as ownership check in the kernel side.
> > It is only supposed to be used by the users that open cdev instead of
> > users that open group. The drawback is that it cannot cover the noiommu
> > devices as noiommu does not use iommufd at all. But it works well for
> > most cases.
>
> The "only supposed to be used" is problematic here, we're extending all
> the interfaces to transparently accept group and device fds, but here
> we need to make a distinction because the ioctl needs to perform one
> way for groups and another way for devices, which it currently doesn't
> do. As above, I've not seen sufficient justification for this other
> than references to reducing complexity, but the only userspace expected
> to make use of this interface already has equivalent complexity.
>
> > 3) Allow hot reset be successful when the dev_set is singleton
> > - Impact: this makes sense but it seems to mess up the boundary between
> > the group path and cdev path w.r.t. the usage of zero-length fd approach.
> > The group path can succeed to do hot reset even if it is passing an empty
> > fd array if the dev_set happens to be singleton.
>
> Again, what is the justification for requiring this, it seems to be
> only a hack towards no-iommu support with cdev, which we can achieve by
> other means. Why have we not needed this in the group model? It
> introduces subtle loopholes, so while maybe we could, I don't see why we
> should, therefore I cannot agree with "this makes sense".
>
> > 4) Allow passing device fd to do hot reset
> > - Impact: this is a new way for hot reset. should have no impact.
> >
> > 5) Extend the _INFO to report devid
> > - Impact: this changes the way user to decode the info reported back.
> > devid and groupid are returned per the way the queried device is opened.
> > Since it was suggested to support the scenario in which some devices
> > are opened via cdev while some devices are opened via group. This makes
> > us to return invalid_devid for the device that is opened via group if
> > it is affected by the hot reset of a device that is opened via cdev.
> >
> > This was proposed to support the future device fd passing usage which is
> > only available in cdev path.
>
> I think this is fundamentally flawed because of the scope of the
> dev-id. We can only provide dev-ids for devices which belong to the
> same iommufd of the calling device, thus there are multiple instances
> where no dev-id can be provided. The group-id and bdf are static
> properties of the devices, regardless of their ownership. The bdf
> provides the specific device level association while the group-id
> indicates implied, static ownership.
>
> > To me the major confusion is from 1) and 3). 1) changes the meaning of
> > _INFO and HOT_RESET, while 3) messes up the boundary.
>
> As above, I think 2) is also an issue.
>
> > Here is my thought:
> >
> > For 1), it was proposed due to below reason[2]. We'd like to make a scenario
> > that works in the group path be workable in cdev path as well. But IMHO, we
> > may just accept that cdev path cannot work for such scenario to avoid subtle
> > change to uapi. Otherwise, we need to have another HOT_RESET ioctl or a
> > hint in HOT_RESET ioctl to tell the kernel whether relaxed ownership check
> > is expected. Maybe this is awkward. But if we want to keep it, we'd do it
> > with the awareness by user.
> >
> > [2] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
>
> The group association is that relaxed ownership test. Yes, there are
> corner cases where we have a dual function card with separate IOMMU
> groups, where a user owning function 0 could do a bus reset because
> function 1 is temporarily unused, but so what, what good is that, have
> we ever had an issue raised because of this? The user can't rely on
> the unopened state of the other function. It's an entirely
> opportunistic optimization.
>
> The much more typical scenario is that a multi-function device does not
> provide isolation, all the functions are in the same group and because
> of the association of the group the user has implied ownership of the
> other devices for the purpose of a reset.
>
> > For 3), it was proposed when discussing the hot reset for noiommu[3]. But
> > it does not make hot reset always workable for noiommu in cdev, just in
> > case dev_set is singleton. So it is more of a general optimization that can
> > make the kernel skip the ownership check. But to make use of it, we may
> > need to test it before sanitizing the group fds from user or the iommufd
> > check. Maybe the dev_set singleton test in this series is not well placed.
> > If so, I can further modify it.
> >
> > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
>
> As above, this seems to be some optimization related to no-iommu for
> cdev because we don't have an iommufd association for the device in
> no-iommu mode. Note however that the current group interface doesn't
> care about the IOMMU context of the devices. We only need proof that
> the user owns the affected groups. So why are we bringing iommufd
> context anywhere into this interface, here or the null-array interface?
>
> It seems like the minor difference with cdev is that a) we're passing
> device fds rather than group fds, and b) those device fds need to be
> validated as having device access to complete the proof of ownership
> relative to the group. Otherwise we add capabilities to
> DEVICE_GET_INFO to support the device fd passing model where the user
> doesn't know the device group or bdf and allow the reset ioctl itself
> to accept device fds (extracting the group relationship for those which
> the user has configured for access). Thanks,
So your suggestion is to drop 1), 2), 3) and 5), keep 4), and add a new
bdf/group capability to DEVICE_GET_INFO to retrieve the group_id and bdf.
That way the existing _INFO ioctl can be reused without any change. Is that
right?
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-06 17:53 ` Alex Williamson
2023-04-07 10:09 ` Liu, Yi L
@ 2023-04-11 13:24 ` Jason Gunthorpe
[not found] ` <20230411095417.240bac39.alex.williamson@redhat.com>
1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2023-04-11 13:24 UTC (permalink / raw)
To: Alex Williamson
Cc: Liu, Yi L, eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting,
Duan, Zhenzhong
On Thu, Apr 06, 2023 at 11:53:47AM -0600, Alex Williamson wrote:
> Where whether a device is opened is subject to change outside of the
> user's control. This essentially allows the user to perform hot-resets
> of devices outside of their ownership so long as the device is not
> used elsewhere, versus the current requirement that the user own all the
> affected groups, which implies device ownership. It's not been
> justified why this feature needs to exist, imo.
The cdev API doesn't have the notion that owning a group means you
"own" some collection of devices. It still happens as a side effect,
but it isn't obviously part of the API. I'm really loath to
re-introduce that group-based concept just for this. We are trying to
reduce the group API surface.
How about a different direction.
We add a new uAPI for cdev mode that is "take ownership of the reset
group". Maybe it can be a flag during bind.
When requested vfio will ensure that every device in the reset group
is only bound to this iommufd_ctx or left closed. Now and in the
future. Since no-iommu has no iommufd_ctx this means we can open only
one device in the reset group.
With this flag RESET is guaranteed to always work by definition.
We continue with the zero-length FD, but we can just replace the
security checks with a check if we are in reset group ownership mode.
_INFO is unchanged.
We decide if we add a new IOCTL to return the BDF so the existing
_INFO can get back to the dev_id or a new IOCTL that returns the
dev_id list of the reset group.
Userspace is required to figure out the extent of the reset, but we
don't require that userspace prove to the kernel it did this when
requesting the reset.
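A rough sketch of the invariant described above, assuming a simplified
model of the reset group. The types and the function name are hypothetical
and only illustrate the rule "every device in the reset group is bound to
this iommufd_ctx or left closed"; nothing here is an existing kernel
interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device in a reset group; not a kernel type. */
struct sketch_reset_dev {
	bool opened;		/* device is currently open */
	const void *ctx;	/* context it is bound to, NULL for no-iommu */
};

/*
 * Returns true if devs[self] could hold reset group ownership: every other
 * device in the group is either closed or bound to the same context.
 * A NULL context (no-iommu) never matches another device, so in that case
 * the caller must be the only opened device in the group.
 */
bool sketch_reset_group_owned(const struct sketch_reset_dev *devs, size_t n,
			      size_t self)
{
	const void *ctx = devs[self].ctx;

	for (size_t i = 0; i < n; i++) {
		if (i == self || !devs[i].opened)
			continue;
		if (ctx == NULL || devs[i].ctx != ctx)
			return false;
	}
	return true;
}
```

In this model the check would have to hold both when ownership is requested
and on every later open of a device in the group, which is what "now and in
the future" implies.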
Jason
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 19:49 ` Alex Williamson
2023-04-05 23:22 ` Jason Gunthorpe
@ 2023-04-06 6:34 ` Liu, Yi L
2023-04-06 17:07 ` Alex Williamson
1 sibling, 1 reply; 142+ messages in thread
From: Liu, Yi L @ 2023-04-06 6:34 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe
Cc: eric.auger@redhat.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Alex,
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, April 6, 2023 3:50 AM
>
> On Wed, 5 Apr 2023 16:21:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> > > Usability needs to be a consideration as well. An interface where the
> > > result is effectively arbitrary from a user perspective because the
> > > kernel is solely focused on whether the operation is allowed,
> > > evaluating constraints that the user is unaware of and cannot control,
> > > is unusable.
> >
> > Considering this API is only invoked by qemu we might be overdoing
> > this usability and 'no shoot in foot' view.
>
> Ok, I'm not sure why we're diminishing the de facto vfio userspace...
>
> > > > This is a good point that qemu needs to make a policy decision if it
> > > > is happy about the VFIO configuration - but that is a policy decision
> > > > that should not become entangled with the kernel's security checks.
> > > >
> > > > Today qemu can make this policy choice the same way it does right now
> > > > - call _INFO and check the group_ids. It gets the exact same outcome
> > > > as today. We already discussed that we need to expose the group ID
> > > > through an ioctl someplace.
> > >
> > > QEMU can make a policy decision today because the kernel provides a
> > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > hot-reset is all but guaranteed to work.
> >
> > And we don't change that with cdev. If qemu wants to make the policy
> > decision it keeps using the exact same _INFO interface to make that
> > decision same it has always made.
> >
> > We weaken the actual reset action to only consider the security side.
> >
> > Applications that want this exclusive reset group policy simply must
> > check it on their own. It is a reasonable API design.
>
> I disagree, as I've argued before, the info ioctl becomes so weak and
> effectively arbitrary from a user perspective at being able to predict
> whether the hot-reset ioctl works that it becomes useless, diminishing
> the entire hot-reset info/execute API.
>
> > > > If this is too awkward we could add a query to the kernel if the cdev
> > > > is "reset exclusive" - eg the iommufd covers all the groups that span
> > > > the reset set.
> > >
> > > That's essentially what we have if there are valid dev-ids for each
> > > affected device in the info ioctl.
> >
> > If you have dev-ids for everything, yes. If you don't, then you can't
> > make the same policy choice using a dev-id interface.
>
> Exactly, you can't make any policy choice because the success or
> failure of the hot-reset ioctl can't be known.
Could you elaborate a bit on what the policy is here? As far as I know,
QEMU makes use of the information reported by _INFO to check:
- if all the affected groups are owned by the current QEMU[1]
- if the affected devices are opened by the current QEMU; if yes, QEMU
needs to use vfio_pci_pre_reset() to do preparation before issuing the
hot reset[1]
[1] vfio_pci_hot_reset() in https://github.com/qemu/qemu/blob/master/hw/vfio/pci.c
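The two checks listed above can be sketched roughly as follows. This is a
simplified model, not QEMU's actual code: the struct is a hypothetical
stand-in for one (group id, bdf) entry reported by _INFO, and the function
name is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry reported by _INFO. */
struct sketch_dep {
	int group_id;
	unsigned int bdf;
};

static bool sketch_has_group(const int *v, size_t n, int x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

static bool sketch_has_bdf(const unsigned int *v, size_t n, unsigned int x)
{
	for (size_t i = 0; i < n; i++)
		if (v[i] == x)
			return true;
	return false;
}

/*
 * Returns -1 if any affected group is not owned by this process, so the
 * reset is refused by policy. Otherwise returns the number of affected
 * devices this process has open, i.e. those needing pre-reset preparation.
 */
int sketch_plan_hot_reset(const struct sketch_dep *deps, size_t n_deps,
			  const int *owned_groups, size_t n_groups,
			  const unsigned int *opened_bdfs, size_t n_opened)
{
	int to_prepare = 0;

	for (size_t i = 0; i < n_deps; i++) {
		if (!sketch_has_group(owned_groups, n_groups, deps[i].group_id))
			return -1;
		if (sketch_has_bdf(opened_bdfs, n_opened, deps[i].bdf))
			to_prepare++;
	}
	return to_prepare;
}
```

Note that the second check matches on bdf, not on group id, which is why the
per-device identifier matters for the preparation step.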
> > > I don't think it helps the user experience to create loopholes where
> > > the hot-reset ioctl can still work in spite of those missing
> > > devices.
> >
> > I disagree. The easy straightforward design is that the reset ioctl
> > works if the process has security permissions. Mixing a policy check
> > into the kernel on this path is creating complexity we don't really
> > need.
> >
> > I don't view it as a loophole, it is flexibility to use the API in a
> > way that is different from what qemu wants - eg an app like dpdk may
> > be willing to tolerate a reset group that becomes unavailable after
> > startup. Who knows, why should we force this in the kernel?
>
> Because look at all the problems it's causing to try to introduce these
> loopholes without also introducing subtle bugs. There's an argument
> that we're overly strict, which is better than the alternative, which
> seems to be what we're dabbling with. It is a straightforward
> interface for the hot-reset ioctl to mirror the information provided
> via the hot-reset info ioctl.
>
> > > For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> > > capability chains, we could add a capability that reports the group ID
> > > for the device.
> >
> > I was going to put that in an iommufd ioctl so it works with VDPA too,
> > but sure, let's assume we can get the group ID from a cdev fd.
> >
> > > The hot-reset info ioctl remains as it is today, reporting group-ids
> > > and bdfs.
> >
> > Sure, but userspace still needs to know how to map the reset sets into
> > dev-ids.
>
> No, it doesn't.
>
> > Remember the reason we started doing this is because we don't
> > have easy access to the BDF anymore.
>
> We don't need it, the info ioctl provides the groups, the group
> association can be learned from the DEVICE_GET_INFO ioctl, the
> hot-reset ioctl only requires a single representative fd per affected
> group. dev-ids not required.
>
> > I like leaving this ioctl alone, let's go back to a dedicated ioctl to
> > return the dev_ids.
>
> I don't see any justification for this. We could add another PCI
> specific DEVICE_GET_INFO capability to report the bdf if we really need
> it, but reporting the group seems sufficient for this use case.
IMHO, knowledge of the group may not be enough. Take QEMU as an example.
QEMU not only needs to ensure the group is owned by it, it also needs to
do preparation on the devices that are already in use and affected by
the hot reset of a newly opened device. If there is only group knowledge,
QEMU may blindly prepare all the devices that are already opened and
belong to the same iommu group. But as I understood from the discussion,
an iommu group is not equal to the hot reset scope (a.k.a. dev_set), is
it? It is possible that the devices in an iommu_group span multiple hot
reset scopes. In that case, getting the bdf info from the cdev fd is
necessary.
Regards,
Yi Liu
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-06 6:34 ` Liu, Yi L
@ 2023-04-06 17:07 ` Alex Williamson
0 siblings, 0 replies; 142+ messages in thread
From: Alex Williamson @ 2023-04-06 17:07 UTC (permalink / raw)
To: Liu, Yi L
Cc: Jason Gunthorpe, eric.auger@redhat.com, Tian, Kevin,
joro@8bytes.org, robin.murphy@arm.com, cohuck@redhat.com,
nicolinc@nvidia.com, kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On Thu, 6 Apr 2023 06:34:08 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> Hi Alex,
>
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, April 6, 2023 3:50 AM
> >
> > On Wed, 5 Apr 2023 16:21:09 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Wed, Apr 05, 2023 at 12:56:21PM -0600, Alex Williamson wrote:
> > > > Usability needs to be a consideration as well. An interface where the
> > > > result is effectively arbitrary from a user perspective because the
> > > > kernel is solely focused on whether the operation is allowed,
> > > > evaluating constraints that the user is unaware of and cannot control,
> > > > is unusable.
> > >
> > > Considering this API is only invoked by qemu we might be overdoing
> > > this usability and 'no shoot in foot' view.
> >
> > Ok, I'm not sure why we're diminishing the de facto vfio userspace...
> >
> > > > > This is a good point that qemu needs to make a policy decision if it
> > > > > is happy about the VFIO configuration - but that is a policy decision
> > > > > that should not become entangled with the kernel's security checks.
> > > > >
> > > > > Today qemu can make this policy choice the same way it does right now
> > > > > - call _INFO and check the group_ids. It gets the exact same outcome
> > > > > as today. We already discussed that we need to expose the group ID
> > > > > through an ioctl someplace.
> > > >
> > > > QEMU can make a policy decision today because the kernel provides a
> > > > sufficiently reliable interface, ie. based on the set of owned groups, a
> > > > hot-reset is all but guaranteed to work.
> > >
> > > And we don't change that with cdev. If qemu wants to make the policy
> > > decision it keeps using the exact same _INFO interface to make that
> > > decision same it has always made.
> > >
> > > We weaken the actual reset action to only consider the security side.
> > >
> > > Applications that want this exclusive reset group policy simply must
> > > check it on their own. It is a reasonable API design.
> >
> > I disagree, as I've argued before, the info ioctl becomes so weak and
> > effectively arbitrary from a user perspective at being able to predict
> > whether the hot-reset ioctl works that it becomes useless, diminishing
> > the entire hot-reset info/execute API.
> >
> > > > > If this is too awkward we could add a query to the kernel if the cdev
> > > > > is "reset exclusive" - eg the iommufd covers all the groups that span
> > > > > the reset set.
> > > >
> > > > That's essentially what we have if there are valid dev-ids for each
> > > > affected device in the info ioctl.
> > >
> > > If you have dev-ids for everything, yes. If you don't, then you can't
> > > make the same policy choice using a dev-id interface.
> >
> > Exactly, you can't make any policy choice because the success or
> > failure of the hot-reset ioctl can't be known.
>
> could you elaborate a bit about what the policy is here. As far as I know,
> QEMU makes use of the information reported by _INFO to check:
> - if all the affected groups are owned by the current QEMU[1]
> - if the affected devices are opened by the current QEMU, if yes, QEMU
> needs to use vfio_pci_pre_reset() to do preparation before issuing
> hot reset[1]
>
> [1] vfio_pci_hot_reset() in https://github.com/qemu/qemu/blob/master/hw/vfio/pci.c
Regarding the policy decisions, look for instance at the distinction
between vfio_pci_hot_reset_one() vs vfio_pci_hot_reset_multi(), or the
way QEMU will opt for a bus reset if it believes only a PM reset is
available.
In my proposal, I did miss that having _INFO report the group and bdf
allows QEMU to associate fd passed devices with a group affected by the
reset, but not to tell specifically whether the device itself is
affected by the reset. I think that would be justification for
capabilities on the DEVICE_GET_INFO ioctl to report both the group and
PCI address as separate capabilities.
> > > > I don't think it helps the user experience to create loopholes where
> > > > the hot-reset ioctl can still work in spite of those missing
> > > > devices.
> > >
> > > I disagree. The easy straightforward design is that the reset ioctl
> > > works if the process has security permissions. Mixing a policy check
> > > into the kernel on this path is creating complexity we don't really
> > > need.
> > >
> > > I don't view it as a loophole, it is flexibility to use the API in a
> > > way that is different from what qemu wants - eg an app like dpdk may
> > > be willing to tolerate a reset group that becomes unavailable after
> > > startup. Who knows, why should we force this in the kernel?
> >
> > Because look at all the problems it's causing to try to introduce these
> > loopholes without also introducing subtle bugs. There's an argument
> > that we're overly strict, which is better than the alternative, which
> > seems to be what we're dabbling with. It is a straightforward
> > interface for the hot-reset ioctl to mirror the information provided
> > via the hot-reset info ioctl.
> >
> > > > For example, we have a VFIO_DEVICE_GET_INFO ioctl that supports
> > > > capability chains, we could add a capability that reports the group ID
> > > > for the device.
> > >
> > > I was going to put that in an iommufd ioctl so it works with VDPA too,
> > > but sure, lets assume we can get the group ID from a cdev fd.
> > >
> > > > The hot-reset info ioctl remains as it is today, reporting group-ids
> > > > and bdfs.
> > >
> > > Sure, but userspace still needs to know how to map the reset sets into
> > > dev-ids.
> >
> > No, it doesn't.
> >
> > > Remember the reason we started doing this is because we don't
> > > have easy access to the BDF anymore.
> >
> > We don't need it, the info ioctl provides the groups, the group
> > association can be learned from the DEVICE_GET_INFO ioctl, the
> > hot-reset ioctl only requires a single representative fd per affected
> > group. dev-ids not required.
> >
> > > I like leaving this ioctl alone, lets go back to a dedicated ioctl to
> > > return the dev_ids.
> >
> > I don't see any justification for this. We could add another PCI
> > specific DEVICE_GET_INFO capability to report the bdf if we really need
> > it, but reporting the group seems sufficient for this use case.
>
> IMHO, knowledge of the group may not be enough. Take QEMU as an example.
> QEMU not only needs to ensure the group is owned by it, it also needs to
> do preparation on the devices that are already in use and affected by
> the hot reset of a newly opened device. If there is only group knowledge,
> QEMU may blindly prepare all the devices that are already opened and
> belong to the same iommu group. But as I understood from the discussion,
> an iommu group is not equal to the hot reset scope (a.k.a. dev_set), is
> it? It is possible that the devices in an iommu_group span multiple hot
> reset scopes. In that case, getting the bdf info from the cdev fd is
> necessary.
Yes, you're correct, group and reset scope are not equivalent, so we'd
require a means to get both the group and the bdf for the device.
Knowing the bdf allows the user to know which opened devices are
directly affected by the reset, knowing the group allows the user to
know if ancillary affected devices are within the set of groups the
user owns and therefore effectively under their purview. Thanks,
Alex
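
To make the distinction above concrete, here is a minimal userspace sketch of how the two pieces of information could be combined. It is only an illustration under stated assumptions: struct dep_dev mirrors the group_id/segment/bus/devfn layout reported by the hot-reset info ioctl, while struct opened_dev, the owned_groups array, enum dep_class and classify_dep() are hypothetical names invented for this example and are not part of any existing API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as one entry reported by the hot-reset info ioctl. */
struct dep_dev {
	uint32_t group_id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Hypothetical record userspace keeps for each device it has opened. */
struct opened_dev {
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

enum dep_class {
	DEP_OPENED,      /* bdf matches an opened device: directly affected */
	DEP_OWNED_GROUP, /* not opened, but within a group the user owns */
	DEP_NOT_OWNED,   /* outside the set of groups the user owns */
};

static bool same_bdf(const struct dep_dev *d, const struct opened_dev *o)
{
	return d->segment == o->segment && d->bus == o->bus &&
	       d->devfn == o->devfn;
}

/* The bdf identifies directly affected devices, the group covers the rest. */
static enum dep_class classify_dep(const struct dep_dev *d,
				   const struct opened_dev *opened, size_t n_opened,
				   const uint32_t *owned_groups, size_t n_groups)
{
	for (size_t i = 0; i < n_opened; i++)
		if (same_bdf(d, &opened[i]))
			return DEP_OPENED;

	for (size_t i = 0; i < n_groups; i++)
		if (owned_groups[i] == d->group_id)
			return DEP_OWNED_GROUP;

	return DEP_NOT_OWNED;
}
```

Under these assumptions, a user would prepare every DEP_OPENED device before the reset and would treat any DEP_NOT_OWNED entry as a reason not to attempt it.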
^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 16:25 ` Alex Williamson
2023-04-05 16:37 ` Jason Gunthorpe
@ 2023-04-05 17:58 ` Eric Auger
2023-04-06 5:31 ` Liu, Yi L
1 sibling, 1 reply; 142+ messages in thread
From: Eric Auger @ 2023-04-05 17:58 UTC (permalink / raw)
To: Alex Williamson, Liu, Yi L
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
On 4/5/23 18:25, Alex Williamson wrote:
> On Wed, 5 Apr 2023 14:04:51 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
>
>> Hi Eric,
>>
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Sent: Wednesday, April 5, 2023 8:20 PM
>>>
>>> Hi Yi,
>>> On 4/1/23 16:44, Yi Liu wrote:
>>>> for the users that accept device fds passed from management stacks to be
>>>> able to figure out the host reset affected devices among the devices
>>>> opened by the user. This is needed as such users do not have BDF (bus,
>>>> devfn) knowledge about the devices it has opened, hence unable to use
>>>> the information reported by existing VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
>>>> to figure out the affected devices.
>>>>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> ---
>>>> drivers/vfio/pci/vfio_pci_core.c | 58 ++++++++++++++++++++++++++++----
>>>> include/uapi/linux/vfio.h | 24 ++++++++++++-
>>>> 2 files changed, 74 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>>>> index 19f5b075d70a..a5a7e148dce1 100644
>>>> --- a/drivers/vfio/pci/vfio_pci_core.c
>>>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>>>> @@ -30,6 +30,7 @@
>>>> #if IS_ENABLED(CONFIG_EEH)
>>>> #include <asm/eeh.h>
>>>> #endif
>>>> +#include <uapi/linux/iommufd.h>
>>>>
>>>> #include "vfio_pci_priv.h"
>>>>
>>>> @@ -767,6 +768,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
>>>> return 0;
>>>> }
>>>>
>>>> +static struct vfio_device *
>>>> +vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
>>>> + struct pci_dev *pdev)
>>>> +{
>>>> + struct vfio_device *cur;
>>>> +
>>>> + lockdep_assert_held(&dev_set->lock);
>>>> +
>>>> + list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
>>>> + if (cur->dev == &pdev->dev)
>>>> + return cur;
>>>> + return NULL;
>>>> +}
>>>> +
>>>> static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
>>>> {
>>>> (*(int *)data)++;
>>>> @@ -776,13 +791,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
>>>> struct vfio_pci_fill_info {
>>>> int max;
>>>> int cur;
>>>> + bool require_devid;
>>>> + struct iommufd_ctx *iommufd;
>>>> + struct vfio_device_set *dev_set;
>>>> struct vfio_pci_dependent_device *devices;
>>>> };
>>>>
>>>> static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>>>> {
>>>> struct vfio_pci_fill_info *fill = data;
>>>> + struct vfio_device_set *dev_set = fill->dev_set;
>>>> struct iommu_group *iommu_group;
>>>> + struct vfio_device *vdev;
>>>> +
>>>> + lockdep_assert_held(&dev_set->lock);
>>>>
>>>> if (fill->cur == fill->max)
>>>> return -EAGAIN; /* Something changed, try again */
>>>> @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>>>> if (!iommu_group)
>>>> return -EPERM; /* Cannot reset non-isolated devices */
>>>>
>>>> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
>>>> + if (fill->require_devid) {
>>>> + /*
>>>> + * Report dev_id of the devices that are opened as cdev
>>>> + * and have the same iommufd with the fill->iommufd.
>>>> + * Otherwise, just fill IOMMUFD_INVALID_ID.
>>>> + */
>>>> + vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
>>>> + if (vdev && vfio_device_cdev_opened(vdev) &&
>>>> + fill->iommufd == vfio_iommufd_physical_ictx(vdev))
>>>> + vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
>>>> + else
>>>> + fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
>>>> + } else {
>>>> + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
>>>> + }
>>>> fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>>>> fill->devices[fill->cur].bus = pdev->bus->number;
>>>> fill->devices[fill->cur].devfn = pdev->devfn;
>>>> @@ -1230,17 +1266,27 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>>>> return -ENOMEM;
>>>>
>>>> fill.devices = devices;
>>>> + fill.dev_set = vdev->vdev.dev_set;
>>>>
>>>> + mutex_lock(&vdev->vdev.dev_set->lock);
>>>> + if (vfio_device_cdev_opened(&vdev->vdev)) {
>>>> + fill.require_devid = true;
>>>> + fill.iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
>>>> + }
>>>> ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>>>> &fill, slot);
>>>> + mutex_unlock(&vdev->vdev.dev_set->lock);
>>>>
>>>> /*
>>>> * If a device was removed between counting and filling, we may come up
>>>> * short of fill.max. If a device was added, we'll have a return of
>>>> * -EAGAIN above.
>>>> */
>>>> - if (!ret)
>>>> + if (!ret) {
>>>> hdr.count = fill.cur;
>>>> + if (fill.require_devid)
>>>> + hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
>>>> + }
>>>>
>>>> reset_info_exit:
>>>> if (copy_to_user(arg, &hdr, minsz))
>>>> @@ -2346,12 +2392,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
>>>> static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
>>>> {
>>>> struct vfio_device_set *dev_set = data;
>>>> - struct vfio_device *cur;
>>>>
>>>> - list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
>>>> - if (cur->dev == &pdev->dev)
>>>> - return 0;
>>>> - return -EBUSY;
>>>> + lockdep_assert_held(&dev_set->lock);
>>>> +
>>>> + return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
>>>> }
>>>>
>>>> /*
>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>> index 25432ef213ee..5a34364e3b94 100644
>>>> --- a/include/uapi/linux/vfio.h
>>>> +++ b/include/uapi/linux/vfio.h
>>>> @@ -650,11 +650,32 @@ enum {
>>>> * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
>>>> * struct vfio_pci_hot_reset_info)
>>>> *
>>>> + * This command is used to query the affected devices in the hot reset for
>>>> + * a given device. User could use the information reported by this command
>>>> + * to figure out the affected devices among the devices it has opened.
The 'opened' terminology does not look sufficient here, because it is not
only a matter of the device being opened using cdev: it also needs to
have been bound to an iommufd, the dev_id being the output of the
dev-iommufd binding.
By the way, I am now confused. What happens if the reset impacts some
devices which are not bound to an iommu ctx? Previously we returned the
iommu group, which always pre-exists, but now you will report an invalid id?
>>>> + * This command always reports the segment, bus and devfn information for
>>>> + * each affected device, and selectively report the group_id or the dev_id
>>>> + * per the way how the device being queried is opened.
>>>> + * - If the device is opened via the traditional group/container manner,
>>>> + * this command reports the group_id for each affected device.
>>>> + *
>>>> + * - If the device is opened as a cdev, this command needs to report
>>> s/needs to report/reports
>> got it.
>>
>>>> + * dev_id for each affected device and set the
>>>> + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
>>>> + * devices that are not opened as cdev or bound to different iommufds
>>>> + * with the device that is queried, report an invalid dev_id to avoid
or not bound at all
>>> s/bound to different iommufds with the device that is queried/bound to
>>> iommufds different from the reset device one?
>> hmmm, I'm not a native speaker here. This _INFO is to query if want
>> hot reset a given device, what devices would be affected. So it appears
>> the queried device is better. But I'd admit "the queried device" is also
>> "the reset device". may Alex help pick one. 😊
> - If the calling device is opened directly via cdev rather than
> accessed through the vfio group, the returned
> vfio_pci_dependent_device structure reports the dev_id
> rather than the group_id, which is indicated by the
> VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag in
> vfio_pci_hot_reset_info. If the reset affects devices that
> are not opened within the same iommufd context as the calling
> device, IOMMUFD_INVALID_ID will be provided as the dev_id.
>
> But that kind of brings to light the question of what does the user do
> when they encounter this situation. If the device is not opened, the
> reset can complete. If the device is opened by a different user, the
> reset is blocked. The only logical conclusion is that the user should
> try the reset regardless of the result of the info ioctl, which the
> null-array approach further solidifies as the direction of the API.
> I'm not liking this. Thanks,
>
> Alex
Thanks
Eric
>
>
>>>> + * potential dev_id conflict as dev_id is local to iommufd. For such
>>>> + * affected devices, user shall fall back to use the segment, bus and
>>>> + * devfn info to map it to opened device.
>>>> + *
>>>> * Return: 0 on success, -errno on failure:
>>>> * -enospc = insufficient buffer, -enodev = unsupported for device.
>>>> */
>>>> struct vfio_pci_dependent_device {
>>>> - __u32 group_id;
>>>> + union {
>>>> + __u32 group_id;
>>>> + __u32 dev_id;
>>>> + };
>>>> __u16 segment;
>>>> __u8 bus;
>>>> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
>>>> @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
>>>> struct vfio_pci_hot_reset_info {
>>>> __u32 argsz;
>>>> __u32 flags;
>>>> +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
>>>> __u32 count;
>>>> struct vfio_pci_dependent_device devices[];
>>>> };
>>> Eric
^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
2023-04-05 17:58 ` Eric Auger
@ 2023-04-06 5:31 ` Liu, Yi L
0 siblings, 0 replies; 142+ messages in thread
From: Liu, Yi L @ 2023-04-06 5:31 UTC (permalink / raw)
To: eric.auger@redhat.com, Alex Williamson
Cc: jgg@nvidia.com, Tian, Kevin, joro@8bytes.org,
robin.murphy@arm.com, cohuck@redhat.com, nicolinc@nvidia.com,
kvm@vger.kernel.org, mjrosato@linux.ibm.com,
chao.p.peng@linux.intel.com, yi.y.sun@linux.intel.com,
peterx@redhat.com, jasowang@redhat.com,
shameerali.kolothum.thodi@huawei.com, lulu@redhat.com,
suravee.suthikulpanit@amd.com,
intel-gvt-dev@lists.freedesktop.org,
intel-gfx@lists.freedesktop.org, linux-s390@vger.kernel.org,
Hao, Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting
Hi Eric,
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, April 6, 2023 1:58 AM
[...]
> >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>>> index 25432ef213ee..5a34364e3b94 100644
> >>>> --- a/include/uapi/linux/vfio.h
> >>>> +++ b/include/uapi/linux/vfio.h
> >>>> @@ -650,11 +650,32 @@ enum {
> >>>> * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> >>>> * struct vfio_pci_hot_reset_info)
> >>>> *
> >>>> + * This command is used to query the affected devices in the hot reset for
> >>>> + * a given device. User could use the information reported by this command
> >>>> + * to figure out the affected devices among the devices it has opened.
> The 'opened' terminology does not look sufficient here, because it is not
> only a matter of the device being opened using cdev: it also needs to
> have been bound to an iommufd, the dev_id being the output of the
> dev-iommufd binding.
>
> By the way, I am now confused. What happens if the reset impacts some
> devices which are not bound to an iommu ctx? Previously we returned the
> iommu group, which always pre-exists, but now you will report an invalid id?
For such devices, the user can use the bdf information to check whether
the affected device is opened by the user. If so, the user does the
necessary preparation on the device before issuing the hot reset.
Regards,
Yi Liu
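
As a rough illustration of the fallback described above, the sketch below shows how a user might decide which field to match on for each reported entry. It is an assumption-laden example, not the series' implementation: struct ex_dep_dev and EX_FLAG_IOMMUFD_DEV_ID mirror the union and the VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID bit from the quoted patch, EX_INVALID_ID is a placeholder standing in for IOMMUFD_INVALID_ID (whose numeric value is assumed here, not taken from the series), and ex_how_to_match(), ex_slot() and ex_func() are hypothetical helper names.

```c
#include <stdint.h>

/* Mirrors VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID from the quoted patch. */
#define EX_FLAG_IOMMUFD_DEV_ID (1u << 0)
/* ASSUMPTION: placeholder for IOMMUFD_INVALID_ID, real value not shown here. */
#define EX_INVALID_ID 0u

/* Same layout as the patched struct vfio_pci_dependent_device. */
struct ex_dep_dev {
	union {
		uint32_t group_id;
		uint32_t dev_id;
	};
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

enum ex_match {
	EX_MATCH_BY_GROUP,  /* group/container open: entry carries a group_id */
	EX_MATCH_BY_DEV_ID, /* cdev open, same iommufd: entry carries a dev_id */
	EX_MATCH_BY_BDF,    /* cdev open, invalid dev_id: fall back to the bdf */
};

/* Pick the field that identifies this entry among the user's devices. */
static enum ex_match ex_how_to_match(uint32_t flags, const struct ex_dep_dev *d)
{
	if (!(flags & EX_FLAG_IOMMUFD_DEV_ID))
		return EX_MATCH_BY_GROUP;
	if (d->dev_id != EX_INVALID_ID)
		return EX_MATCH_BY_DEV_ID;
	return EX_MATCH_BY_BDF;
}

/* devfn packs the slot in the upper 5 bits and the function in the lower 3. */
static unsigned int ex_slot(uint8_t devfn) { return (devfn >> 3) & 0x1f; }
static unsigned int ex_func(uint8_t devfn) { return devfn & 0x07; }
```

The bdf fallback only helps a user that already knows the bdf of each device it opened, which is exactly the knowledge an fd-passing user lacks.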
> >>>> + * This command always reports the segment, bus and devfn information for
> >>>> + * each affected device, and selectively report the group_id or the dev_id
> >>>> + * per the way how the device being queried is opened.
> >>>> + * - If the device is opened via the traditional group/container manner,
> >>>> + * this command reports the group_id for each affected device.
> >>>> + *
> >>>> + * - If the device is opened as a cdev, this command needs to report
> >>> s/needs to report/reports
> >> got it.
> >>
> >>>> + * dev_id for each affected device and set the
> >>>> + * VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag. For the affected
> >>>> + * devices that are not opened as cdev or bound to different iommufds
> >>>> + * with the device that is queried, report an invalid dev_id to avoid
> or not bound at all
> >>> s/bound to different iommufds with the device that is queried/bound to
> >>> iommufds different from the reset device one?
> >> hmmm, I'm not a native speaker here. This _INFO is to query if want
> >> hot reset a given device, what devices would be affected. So it appears
> >> the queried device is better. But I'd admit "the queried device" is also
> >> "the reset device". may Alex help pick one. 😊
> > - If the calling device is opened directly via cdev rather than
> > accessed through the vfio group, the returned
> > vfio_pci_dependent_device structure reports the dev_id
> > rather than the group_id, which is indicated by the
> > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID flag in
> > vfio_pci_hot_reset_info. If the reset affects devices that
> > are not opened within the same iommufd context as the calling
> > device, IOMMUFD_INVALID_ID will be provided as the dev_id.
> >
> > But that kind of brings to light the question of what does the user do
> > when they encounter this situation. If the device is not opened, the
> > reset can complete. If the device is opened by a different user, the
> > reset is blocked. The only logical conclusion is that the user should
> > try the reset regardless of the result of the info ioctl, which the
> > null-array approach further solidifies as the direction of the API.
> > I'm not liking this. Thanks,
> >
> > Alex
>
> Thanks
>
> Eric
> >
> >
> >>>> + * potential dev_id conflict as dev_id is local to iommufd. For such
> >>>> + * affected devices, user shall fall back to use the segment, bus and
> >>>> + * devfn info to map it to opened device.
> >>>> + *
> >>>> * Return: 0 on success, -errno on failure:
> >>>> * -enospc = insufficient buffer, -enodev = unsupported for device.
> >>>> */
> >>>> struct vfio_pci_dependent_device {
> >>>> - __u32 group_id;
> >>>> + union {
> >>>> + __u32 group_id;
> >>>> + __u32 dev_id;
> >>>> + };
> >>>> __u16 segment;
> >>>> __u8 bus;
> >>>> __u8 devfn; /* Use PCI_SLOT/PCI_FUNC */
> >>>> @@ -663,6 +684,7 @@ struct vfio_pci_dependent_device {
> >>>> struct vfio_pci_hot_reset_info {
> >>>> __u32 argsz;
> >>>> __u32 flags;
> >>>> +#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID (1 << 0)
> >>>> __u32 count;
> >>>> struct vfio_pci_dependent_device devices[];
> >>>> };
> >>> Eric
^ permalink raw reply [flat|nested] 142+ messages in thread