* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
       [not found] ` <20191108081925.GH4196@zhen-hp.sh.intel.com>
@ 2019-12-04 17:36   ` Parav Pandit
  2019-12-05  6:06     ` Zhenyu Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-12-04 17:36 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin

+ Jiri + Netdev, since you mentioned netdev queues.

+ Jason Wang and Michael, as we had a similar discussion in the vdpa
discussion thread.

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Friday, November 8, 2019 2:19 AM
> To: Parav Pandit <parav@mellanox.com>
>
My apologies for the late reply. Something went wrong with my email client,
due to which I only found this patch in the spam folder today.
More comments below.

> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote:
> > Hi,
> >
> > > -----Original Message-----
> > > From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> On
> > > Behalf Of Zhenyu Wang
> > > Sent: Thursday, October 24, 2019 12:08 AM
> > > To: kvm@vger.kernel.org
> > > Cc: alex.williamson@redhat.com; kwankhede@nvidia.com;
> > > kevin.tian@intel.com; cohuck@redhat.com
> > > Subject: [PATCH 0/6] VFIO mdev aggregated resources handling
> > >
> > > Hi,
> > >
> > > This is a refresh of a previously sent series. I had gotten the
> > > impression that some SIOV drivers would still deploy their own create
> > > and config methods, so I stopped this effort. But it seems this would
> > > still be useful for other SIOV drivers which may simply want the
> > > capability to aggregate resources. So here is the refreshed series.
> > >
> > > The current mdev device create interface depends on a fixed mdev type,
> > > which takes a UUID from the user to create an instance of the mdev
> > > device. If the user wants a customized number of resources for the
> > > mdev device, the only option is to create a new
> > Can you please give an example of a 'resource'?
> > When I grep [1], [2] and [3], I couldn't find anything related to
> > 'aggregate'.
>
> The resource is vendor/device specific; the SIOV spec has the ADI
> (Assignable Device Interface) definition, which could be e.g. a queue for
> a net device, a context for a GPU, etc. I just named this interface
> 'aggregate' for its aggregation purpose; the term is not used in the spec
> doc.
>

A vendor-specific resource that is 'unknown/undefined' just doesn't work.
An orchestration tool doesn't know which resource it is, or what and how to
configure it for which vendor. It has to be well defined.

You can also find such a discussion in the recent lgpu DRM cgroup patch
series v4.

Exposing networking resource configuration in non-net-namespace-aware mdev
sysfs at the PCI device level is a no-go. Adding per-file NET_ADMIN or other
checks is not the approach we follow in the kernel.

devlink is a subsystem which, though it lives under net, has a very rich
interface for the user, device health, resource management, and much more.
Even though it is used by net drivers today, it is written for generic
device management at the bus/device level.

Yuval has posted patches to manage PCI sub-devices [1], and an updated
version which addresses the comments will be posted soon.

For any device-slice resource management (mdev, sub-function, etc.) we
should be using a single kernel interface, which is devlink [2], [3].
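For illustration, this kind of device-level resource slicing is already
expressed through the devlink resource interface; a rough sketch of the
flow, following the mlxsw example in devlink-resource(8) [3] (device
address and sizes here are illustrative only):

    $ devlink resource show pci/0000:03:00.0
    pci/0000:03:00.0:
      name kvd size 245760 unit entry
        resources:
          name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
    $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 16384
    $ devlink dev reload pci/0000:03:00.0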
[1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
[2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
[3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html

Most modern device configuration that I am aware of is done via well-defined
ioctl()s of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or via
netlink commands (net, devlink, rdma and more), not via sysfs.

> Thanks
>
> > > mdev type for that, which may not be flexible. This requirement comes
> > > not only from the need to allocate flexible resources for KVMGT, but
> > > also from Intel Scalable I/O Virtualization, which would use vfio/mdev
> > > to allocate arbitrary resources on an mdev instance.
> More info in [1] [2] [3].
> > >
> > > To allow creating user-defined resources for an mdev, this series
> > > extends the mdev create interface by adding a new "aggregate=xxx"
> > > parameter following the UUID. For a target mdev type that supports
> > > aggregation, it can create a new mdev device which contains the
> > > resources of that number of instances combined, e.g.
> > >
> > > echo "<uuid>,aggregate=10" > create
> > >
> > > A VM manager, e.g. libvirt, can check the mdev type for an
> > > "aggregation" attribute, which indicates support for this setting. If
> > > no "aggregation" attribute is found for the mdev type, the previous
> > > behavior of single-instance allocation is kept. And a new sysfs
> > > attribute "aggregated_instances" is created for each mdev device to
> > > show the allocated number.
> > >
> > > References:
> > > [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> > > [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> > > [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> > >
> > > Zhenyu Wang (6):
> > >   vfio/mdev: Add new "aggregate" parameter for mdev create
> > >   vfio/mdev: Add "aggregation" attribute for supported mdev type
> > >   vfio/mdev: Add "aggregated_instances" attribute for supported mdev
> > >     device
> > >   Documentation/driver-api/vfio-mediated-device.rst: Update for
> > >     vfio/mdev aggregation support
> > >   Documentation/ABI/testing/sysfs-bus-vfio-mdev: Update for vfio/mdev
> > >     aggregation support
> > >   drm/i915/gvt: Add new type with aggregation support
> > >
> > >  Documentation/ABI/testing/sysfs-bus-vfio-mdev |  24 ++++++
> > >  .../driver-api/vfio-mediated-device.rst       |  23 ++++++
> > >  drivers/gpu/drm/i915/gvt/gvt.c                |   4 +-
> > >  drivers/gpu/drm/i915/gvt/gvt.h                |  11 ++-
> > >  drivers/gpu/drm/i915/gvt/kvmgt.c              |  53 ++++++++++++-
> > >  drivers/gpu/drm/i915/gvt/vgpu.c               |  56 ++++++++++++-
> > >  drivers/vfio/mdev/mdev_core.c                 |  36 ++++++++-
> > >  drivers/vfio/mdev/mdev_private.h              |   6 +-
> > >  drivers/vfio/mdev/mdev_sysfs.c                |  79 ++++++++++++++++++-
> > >  include/linux/mdev.h                          |  19 +++++
> > >  10 files changed, 294 insertions(+), 17 deletions(-)
> > >
> > > --
> > > 2.24.0.rc0
> >
> --
> Open Source Technology Center, Intel ltd.
>
> $gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-04 17:36 ` [PATCH 0/6] VFIO mdev aggregated resources handling Parav Pandit
@ 2019-12-05  6:06   ` Zhenyu Wang
  2019-12-05  6:40     ` Jason Wang
  2019-12-05 18:59     ` Parav Pandit
  0 siblings, 2 replies; 12+ messages in thread
From: Zhenyu Wang @ 2019-12-05  6:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin

On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
> + Jiri + Netdev, since you mentioned netdev queues.
>
> + Jason Wang and Michael, as we had a similar discussion in the vdpa
> discussion thread.
>
[...]
> Yuval has posted patches to manage PCI sub-devices [1], and an updated
> version which addresses the comments will be posted soon.
>
> For any device-slice resource management (mdev, sub-function, etc.) we
> should be using a single kernel interface, which is devlink [2], [3].
>
> [1] https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email-yuvalav@mellanox.com/
> [2] http://man7.org/linux/man-pages/man8/devlink-dev.8.html
> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html
>
> Most modern device configuration that I am aware of is done via
> well-defined ioctl()s of the subsystem (vhost, virtio, vfio, rdma, nvme
> and more) or via netlink commands (net, devlink, rdma and more), not via
> sysfs.
>

Current vfio/mdev configuration is via a documented sysfs ABI rather than
other ways, so this series adheres to that way of introducing a more
configurable method on the mdev device as standard; it is optional and not
actually vendor specific, e.g. vfio-ap.

I'm not sure how many devices support devlink now, whether it really makes
sense to utilize devlink for devices other than net, or whether it really
makes sense to drive mdev resource configuration from there...
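For reference, a minimal sketch of that documented sysfs flow as it exists
today (the parent device path, GVT type names, and UUID follow the i915
example in vfio-mediated-device.rst; the available_instances output is
illustrative):

    # ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types
    i915-GVTg_V5_2  i915-GVTg_V5_4  i915-GVTg_V5_8
    # cat mdev_supported_types/i915-GVTg_V5_4/available_instances
    4
    # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
          mdev_supported_types/i915-GVTg_V5_4/create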
[...]

--
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:06   ` Zhenyu Wang
@ 2019-12-05  6:40     ` Jason Wang
  2019-12-05 19:02       ` Parav Pandit
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2019-12-05  6:40 UTC (permalink / raw)
  To: Zhenyu Wang, Parav Pandit
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Michael S. Tsirkin

On 2019/12/5 2:06 PM, Zhenyu Wang wrote:
> On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
[...]
>> Yuval has posted patches to manage PCI sub-devices [1], and an updated
>> version which addresses the comments will be posted soon.
>>
>> For any device-slice resource management (mdev, sub-function, etc.) we
>> should be using a single kernel interface, which is devlink [2], [3].
>>
>> Most modern device configuration that I am aware of is done via
>> well-defined ioctl()s of the subsystem (vhost, virtio, vfio, rdma, nvme
>> and more) or via netlink commands (net, devlink, rdma and more), not via
>> sysfs.
>>
> Current vfio/mdev configuration is via a documented sysfs ABI rather than
> other ways, so this series adheres to that way of introducing a more
> configurable method on the mdev device as standard; it is optional and
> not actually vendor specific, e.g. vfio-ap.
>
> I'm not sure how many devices support devlink now, whether it really
> makes sense to utilize devlink for devices other than net, or whether it
> really makes sense to drive mdev resource configuration from there...

It may make sense to allow other types of API than sysfs to manage mdevs.
But I'm not sure whether or not that would be a challenge for orchestration.

Thanks
[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:40     ` Jason Wang
@ 2019-12-05 19:02       ` Parav Pandit
  0 siblings, 0 replies; 12+ messages in thread
From: Parav Pandit @ 2019-12-05 19:02 UTC (permalink / raw)
  To: Jason Wang, Zhenyu Wang
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Michael S. Tsirkin

Hi Jason,

> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, December 5, 2019 12:41 AM
>
> On 2019/12/5 2:06 PM, Zhenyu Wang wrote:
> > On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
[...]
> > Current vfio/mdev configuration is via a documented sysfs ABI rather
> > than other ways, so this series adheres to that way of introducing a
> > more configurable method on the mdev device as standard; it is optional
> > and not actually vendor specific, e.g. vfio-ap.
> >
> > I'm not sure how many devices support devlink now, whether it really
> > makes sense to utilize devlink for devices other than net, or whether
> > it really makes sense to drive mdev resource configuration from
> > there...
>
> It may make sense to allow other types of API than sysfs to manage mdevs.
> But I'm not sure whether or not that would be a challenge for
> orchestration.
>

There are two parts:
1. How you specify the resource config (sysfs/netlink/devlink/ioctl etc.).
2. The definition of the resource itself.

The resource has to be well defined, or it should be categorized as
miscellaneous. It cannot be some undefined/vague name such as 'aggregate'.
[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05  6:06   ` Zhenyu Wang
  2019-12-05  6:40     ` Jason Wang
@ 2019-12-05 18:59     ` Parav Pandit
  2019-12-06  8:03       ` Zhenyu Wang
  1 sibling, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-12-05 18:59 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin

> From: Zhenyu Wang <zhenyuw@linux.intel.com>
> Sent: Thursday, December 5, 2019 12:06 AM
> To: Parav Pandit <parav@mellanox.com>
>
> On 2019.12.04 17:36:12 +0000, Parav Pandit wrote:
[...]
> > devlink is a subsystem which, though it lives under net, has a very
> > rich interface for the user, device health, resource management, and
> > much more. Even though it is used by net drivers today, it is written
> > for generic device management at the bus/device level.
> >
> > Yuval has posted patches to manage PCI sub-devices [1], and an updated
> > version which addresses the comments will be posted soon.
> >
> > For any device-slice resource management (mdev, sub-function, etc.) we
> > should be using a single kernel interface, which is devlink [2], [3].
>
> Current vfio/mdev configuration is via a documented sysfs ABI rather than
> other ways, so this series adheres to that way of introducing a more
> configurable method on the mdev device as standard; it is optional and
> not actually vendor specific, e.g. vfio-ap.
>
An unknown/undefined resource such as 'aggregate' is just not an ABI.
It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or
something similar appropriate to that mdev device class.
If users want to set a parameter for an mdev regardless of vendor, they
must have a single way to do so.

> I'm not sure how many devices support devlink now, whether it really
> makes sense to utilize devlink for devices other than net, or whether it
> really makes sense to drive mdev resource configuration from there...
>
This is about adding new knobs, not the existing ones. They have to be well
defined, and 'aggregate' is not the word that describes them.
If this is something very device specific, it should be prefixed with
'misc_', or it should be a misc_X ioctl().
Miscellaneous, not-so-well-defined classes of devices are usually
registered using misc_register().
Similarly, attributes have to be well defined; otherwise they should fall
under a misc category, especially when you are pointing to three
well-defined specifications.
[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-05 18:59     ` Parav Pandit
@ 2019-12-06  8:03       ` Zhenyu Wang
  2019-12-06 17:33         ` Parav Pandit
  0 siblings, 1 reply; 12+ messages in thread
From: Zhenyu Wang @ 2019-12-06  8:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin

On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
[...]
> An unknown/undefined resource such as 'aggregate' is just not an ABI.
> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or
> something similar appropriate to that mdev device class.
> If users want to set a parameter for an mdev regardless of vendor, they
> must have a single way to do so.

The idea is not specific to some device class; it applies to each mdev
type's resources and is optional for each vendor. If a more
device-class-specific way is preferred, then we might end up with very
different ways for different vendors. Better to avoid that, so the point
here is to aggregate a number of an mdev type's resources for the target
instance, instead of defining separate mdev types for each number of
resources.

> This is about adding new knobs, not the existing ones. They have to be
> well defined, and 'aggregate' is not the word that describes them.
> If this is something very device specific, it should be prefixed with
> 'misc_', or it should be a misc_X ioctl().
> Miscellaneous, not-so-well-defined classes of devices are usually
> registered using misc_register().
> Similarly, attributes have to be well defined; otherwise they should fall
> under a misc category, especially when you are pointing to three
> well-defined specifications.

Any suggestion for naming it?
[...]

--
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-06  8:03       ` Zhenyu Wang
@ 2019-12-06 17:33         ` Parav Pandit
  2019-12-10  3:33           ` Tian, Kevin
  0 siblings, 1 reply; 12+ messages in thread
From: Parav Pandit @ 2019-12-06 17:33 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      kevin.tian@intel.com, cohuck@redhat.com, Jiri Pirko,
      netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin

On 12/6/2019 2:03 AM, Zhenyu Wang wrote:
> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
[...]
>>>> For any device-slice resource management (mdev, sub-function, etc.) we
>>>> should be using a single kernel interface, which is devlink [2], [3].
> The idea is not specific to some device class; it applies to each mdev
> type's resources and is optional for each vendor. If a more
> device-class-specific way is preferred, then we might end up with very
> different ways for different vendors. Better to avoid that, so the point
> here is to aggregate a number of an mdev type's resources for the target
> instance, instead of defining separate mdev types for each number of
> resources.
>
A parameter or attribute can certainly be optional, but the way to
aggregate them should not be vendor specific. Look at some excellent
existing examples across subsystems; for example, how you create an
aggregated netdev or block device does not depend on the vendor or the
underlying device type.

> Any suggestion for naming it?

If a parameter is miscellaneous, please prefix it with misc in the mdev
ioctl() or in sysfs.
If the parameter/attribute is max_netdev_txqs for a netdev, name it that;
if it is max_dedicated_wqs of some dsa device, please name it that way.
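In other words, hypothetically (these attribute names are invented here
purely for illustration; they are not from the series), something like:

    echo "<uuid>,max_netdev_txqs=8" > create
    echo "<uuid>,max_dedicated_wqs=4" > create

so the knob itself tells an orchestration tool exactly which well-defined
resource is being scaled.

^ permalink raw reply	[flat|nested] 12+ messages in thread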
* RE: [PATCH 0/6] VFIO mdev aggregated resources handling
  2019-12-06 17:33         ` Parav Pandit
@ 2019-12-10  3:33           ` Tian, Kevin
  2019-12-10 19:07             ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Tian, Kevin @ 2019-12-10  3:33 UTC (permalink / raw)
  To: Parav Pandit, Zhenyu Wang
  Cc: kvm@vger.kernel.org, alex.williamson@redhat.com, kwankhede@nvidia.com,
      cohuck@redhat.com, Jiri Pirko, netdev@vger.kernel.org, Jason Wang,
      Michael S. Tsirkin

> From: Parav Pandit <parav@mellanox.com>
> Sent: Saturday, December 7, 2019 1:34 AM
>
> On 12/6/2019 2:03 AM, Zhenyu Wang wrote:
> > On 2019.12.05 18:59:36 +0000, Parav Pandit wrote:
[...]
> > The idea is not specific to some device class; it applies to each mdev
> > type's resources and is optional for each vendor. If a more
> > device-class-specific way is preferred, then we might end up with very
> > different ways for different vendors. Better to avoid that, so the
> > point here is to aggregate a number of an mdev type's resources for the
> > target instance, instead of defining separate mdev types for each
> > number of resources.
> >
> A parameter or attribute can certainly be optional, but the way to
> aggregate them should not be vendor specific. Look at some excellent
> existing examples across subsystems; for example, how you create an
> aggregated netdev or block device does not depend on the vendor or the
> underlying device type.

I'd like to hear Alex's opinion on this. Today VFIO mdev supports two
styles of "types", imo: fixed resource definition (most cases) and dynamic
resource definition (vfio-ap). In the fixed style, a type has a fixed
association to a set of vendor-specific resources (resourceX=M,
resourceY=N, ...). In the dynamic case, the user is allowed to specify the
actual resources X/Y/... backing the mdev instance after its creation.

In either case, the way to identify such an association, or the
configurable knobs, is vendor specific, maybe contained in the optional
attributes (name and description) plus additional info in vendor documents.
The user is then assumed to clearly understand the implications of the
resource allocation under a given type when creating a new mdev under this
type.

If this assumption holds true, the aggregated attribute simply provides an
extension in the same direction as fixed-style types, but allowing for more
flexible, linearly increasing resource allocation: e.g. using aggregate=2
means creating an instance with resourceX=2M, resourceY=2N, ... under the
specified type.

Along this direction I don't see the need for well-defined vendor-specific
attributes here. When those are actually required, I suppose the dynamic
style would fit better. And if a vendor driver thinks implementing such an
aggregate feature would confuse its type definition, the feature is
optional, so it can simply choose not to implement it.
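As a concrete sketch of that linear semantic, using the attributes this
series proposes (the type name and the numbers below are hypothetical;
"aggregation" and "aggregated_instances" are the sysfs attributes added in
patches 2 and 3, assuming "aggregation" reports the maximum count that can
be aggregated):

    # cat mdev_supported_types/vendor-type1/aggregation
    8
    # echo "<uuid>,aggregate=2" > mdev_supported_types/vendor-type1/create
    # cat /sys/bus/mdev/devices/<uuid>/aggregated_instances
    2

i.e. one instance backed by 2x the base resources of vendor-type1.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 12+ messages in thread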
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling 2019-12-10 3:33 ` Tian, Kevin @ 2019-12-10 19:07 ` Alex Williamson 2019-12-10 21:08 ` Parav Pandit 0 siblings, 1 reply; 12+ messages in thread From: Alex Williamson @ 2019-12-10 19:07 UTC (permalink / raw) To: Tian, Kevin Cc: Parav Pandit, Zhenyu Wang, kvm@vger.kernel.org, kwankhede@nvidia.com, cohuck@redhat.com, Jiri Pirko, netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin On Tue, 10 Dec 2019 03:33:23 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Parav Pandit <parav@mellanox.com> > > Sent: Saturday, December 7, 2019 1:34 AM > > > > On 12/6/2019 2:03 AM, Zhenyu Wang wrote: > > > On 2019.12.05 18:59:36 +0000, Parav Pandit wrote: > > >>>> > > >>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote: > > >>>>>> Hi, > > >>>>>> > > >>>>>>> -----Original Message----- > > >>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> > > On > > >>>>>>> Behalf Of Zhenyu Wang > > >>>>>>> Sent: Thursday, October 24, 2019 12:08 AM > > >>>>>>> To: kvm@vger.kernel.org > > >>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com; > > >>>>>>> kevin.tian@intel.com; cohuck@redhat.com > > >>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling > > >>>>>>> > > >>>>>>> Hi, > > >>>>>>> > > >>>>>>> This is a refresh for previous send of this series. I got > > >>>>>>> impression that some SIOV drivers would still deploy their own > > >>>>>>> create and config method so stopped effort on this. But seems > > >>>>>>> this would still be useful for some other SIOV driver which may > > >>>>>>> simply want capability to aggregate resources. So here's refreshed > > >>> series. > > >>>>>>> > > >>>>>>> Current mdev device create interface depends on fixed mdev type, > > >>>>>>> which get uuid from user to create instance of mdev device. If > > >>>>>>> user wants to use customized number of resource for mdev device, > > >>>>>>> then only can create new > > >>>>>> Can you please give an example of 'resource'? > > >>>>>> When I grep [1], [2] and [3], I couldn't find anything related to ' > > >>> aggregate'. > > >>>>> > > >>>>> The resource is vendor device specific, in SIOV spec there's ADI > > >>>>> (Assignable Device Interface) definition which could be e.g queue > > >>>>> for net device, context for gpu, etc. I just named this interface as > > >>> 'aggregate' > > >>>>> for aggregation purpose, it's not used in spec doc. > > >>>>> > > >>>> > > >>>> Some 'unknown/undefined' vendor specific resource just doesn't work. > > >>>> Orchestration tool doesn't know which resource and what/how to > > configure > > >>> for which vendor. > > >>>> It has to be well defined. > > >>>> > > >>>> You can also find such discussion in recent lgpu DRM cgroup patches > > series > > >>> v4. > > >>>> > > >>>> Exposing networking resource configuration in non-net namespace > > aware > > >>> mdev sysfs at PCI device level is no-go. > > >>>> Adding per file NET_ADMIN or other checks is not the approach we > > follow in > > >>> kernel. > > >>>> > > >>>> devlink has been a subsystem though under net, that has very rich > > interface > > >>> for syscaller, device health, resource management and many more. > > >>>> Even though it is used by net driver today, its written for generic device > > >>> management at bus/device level. > > >>>> > > >>>> Yuval has posted patches to manage PCI sub-devices [1] and updated > > version > > >>> will be posted soon which addresses comments. 
Always good to see tools that intend to manage arbitrary devices posted only to the netdev list :-\

> > >>>> > > >>>> For any device slice resource management of mdev, sub-function etc, > > we > > >>> should be using single kernel interface as devlink [2], [3].

This seems impractical: mdevs and SR-IOV are both enumerated, inspected, created, and removed in sysfs, so where do we define what features are manipulated via sysfs versus devlink? mdevs, by definition, are vendor defined "chunks" of a thing. We allow vendor drivers to define different types, representing different configurations of these chunks. Often these different types are incrementally bigger or smaller chunks of these things, but defining what bigger and smaller means generically across vendors is an impossible task. Orchestration tools already need to know vendor specific information in terms of what type of mdev device they want to create and make use of. The aggregation seems to simply augment that vendor information, i.e. 'type' and 'scale' are separate rather than combined only behind just 'type'.

> > >>>> > > >>>> [1] > > >>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email- > > yuval > > >>>> av@mellanox.com/ [2] > > >>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html > > >>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html > > >>>> > > >>>> Most modern device configuration that I am aware of is usually done > > via well > > >>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and > > more) or > > >>> via netlink commands (net, devlink, rdma and more) not via sysfs. > > >>>> > > >>> > > >>> Current vfio/mdev configuration is via documented sysfs ABI instead of > > other > > >>> ways. So this adhere to that way to introduce more configurable method > > on > > >>> mdev device for standard, it's optional and not actually vendor specific > > e.g vfio- > > >>> ap. > > >>> > > >> Some unknown/undefined resource as 'aggregate' is just not an ABI. > > >> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or > > something similar appropriate to that mdev device class. > > >> If user wants to set a parameter for a mdev regardless of vendor, they > > must have single way to do so.

Aggregation augments type, which is by definition vendor specific.

> > > > > > The idea is not specific for some device class, but for each mdev > > > type's resource, and be optional for each vendor. If more device class > > > specific way is preferred, then we might have very different ways for > > > different vendors. Better to avoid that, so here means to aggregate > > > number of mdev type's resources for target instance, instead of defining > > > kinds of mdev types for those number of resources. > > > > > Parameter or attribute certainly can be optional. > > But the way to aggregate them should not be vendor specific. > > Look for some excellent existing examples across subsystems, for example > > how you create aggregated netdev or block device is not depend on vendor > > or underlying device type. > > > > I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles > > of "types" imo: fixed resource definition (most cases) and dynamic resource > > definition (vfio-ap). In fixed style, a type has fixed association to a set of > > vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, > > the user is allowed to specify actual resource X/Y/... backing the mdev > > instance post its creation.
> In either case, the way to identify such association > or configurable knobs is vendor specific, maybe contained in optional > attributes (name and description) plus additional info in vendor documents. > > Then the user is assumed to clearly understand the implication of the resource > allocation under a given type, when creating a new mdev under this type. > > If this assumption holds true, the aggregated attribute simply provides an > extension in the same direction of fixed-style types but allowing for more > flexible linearly-increasing resource allocation. e.g. when using aggregate=2, > it means creating a instance with resourceX=2M, resourceY=2N, ... under > the specified type. Along this direction I didn't see the need of well-defined > vendor specific attributes here. When those are actually required, I suppose > the dynamic style would better fit. Or if the vendor driver thinks implementing > such aggregate feature will confuse its type definition, it's optional to not > doing so anyway.

Yep, though I don't think we can even define that aggregate=2 indicates that every resource is doubled; it's going to have a vendor-specific meaning. Maybe this is what Parav is rejecting, but I don't see an alternative. For example, an mdev vGPU might have high level resources like the number of execution units, graphics memory, display heads, maximum resolution, etc. Aggregation could affect one or all of these. Orchestration tools already need to know the vendor specific type of device they want to create, so it doesn't seem unreasonable that if they use aggregation they choose a type that aggregates the resource(s) they need, but that aggregation is going to be specific to the type. Potentially, as we think about adding "defined" sysfs attributes for devices, we could start with $SYSFS_DEV_PATH/mdev/aggregation/type, where the value written to type is a vendor specific aggregation of that mdev type. This allows us the option that we might someday agree on specific resources that might be aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm somewhat doubtful those would ever be pursued. Thanks,

Alex

^ permalink raw reply [flat|nested] 12+ messages in thread
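A sketch of what the layout suggested above might look like in practice; everything here is hypothetical, a possible future ABI rather than an existing one:

  # Hypothetical attributes under an already-created mdev device:
  M=/sys/bus/mdev/devices/$UUID/mdev/aggregation

  # Vendor-specific aggregation of this mdev's type; what writing "2"
  # actually scales is defined by the vendor driver, not the mdev core.
  echo 2 > $M/type

  # A commonly-defined resource knob, if one were ever standardized:
  # echo 2 > $M/graphics_memory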
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling 2019-12-10 19:07 ` Alex Williamson @ 2019-12-10 21:08 ` Parav Pandit 2019-12-10 22:08 ` Alex Williamson 0 siblings, 1 reply; 12+ messages in thread From: Parav Pandit @ 2019-12-10 21:08 UTC (permalink / raw) To: Alex Williamson, Tian, Kevin Cc: Zhenyu Wang, kvm@vger.kernel.org, kwankhede@nvidia.com, cohuck@redhat.com, Jiri Pirko, netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin On 12/10/2019 1:07 PM, Alex Williamson wrote: > On Tue, 10 Dec 2019 03:33:23 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > >>> From: Parav Pandit <parav@mellanox.com> >>> Sent: Saturday, December 7, 2019 1:34 AM >>> >>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote: >>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote: >>>>>>> >>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> >>> On >>>>>>>>>> Behalf Of Zhenyu Wang >>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM >>>>>>>>>> To: kvm@vger.kernel.org >>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com; >>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com >>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> This is a refresh for previous send of this series. I got >>>>>>>>>> impression that some SIOV drivers would still deploy their own >>>>>>>>>> create and config method so stopped effort on this. But seems >>>>>>>>>> this would still be useful for some other SIOV driver which may >>>>>>>>>> simply want capability to aggregate resources. So here's refreshed >>>>>> series. >>>>>>>>>> >>>>>>>>>> Current mdev device create interface depends on fixed mdev type, >>>>>>>>>> which get uuid from user to create instance of mdev device. If >>>>>>>>>> user wants to use customized number of resource for mdev device, >>>>>>>>>> then only can create new >>>>>>>>> Can you please give an example of 'resource'? >>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to ' >>>>>> aggregate'. >>>>>>>> >>>>>>>> The resource is vendor device specific, in SIOV spec there's ADI >>>>>>>> (Assignable Device Interface) definition which could be e.g queue >>>>>>>> for net device, context for gpu, etc. I just named this interface as >>>>>> 'aggregate' >>>>>>>> for aggregation purpose, it's not used in spec doc. >>>>>>>> >>>>>>> >>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work. >>>>>>> Orchestration tool doesn't know which resource and what/how to >>> configure >>>>>> for which vendor. >>>>>>> It has to be well defined. >>>>>>> >>>>>>> You can also find such discussion in recent lgpu DRM cgroup patches >>> series >>>>>> v4. >>>>>>> >>>>>>> Exposing networking resource configuration in non-net namespace >>> aware >>>>>> mdev sysfs at PCI device level is no-go. >>>>>>> Adding per file NET_ADMIN or other checks is not the approach we >>> follow in >>>>>> kernel. >>>>>>> >>>>>>> devlink has been a subsystem though under net, that has very rich >>> interface >>>>>> for syscaller, device health, resource management and many more. >>>>>>> Even though it is used by net driver today, its written for generic device >>>>>> management at bus/device level. >>>>>>> >>>>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated >>> version >>>>>> will be posted soon which addresses comments. 
> > Always good to see tools that intend to manage arbitrary devices posted > only to the netdev list :-\ > >>>>>>> >>>>>>> For any device slice resource management of mdev, sub-function etc, >>> we >>>>>> should be using single kernel interface as devlink [2], [3]. > > This seems impractical, mdevs and SR-IOV are both enumerated, > inspected, created, and removed in sysfs,

Both enumerated via sysfs, but VFs are not configured via sysfs.

> where do we define what > features are manipulated vis sysfs versus devlink?

VFs are configured via a well-defined, vendor-neutral tool: iproute2's ip link set <pf_netdev> vf <vf_index> <attribute> <value>.

This has lately fallen short for a few cases, and non-networking or generic VF property configuration is proposed to be handled by a similar 'VF' object using devlink, because such properties are either pure 'pci vf' properties or device-class-level VF properties such as MAC address or number_of_queues.

More advanced modes of networking VFs have been controlled using netdev representors, again in a vendor-neutral way, for the last few years.

It may be fair to say that the mdev subsystem wants to invent new sysfs files for configuration.

> mdevs, by > definition, are vendor defined "chunks" of a thing. We allow vendor > drivers to define different types, representing different > configurations of these chunks. Often these different types are > incrementally bigger or smaller chunks of these things, but defining > what bigger and smaller means generically across vendors is an > impossible task. Orchestration tools already need to know vendor > specific information in terms of what type of mdev device they want to > create and make use of. The aggregation seems to simply augment that > vendor information, ie. 'type' and 'scale' are separate rather than > combined only behind just 'type'. > >>>>>>> >>>>>>> [1] >>>>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email- yuval >>>>>>> av@mellanox.com/ [2] >>>>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html >>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html >>>>>>> >>>>>>> Most modern device configuration that I am aware of is usually done via well >>>>>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and more) or >>>>>> via netlink commands (net, devlink, rdma and more) not via sysfs. >>>>>>> >>>>>> >>>>>> Current vfio/mdev configuration is via documented sysfs ABI instead of other >>>>>> ways. So this adhere to that way to introduce more configurable method on >>>>>> mdev device for standard, it's optional and not actually vendor specific e.g vfio- >>>>>> ap. >>>>>> >>>>> Some unknown/undefined resource as 'aggregate' is just not an ABI. >>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or >>> something similar appropriate to that mdev device class. >>>>> If user wants to set a parameter for a mdev regardless of vendor, they >>> must have single way to do so. > > Aggregation augments type, which is by definition vendor specific. > >>>> >>>> The idea is not specific for some device class, but for each mdev >>>> type's resource, and be optional for each vendor. If more device class >>>> specific way is preferred, then we might have very different ways for >>>> different vendors. Better to avoid that, so here means to aggregate >>>> number of mdev type's resources for target instance, instead of defining >>>> kinds of mdev types for those number of resources. >>>> >>> Parameter or attribute certainly can be optional.
>>> But the way to aggregate them should not be vendor specific. >>> Look for some excellent existing examples across subsystems, for example >>> how you create aggregated netdev or block device is not depend on vendor >>> or underlying device type. >> >> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles >> of "types" imo: fixed resource definition (most cases) and dynamic resource >> definition (vfio-ap). In fixed style, a type has fixed association to a set of >> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, >> the user is allowed to specify actual resource X/Y/... backing the mdev >> instance post its creation. In either case, the way to identify such association >> or configurable knobs is vendor specific, maybe contained in optional >> attributes (name and description) plus additional info in vendor documents. >> >> Then the user is assumed to clearly understand the implication of the resource >> allocation under a given type, when creating a new mdev under this type. >> >> If this assumption holds true, the aggregated attribute simply provides an >> extension in the same direction of fixed-style types but allowing for more >> flexible linearly-increasing resource allocation. e.g. when using aggregate=2, >> it means creating a instance with resourceX=2M, resourceY=2N, ... under >> the specified type. Along this direction I didn't see the need of well-defined >> vendor specific attributes here. When those are actually required, I suppose >> the dynamic style would better fit. Or if the vendor driver thinks implementing >> such aggregate feature will confuse its type definition, it's optional to not >> doing so anyway. > > Yep, though I don't think we can even define that aggregate=2 indicates > that every resources is doubled, it's going to have vendor specific > meaning. Maybe this is what Parav is rejecting, but I don't see an > alternative. For example, an mdev vGPU might have high level resources > like the number of execution units, graphics memory, display heads, > maximum resolution, etc. Aggregation could affect one or all of these. > Orchestration tools already need to know the vendor specific type of > device they want to create, so it doesn't seem unreasonable that if > they use aggregation that they choose a type that aggregates the > resource(s) they need, but that aggregation is going to be specific to > the type. Potentially as we think about adding "defined" sysfs > attributes for devices we could start with > $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a > vendor specific aggregation of that mdev type. This allows us the > option that we might someday agree on specific resources that might be > aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm > somewhat doubtful those would ever be pursued. Thanks,

My point is, from Zhenyu Wang's example, that it is certainly incorrect to define mdev sysfs files such as:

  vendor_foo_mdev.netdev_mac_addr=X
  vendor_bar_mdev.resource_addr=Y
  vendor_foo_mdev.netdev_queues=4
  vendor_bar_mdev.aggregate=8

unless this is a miscellaneous (not well defined) parameter of a vendor device. I am 100% sure that consumers of network devices, where a PCI PF is sliced into multiple smaller devices, want to configure these devices in a unified way regardless of vendor type. That may not be the case with vGPU mdevs. If Zhenyu Wang proposes to use a networking class of mdev device, its attributes should have well-defined meanings, as networking is a well-known device class in the Linux kernel.
mdev should provide an API to define such an mdev config object, and all sysfs files for such an mdev should be created by the mdev core, not by the vendor driver.

^ permalink raw reply [flat|nested] 12+ messages in thread
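For reference, the vendor-neutral VF configuration style described above uses standard iproute2 commands along these lines (the PF netdev name and the values are examples):

  # The same commands apply no matter which vendor's PF is being sliced:
  ip link set ens1f0 vf 0 mac 02:aa:bb:cc:dd:01
  ip link set ens1f0 vf 0 vlan 100
  ip link set ens1f0 vf 0 trust on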
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling 2019-12-10 21:08 ` Parav Pandit @ 2019-12-10 22:08 ` Alex Williamson 2019-12-10 22:40 ` Parav Pandit 0 siblings, 1 reply; 12+ messages in thread From: Alex Williamson @ 2019-12-10 22:08 UTC (permalink / raw) To: Parav Pandit Cc: Tian, Kevin, Zhenyu Wang, kvm@vger.kernel.org, kwankhede@nvidia.com, cohuck@redhat.com, Jiri Pirko, netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin On Tue, 10 Dec 2019 21:08:29 +0000 Parav Pandit <parav@mellanox.com> wrote: > On 12/10/2019 1:07 PM, Alex Williamson wrote: > > On Tue, 10 Dec 2019 03:33:23 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > >>> From: Parav Pandit <parav@mellanox.com> > >>> Sent: Saturday, December 7, 2019 1:34 AM > >>> > >>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote: > >>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote: > >>>>>>> > >>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>>> -----Original Message----- > >>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> > >>> On > >>>>>>>>>> Behalf Of Zhenyu Wang > >>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM > >>>>>>>>>> To: kvm@vger.kernel.org > >>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com; > >>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com > >>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> This is a refresh for previous send of this series. I got > >>>>>>>>>> impression that some SIOV drivers would still deploy their own > >>>>>>>>>> create and config method so stopped effort on this. But seems > >>>>>>>>>> this would still be useful for some other SIOV driver which may > >>>>>>>>>> simply want capability to aggregate resources. So here's refreshed > >>>>>> series. > >>>>>>>>>> > >>>>>>>>>> Current mdev device create interface depends on fixed mdev type, > >>>>>>>>>> which get uuid from user to create instance of mdev device. If > >>>>>>>>>> user wants to use customized number of resource for mdev device, > >>>>>>>>>> then only can create new > >>>>>>>>> Can you please give an example of 'resource'? > >>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to ' > >>>>>> aggregate'. > >>>>>>>> > >>>>>>>> The resource is vendor device specific, in SIOV spec there's ADI > >>>>>>>> (Assignable Device Interface) definition which could be e.g queue > >>>>>>>> for net device, context for gpu, etc. I just named this interface as > >>>>>> 'aggregate' > >>>>>>>> for aggregation purpose, it's not used in spec doc. > >>>>>>>> > >>>>>>> > >>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work. > >>>>>>> Orchestration tool doesn't know which resource and what/how to > >>> configure > >>>>>> for which vendor. > >>>>>>> It has to be well defined. > >>>>>>> > >>>>>>> You can also find such discussion in recent lgpu DRM cgroup patches > >>> series > >>>>>> v4. > >>>>>>> > >>>>>>> Exposing networking resource configuration in non-net namespace > >>> aware > >>>>>> mdev sysfs at PCI device level is no-go. > >>>>>>> Adding per file NET_ADMIN or other checks is not the approach we > >>> follow in > >>>>>> kernel. > >>>>>>> > >>>>>>> devlink has been a subsystem though under net, that has very rich > >>> interface > >>>>>> for syscaller, device health, resource management and many more. > >>>>>>> Even though it is used by net driver today, its written for generic device > >>>>>> management at bus/device level. 
> >>>>>>> > >>>>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated > >>> version > >>>>>> will be posted soon which addresses comments. > > > > Always good to see tools that intend to manage arbitrary devices posted > > only to the netdev list :-\ > > > >>>>>>> > >>>>>>> For any device slice resource management of mdev, sub-function etc, > >>> we > >>>>>> should be using single kernel interface as devlink [2], [3]. > > > > This seems impractical, mdevs and SR-IOV are both enumerated, > > inspected, created, and removed in sysfs, > Both enumerated via sysfs, but VFs are not configured via sysfs. > > > where do we define what > > features are manipulated vis sysfs versus devlink? > > VFs are configured via well defined, vendor neutral tool > iproute2/ip link set <pf_netdev> vf <vf_index> <attribute> <value> > > This falls short lately for few cases and non-networking or generic VF > property configuration, are proposed to be handled by similar 'VF' > object using devlink, because they are either pure 'pci vf' property or > more device class type VF property such as MAC address or > number_of_queues etc. > > More advance mode of networking VFs, are controlled using netdev > representors again in vendor neutral way for last few years. > > It may be fair to say that mdev subsystem wants to invent new sysfs > files for configuration. It seems you're trying to apply rules for classes of devices where configuration features are well defined to an environment where we don't even have classes of devices, let alone agreed features. > mdevs, by > > definition, are vendor defined "chunks" of a thing. We allow vendor > > drivers to define different types, representing different > > configurations of these chunks. Often these different types are > > incrementally bigger or smaller chunks of these things, but defining > > what bigger and smaller means generically across vendors is an > > impossible task. Orchestration tools already need to know vendor > > specific information in terms of what type of mdev device they want to > > create and make use of. The aggregation seems to simply augment that > > vendor information, ie. 'type' and 'scale' are separate rather than > > combined only behind just 'type'. > > > >>>>>>> > >>>>>>> [1] > >>>>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email- > >>> yuval > >>>>>>> av@mellanox.com/ [2] > >>>>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html > >>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html > >>>>>>> > >>>>>>> Most modern device configuration that I am aware of is usually done > >>> via well > >>>>>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and > >>> more) or > >>>>>> via netlink commands (net, devlink, rdma and more) not via sysfs. > >>>>>>> > >>>>>> > >>>>>> Current vfio/mdev configuration is via documented sysfs ABI instead of > >>> other > >>>>>> ways. So this adhere to that way to introduce more configurable method > >>> on > >>>>>> mdev device for standard, it's optional and not actually vendor specific > >>> e.g vfio- > >>>>>> ap. > >>>>>> > >>>>> Some unknown/undefined resource as 'aggregate' is just not an ABI. > >>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or > >>> something similar appropriate to that mdev device class. > >>>>> If user wants to set a parameter for a mdev regardless of vendor, they > >>> must have single way to do so. > > > > Aggregation augments type, which is by definition vendor specific. 
> > > >>>> > >>>> The idea is not specific for some device class, but for each mdev > >>>> type's resource, and be optional for each vendor. If more device class > >>>> specific way is preferred, then we might have very different ways for > >>>> different vendors. Better to avoid that, so here means to aggregate > >>>> number of mdev type's resources for target instance, instead of defining > >>>> kinds of mdev types for those number of resources. > >>>> > >>> Parameter or attribute certainly can be optional. > >>> But the way to aggregate them should not be vendor specific. > >>> Look for some excellent existing examples across subsystems, for example > >>> how you create aggregated netdev or block device is not depend on vendor > >>> or underlying device type. > >> > >> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles > >> of "types" imo: fixed resource definition (most cases) and dynamic resource > >> definition (vfio-ap). In fixed style, a type has fixed association to a set of > >> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, > >> the user is allowed to specify actual resource X/Y/... backing the mdev > >> instance post its creation. In either case, the way to identify such association > >> or configurable knobs is vendor specific, maybe contained in optional > >> attributes (name and description) plus additional info in vendor documents. > >> > >> Then the user is assumed to clearly understand the implication of the resource > >> allocation under a given type, when creating a new mdev under this type. > >> > >> If this assumption holds true, the aggregated attribute simply provides an > >> extension in the same direction of fixed-style types but allowing for more > >> flexible linearly-increasing resource allocation. e.g. when using aggregate=2, > >> it means creating a instance with resourceX=2M, resourceY=2N, ... under > >> the specified type. Along this direction I didn't see the need of well-defined > >> vendor specific attributes here. When those are actually required, I suppose > >> the dynamic style would better fit. Or if the vendor driver thinks implementing > >> such aggregate feature will confuse its type definition, it's optional to not > >> doing so anyway. > > > > Yep, though I don't think we can even define that aggregate=2 indicates > > that every resources is doubled, it's going to have vendor specific > > meaning. Maybe this is what Parav is rejecting, but I don't see an > > alternative. For example, an mdev vGPU might have high level resources > > like the number of execution units, graphics memory, display heads, > > maximum resolution, etc. Aggregation could affect one or all of these. > > Orchestration tools already need to know the vendor specific type of > > device they want to create, so it doesn't seem unreasonable that if > > they use aggregation that they choose a type that aggregates the > > resource(s) they need, but that aggregation is going to be specific to > > the type. Potentially as we think about adding "defined" sysfs > > attributes for devices we could start with > > $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a > > vendor specific aggregation of that mdev type. This allows us the > > option that we might someday agree on specific resources that might be > > aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm > > somewhat doubtful those would ever be pursued. 
Thanks, > > > > My point is, from Zhenyu Wang's example it is certainly incorrect to > define mdev sysfs files, as, > > vendor_foo_mdev.netdev_mac_addr=X > vendor_bar_mdev.resource_addr=Y > > vendor_foo_mdev.netdev_queues=4 > vendor_bar_mdev.aggregate=8 > > Unless this is a miscellaneous (not well defined) parameter of a vendor > device. I certainly think it's wrong to associate a "netdev" property with something that the kernel only knows as an opaque device. But that's really the issue, mdevs are opaque devices as far as the host kernel is concerned. Since we seem to have landed on mdev being used exclusively for vfio, the only thing we really know about an mdev generically is which vfio bus driver API the device uses. Any association of an mdev to a GPU, NIC, HBA, or other accelerator or I/O interface is strictly known by the user/admin's interpretation of the vendor specific type. > I am 100% sure that consumers of network devices where a PCI PF is > sliced into multiple smaller devices, wants to configure these devices > in unified way regardless of vendor type. > That may not be the case with vGPU mdevs. I don't know about devlink, but iirc the ip command operates on a netdev PF in order to, for example, assign MAC addresses to the VFs. We have no guarantee with mdevs that there's a parent netdev device for such an interface. The parent device might be an FPGA where one type it's able to expose looks like a NIC. How do you envision devlink/ip interacting with something like that? Using common tools to set networking properties on a device that the host kernel fundamentally does not know is a networking device is... difficult. > If Zhenyu Wang proposed to use networking class of mdev device, > attributes should have well defined meaning, as it is well known class > in linux kernel. > mdev should be providing an API to define such mdev config object and > all sysfs for such mdev to be created by the mdev core, not by vendor > driver. But of course there is no "networking class of mdev device". Instead there are mdev devices that might be NICs, but that's for the admin and user to care about. If you have an interface in mind for how devlink is going to learn about mdev device and set properties, please share. It's not clear to me if we need to design something to be compatible with devlink or devlink needs to learn how to do certain things on mdev devices (does devlink want to become a vfio userspace device driver in order to probe the type of an mdev device? That'll be hard given some of the backdoor userspace dependencies of existing vGPU mdevs). Thanks, Alex ^ permalink raw reply [flat|nested] 12+ messages in thread
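To illustrate how little the host kernel knows generically about an mdev, these are the standard per-type attributes the documented mdev sysfs ABI already defines (the parent device and type name below are illustrative):

  cd /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4
  cat device_api             # e.g. "vfio-pci" -- the one generic fact
  cat name                   # optional vendor-chosen human-readable name
  cat description            # optional vendor free-form text
  cat available_instances    # how many more mdevs of this type can be created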
* Re: [PATCH 0/6] VFIO mdev aggregated resources handling 2019-12-10 22:08 ` Alex Williamson @ 2019-12-10 22:40 ` Parav Pandit 0 siblings, 0 replies; 12+ messages in thread From: Parav Pandit @ 2019-12-10 22:40 UTC (permalink / raw) To: Alex Williamson Cc: Tian, Kevin, Zhenyu Wang, kvm@vger.kernel.org, kwankhede@nvidia.com, cohuck@redhat.com, Jiri Pirko, netdev@vger.kernel.org, Jason Wang, Michael S. Tsirkin On 12/10/2019 4:08 PM, Alex Williamson wrote: > On Tue, 10 Dec 2019 21:08:29 +0000 > Parav Pandit <parav@mellanox.com> wrote: > >> On 12/10/2019 1:07 PM, Alex Williamson wrote: >>> On Tue, 10 Dec 2019 03:33:23 +0000 >>> "Tian, Kevin" <kevin.tian@intel.com> wrote: >>> >>>>> From: Parav Pandit <parav@mellanox.com> >>>>> Sent: Saturday, December 7, 2019 1:34 AM >>>>> >>>>> On 12/6/2019 2:03 AM, Zhenyu Wang wrote: >>>>>> On 2019.12.05 18:59:36 +0000, Parav Pandit wrote: >>>>>>>>> >>>>>>>>>> On 2019.11.07 20:37:49 +0000, Parav Pandit wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: kvm-owner@vger.kernel.org <kvm-owner@vger.kernel.org> >>>>> On >>>>>>>>>>>> Behalf Of Zhenyu Wang >>>>>>>>>>>> Sent: Thursday, October 24, 2019 12:08 AM >>>>>>>>>>>> To: kvm@vger.kernel.org >>>>>>>>>>>> Cc: alex.williamson@redhat.com; kwankhede@nvidia.com; >>>>>>>>>>>> kevin.tian@intel.com; cohuck@redhat.com >>>>>>>>>>>> Subject: [PATCH 0/6] VFIO mdev aggregated resources handling >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> This is a refresh for previous send of this series. I got >>>>>>>>>>>> impression that some SIOV drivers would still deploy their own >>>>>>>>>>>> create and config method so stopped effort on this. But seems >>>>>>>>>>>> this would still be useful for some other SIOV driver which may >>>>>>>>>>>> simply want capability to aggregate resources. So here's refreshed >>>>>>>> series. >>>>>>>>>>>> >>>>>>>>>>>> Current mdev device create interface depends on fixed mdev type, >>>>>>>>>>>> which get uuid from user to create instance of mdev device. If >>>>>>>>>>>> user wants to use customized number of resource for mdev device, >>>>>>>>>>>> then only can create new >>>>>>>>>>> Can you please give an example of 'resource'? >>>>>>>>>>> When I grep [1], [2] and [3], I couldn't find anything related to ' >>>>>>>> aggregate'. >>>>>>>>>> >>>>>>>>>> The resource is vendor device specific, in SIOV spec there's ADI >>>>>>>>>> (Assignable Device Interface) definition which could be e.g queue >>>>>>>>>> for net device, context for gpu, etc. I just named this interface as >>>>>>>> 'aggregate' >>>>>>>>>> for aggregation purpose, it's not used in spec doc. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Some 'unknown/undefined' vendor specific resource just doesn't work. >>>>>>>>> Orchestration tool doesn't know which resource and what/how to >>>>> configure >>>>>>>> for which vendor. >>>>>>>>> It has to be well defined. >>>>>>>>> >>>>>>>>> You can also find such discussion in recent lgpu DRM cgroup patches >>>>> series >>>>>>>> v4. >>>>>>>>> >>>>>>>>> Exposing networking resource configuration in non-net namespace >>>>> aware >>>>>>>> mdev sysfs at PCI device level is no-go. >>>>>>>>> Adding per file NET_ADMIN or other checks is not the approach we >>>>> follow in >>>>>>>> kernel. >>>>>>>>> >>>>>>>>> devlink has been a subsystem though under net, that has very rich >>>>> interface >>>>>>>> for syscaller, device health, resource management and many more. 
>>>>>>>>> Even though it is used by net driver today, its written for generic device >>>>>>>> management at bus/device level. >>>>>>>>> >>>>>>>>> Yuval has posted patches to manage PCI sub-devices [1] and updated >>>>> version >>>>>>>> will be posted soon which addresses comments. >>> >>> Always good to see tools that intend to manage arbitrary devices posted >>> only to the netdev list :-\ >>> >>>>>>>>> >>>>>>>>> For any device slice resource management of mdev, sub-function etc, >>>>> we >>>>>>>> should be using single kernel interface as devlink [2], [3]. >>> >>> This seems impractical, mdevs and SR-IOV are both enumerated, >>> inspected, created, and removed in sysfs, >> Both enumerated via sysfs, but VFs are not configured via sysfs. >> >>> where do we define what >>> features are manipulated vis sysfs versus devlink? >> >> VFs are configured via well defined, vendor neutral tool >> iproute2/ip link set <pf_netdev> vf <vf_index> <attribute> <value> >> >> This falls short lately for few cases and non-networking or generic VF >> property configuration, are proposed to be handled by similar 'VF' >> object using devlink, because they are either pure 'pci vf' property or >> more device class type VF property such as MAC address or >> number_of_queues etc. >> >> More advance mode of networking VFs, are controlled using netdev >> representors again in vendor neutral way for last few years. >> >> It may be fair to say that mdev subsystem wants to invent new sysfs >> files for configuration. > > It seems you're trying to apply rules for classes of devices where > configuration features are well defined to an environment where we > don't even have classes of devices, let alone agreed features. > >> mdevs, by >>> definition, are vendor defined "chunks" of a thing. We allow vendor >>> drivers to define different types, representing different >>> configurations of these chunks. Often these different types are >>> incrementally bigger or smaller chunks of these things, but defining >>> what bigger and smaller means generically across vendors is an >>> impossible task. Orchestration tools already need to know vendor >>> specific information in terms of what type of mdev device they want to >>> create and make use of. The aggregation seems to simply augment that >>> vendor information, ie. 'type' and 'scale' are separate rather than >>> combined only behind just 'type'. >>> >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://lore.kernel.org/netdev/1573229926-30040-1-git-send-email- >>>>> yuval >>>>>>>>> av@mellanox.com/ [2] >>>>>>>>> http://man7.org/linux/man-pages/man8/devlink-dev.8.html >>>>>>>>> [3] http://man7.org/linux/man-pages/man8/devlink-resource.8.html >>>>>>>>> >>>>>>>>> Most modern device configuration that I am aware of is usually done >>>>> via well >>>>>>>> defined ioctl() of the subsystem (vhost, virtio, vfio, rdma, nvme and >>>>> more) or >>>>>>>> via netlink commands (net, devlink, rdma and more) not via sysfs. >>>>>>>>> >>>>>>>> >>>>>>>> Current vfio/mdev configuration is via documented sysfs ABI instead of >>>>> other >>>>>>>> ways. So this adhere to that way to introduce more configurable method >>>>> on >>>>>>>> mdev device for standard, it's optional and not actually vendor specific >>>>> e.g vfio- >>>>>>>> ap. >>>>>>>> >>>>>>> Some unknown/undefined resource as 'aggregate' is just not an ABI. >>>>>>> It has to be well defined, as 'hardware_address', 'num_netdev_sqs' or >>>>> something similar appropriate to that mdev device class. 
>>>>>>> If user wants to set a parameter for a mdev regardless of vendor, they >>>>> must have single way to do so. >>> >>> Aggregation augments type, which is by definition vendor specific. >>> >>>>>> >>>>>> The idea is not specific for some device class, but for each mdev >>>>>> type's resource, and be optional for each vendor. If more device class >>>>>> specific way is preferred, then we might have very different ways for >>>>>> different vendors. Better to avoid that, so here means to aggregate >>>>>> number of mdev type's resources for target instance, instead of defining >>>>>> kinds of mdev types for those number of resources. >>>>>> >>>>> Parameter or attribute certainly can be optional. >>>>> But the way to aggregate them should not be vendor specific. >>>>> Look for some excellent existing examples across subsystems, for example >>>>> how you create aggregated netdev or block device is not depend on vendor >>>>> or underlying device type. >>>> >>>> I'd like to hear Alex's opinion on this. Today VFIO mdev supports two styles >>>> of "types" imo: fixed resource definition (most cases) and dynamic resource >>>> definition (vfio-ap). In fixed style, a type has fixed association to a set of >>>> vendor specific resources (resourceX=M, resourceY=N, ...). In dynamic case, >>>> the user is allowed to specify actual resource X/Y/... backing the mdev >>>> instance post its creation. In either case, the way to identify such association >>>> or configurable knobs is vendor specific, maybe contained in optional >>>> attributes (name and description) plus additional info in vendor documents. >>>> >>>> Then the user is assumed to clearly understand the implication of the resource >>>> allocation under a given type, when creating a new mdev under this type. >>>> >>>> If this assumption holds true, the aggregated attribute simply provides an >>>> extension in the same direction of fixed-style types but allowing for more >>>> flexible linearly-increasing resource allocation. e.g. when using aggregate=2, >>>> it means creating a instance with resourceX=2M, resourceY=2N, ... under >>>> the specified type. Along this direction I didn't see the need of well-defined >>>> vendor specific attributes here. When those are actually required, I suppose >>>> the dynamic style would better fit. Or if the vendor driver thinks implementing >>>> such aggregate feature will confuse its type definition, it's optional to not >>>> doing so anyway. >>> >>> Yep, though I don't think we can even define that aggregate=2 indicates >>> that every resources is doubled, it's going to have vendor specific >>> meaning. Maybe this is what Parav is rejecting, but I don't see an >>> alternative. For example, an mdev vGPU might have high level resources >>> like the number of execution units, graphics memory, display heads, >>> maximum resolution, etc. Aggregation could affect one or all of these. >>> Orchestration tools already need to know the vendor specific type of >>> device they want to create, so it doesn't seem unreasonable that if >>> they use aggregation that they choose a type that aggregates the >>> resource(s) they need, but that aggregation is going to be specific to >>> the type. Potentially as we think about adding "defined" sysfs >>> attributes for devices we could start with >>> $SYSFS_DEV_PATH/mdev/aggregation/type, where value written to type is a >>> vendor specific aggregation of that mdev type. 
>>> This allows us the option that we might someday agree on specific resources that might be >>> aggregated in a common way (ex. ./aggregation/graphics_memory), but I'm >>> somewhat doubtful those would ever be pursued. Thanks, >>> >> >> My point is, from Zhenyu Wang's example it is certainly incorrect to >> define mdev sysfs files, as, >> >> vendor_foo_mdev.netdev_mac_addr=X >> vendor_bar_mdev.resource_addr=Y >> >> vendor_foo_mdev.netdev_queues=4 >> vendor_bar_mdev.aggregate=8 >> >> Unless this is a miscellaneous (not well defined) parameter of a vendor >> device. > > I certainly think it's wrong to associate a "netdev" property with > something that the kernel only knows as an opaque device. But that's > really the issue, mdevs are opaque devices as far as the host kernel is > concerned. Since we seem to have landed on mdev being used exclusively > for vfio, the only thing we really know about an mdev generically is > which vfio bus driver API the device uses. Any association of an mdev > to a GPU, NIC, HBA, or other accelerator or I/O interface is strictly > known by the user/admin's interpretation of the vendor specific type. > >> I am 100% sure that consumers of network devices where a PCI PF is >> sliced into multiple smaller devices, wants to configure these devices >> in unified way regardless of vendor type. >> That may not be the case with vGPU mdevs. > > I don't know about devlink, but iirc the ip command operates on a > netdev PF in order to, for example, assign MAC addresses to the VFs. > We have no guarantee with mdevs that there's a parent netdev device for > such an interface.

Right, ip link works on a netdev, but devlink works on a devlink instance, i.e. a bus/device. Here is an example from one system:

  $ devlink dev show
  pci/0000:06:00.0
  pci/0000:06:00.1

Here two devlink instances are registered for one PCI device, and each devlink device has params, ports, health monitoring, register dumps and a lot more.

> The parent device might be an FPGA where one type > it's able to expose looks like a NIC. How do you envision devlink/ip > interacting with something like that? Using common tools to set > networking properties on a device that the host kernel fundamentally > does not know is a networking device is... difficult.

If it exposes an FPGA NIC that needs to be configured as individual mdev devices, my sub-function series is a perfect example of that: the mdev lifecycle is handled through the mdev subsystem, and all params are configured using devlink. The series was NACKed for a different reason, which anyway still holds true regardless of this discussion.

>> If Zhenyu Wang proposed to use networking class of mdev device, >> attributes should have well defined meaning, as it is well known class >> in linux kernel. >> mdev should be providing an API to define such mdev config object and >> all sysfs for such mdev to be created by the mdev core, not by vendor >> driver. > > But of course there is no "networking class of mdev device". Instead > there are mdev devices that might be NICs,

Such an object should be created as a pre-patch if this is a networking-class mdev, and it should be configured using that method. Things will be a lot clearer.

> but that's for the admin and > user to care about.

The admin and user care about it, but the kernel is the one that provides the config interface, so that a 'net class mdev device' can be programmed with one command.
I shared the working patches for mdev NICs as Mellanox sub-functions; they are a good starting point. Since the input was to use the devlink interface for sub-function/mediated/slice NICs, I have revised the RFC to do so, instead of the mdev way. I am happy to share it once we finish internal review; I hope I can share it before the Christmas holidays.

> It's not clear to me if we need to design something to be compatible > with devlink or devlink needs to learn how to do certain things on mdev > devices (does devlink want to become a vfio userspace device driver in > order to probe the type of an mdev device? That'll be hard given some > of the backdoor userspace dependencies of existing vGPU mdevs). Thanks,

Let's assume for a moment that devlink may not be the tool for mdev device configuration. Even in that case, my ask is that any config param exposed via ioctl() or sysfs be clearly defined as exactly what that param is. Exposing an FPGA NIC's netdev MAC address and similar attributes in sysfs, which is not net namespace aware, is a security bug.

^ permalink raw reply [flat|nested] 12+ messages in thread
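For completeness, the generic devlink resource interface cited earlier ([3], devlink-resource(8)) operates along these lines; the device and the /kvd/linear resource path are examples taken from the man page, and the actual resource tree is driver-specific:

  # Inspect the resource tree a driver has registered with devlink:
  devlink resource show pci/0000:06:00.0

  # Resize a sub-resource; the new size takes effect on the next reload:
  devlink resource set pci/0000:06:00.0 path /kvd/linear size 98304
  devlink dev reload pci/0000:06:00.0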
Thread overview: 12+ messages
[not found] <20191024050829.4517-1-zhenyuw@linux.intel.com>
[not found] ` <AM0PR05MB4866CA9B70A8BEC1868AF8C8D1780@AM0PR05MB4866.eurprd05.prod.outlook.com>
[not found] ` <20191108081925.GH4196@zhen-hp.sh.intel.com>
2019-12-04 17:36 ` [PATCH 0/6] VFIO mdev aggregated resources handling Parav Pandit
2019-12-05 6:06 ` Zhenyu Wang
2019-12-05 6:40 ` Jason Wang
2019-12-05 19:02 ` Parav Pandit
2019-12-05 18:59 ` Parav Pandit
2019-12-06 8:03 ` Zhenyu Wang
2019-12-06 17:33 ` Parav Pandit
2019-12-10 3:33 ` Tian, Kevin
2019-12-10 19:07 ` Alex Williamson
2019-12-10 21:08 ` Parav Pandit
2019-12-10 22:08 ` Alex Williamson
2019-12-10 22:40 ` Parav Pandit