From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Liu, Yi L" <yi.l.liu@intel.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
"yi.y.sun@linux.intel.com" <yi.y.sun@linux.intel.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"lulu@redhat.com" <lulu@redhat.com>,
"suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
"intel-gvt-dev@lists.freedesktop.org"
<intel-gvt-dev@lists.freedesktop.org>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"Hao, Xudong" <xudong.hao@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>,
"Xu, Terrence" <terrence.xu@intel.com>,
"Jiang, Yanting" <yanting.jiang@intel.com>,
"Duan, Zhenzhong" <zhenzhong.duan@intel.com>
Subject: Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Date: Tue, 11 Apr 2023 15:40:07 -0300 [thread overview]
Message-ID: <ZDWph7g0hcbJHU1B@nvidia.com> (raw)
In-Reply-To: <20230411111117.0766ad52.alex.williamson@redhat.com>
On Tue, Apr 11, 2023 at 11:11:17AM -0600, Alex Williamson wrote:
> [Appears the list got dropped, replying to my previous message to re-add]
Wowo this got mesed up alot, mutt drops the cc when replying for some
reason. I think it is fixed up now
> > Our cdev model says that opening a cdev locks out other cdevs from
> > independent use, eg because of the group sharing. Extending this to
> > include the reset group as well seems consistent.
>
> The DMA ownership model based on the IOMMU group is consistent with
> legacy vfio, but now you're proposing a new ownership model that
> optionally allows a user to extend their ownership, opportunistically
> lock out other users, and wreaking havoc for management utilities that
> also have no insight into dev_sets or userspace driver behavior.
I suggested below that the owership require enough open devices - so
it doesn't "extend ownership opportunistically", and there is no
havoc.
Management tools already need to understand dev_set if they want to
offer reliable reset support to the VMs. Same as today.
> > There is some security concern here, but that goes both ways, a 3rd
> > party should not be able to break an application that needs to use
> > this RESET and had sufficient privileges to assert an ownership.
>
> There are clearly scenarios we have now that could break. For example,
> today if QEMU doesn't own all the IOMMU groups for a mult-function
> device, it can't do a reset, the remaining functions are available for
> other users.
Sure, and we can keep that with this approach.
> As I understand the proposal, QEMU now gets to attempt to
> claim ownership of the dev_set, so it opportunistically extends its
> ownership and may block other users from the affected devices.
We can decide the policy for the kernel to accept a claim. I suggested
below "same as today" - it must hold all the groups within the
iommufd_ctx.
The main point is to make this claiming operation qemu needs to do
clearer and more explicit. I view this as better than trying to guess
if it successfully made the claim by inspecting the _INFO output.
> > I'd say anyone should be able to assert RESET ownership if, like
> > today, the iommufd_ctx has all the groups of the dev_set inside
> > it. Once asserted it becomes safe against all forms of hotplug, and
> > continues to be safe even if some of the devices are closed. eg hot
> > unplugging from the VM doesn't change the availability of RESET.
> >
> > This comes from your ask that qemu know clearly if RESET works, and it
> > doesn't change while qemu is running. This seems stronger and clearer
> > than the current implicit scheme. It also doesn't require usespace to
> > do any calculations with groups or BDFs to figure out of RESET is
> > available, kernel confirms it directly.
>
> As above, clarity and predictability seem lacking in this proposal.
> With the current scheme, the ownership of the affected devices is
> implied if they exist within an owned group, but the strength of that
> ownership is clear.
Same logic holds here
Ownership is claimed same as today by having all groups representated
in the iommufd_ctx. This seems just as clear as today.
> > > seems this proposal essentially extends the ownership model to the
> > > greater of the dev_set or iommu group, apparently neither of which
> > > are explicitly exposed to the user in the cdev API.
> >
> > IIRC the group id can be learned from sysfs before opening the cdev
> > file. Something like /sys/class/vfio/XX/../../iommu_group
>
> And in the passed cdev fd model... ?
IMHO we should try to avoid needing to expose group_id specifically to
userspace. We are missing a way to learn the "same ioas" restriction
in iommufd, and it should provide that directly based on dev_ids.
Otherwise if we really really need group_id then iommufd should
provide an ioctl to get it. Let's find a good reason first
> > We should also have an iommufd ioctl to report the "same ioas"
> > groupings of dev_ids to make it easy on userspace. I haven't checked
> > to see what the current qemu patches are doing with this..
>
> Seems we're ignoring that no-iommu doesn't have a valid iommufd.
no-iommu doesn't and shouldn't have iommu_groups either. It also
doesn't have an IOAS so querying for same-IOAS is not necessary.
The simplest option for no-iommu is to require it to pass in every
device fd to the reset ioctl.
> > > How does a user determine when devices cannot be used independently
> > > in the cdev API?
> >
> > We have this problem right now. The only way to learn the reset group
> > is to call the _INFO ioctl. We could add a sysfs "pci_reset_group"
> > under /sys/class/vfio/XX/ if something needs it earlier.
>
> For all the complaints about complexity, now we're asking management
> tools to not only take into account IOMMU groups, but also reset
> groups, and some inferred knowledge about the application and devices
> to speculate whether reset group ownership is taken by a given
> userspace??
No, we are trying to keep things pretty much the same as today without
resorting to exposing a lot of group related concepts.
The reset group is a clear concept that already exists and isn't
exposed. If we really need to know about it then it should be exposed
on its own, as a seperate discussion from this cdev stuff.
I want to re-focus on the basics of what cdev is supposed to be doing,
because several of the idea you suggested seem against this direction:
- cdev does not have, and cannot rely on vfio_groups. We enforce this
by compiling all the vfio_group infrastructure out. iommu_groups
continue to exist.
So converting a cdev to a vfio_group is not an allowed operation.
- no-iommu should not have iommu_groups. We enforce this by compiling
out all the no-iommu vfio_group infrastructure.
- cdev APIs should ideally not require the user to know the group_id,
we should try hard to design APIs to avoid this.
We have solved every other problem but reset like this, I would like
to get past reset without compromising the above.
Jason
next prev parent reply other threads:[~2023-04-11 18:40 UTC|newest]
Thread overview: 142+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-01 14:44 [PATCH v3 00/12] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
2023-04-01 14:44 ` [PATCH v3 01/12] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-04 14:37 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 02/12] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-04 14:37 ` Liu, Yi L
2023-04-04 15:18 ` Eric Auger
2023-04-04 15:29 ` Liu, Yi L
2023-04-04 15:59 ` Eric Auger
2023-04-05 11:41 ` Jason Gunthorpe
2023-04-05 15:14 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 03/12] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
2023-04-04 13:59 ` Eric Auger
2023-04-04 14:24 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
2023-04-04 15:28 ` Eric Auger
2023-04-04 21:48 ` Alex Williamson
2023-04-21 7:11 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 05/12] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-04-04 16:54 ` Eric Auger
2023-04-04 20:18 ` Alex Williamson
2023-04-05 7:55 ` Liu, Yi L
2023-04-05 8:01 ` Liu, Yi L
2023-04-05 15:36 ` Alex Williamson
2023-04-05 16:46 ` Jason Gunthorpe
2023-04-05 8:02 ` Eric Auger
2023-04-05 8:09 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 06/12] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
2023-04-05 8:27 ` Eric Auger
2023-04-05 9:23 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 07/12] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
2023-04-04 20:31 ` Alex Williamson
2023-04-05 8:07 ` Eric Auger
2023-04-05 8:10 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 08/12] vfio/pci: Renaming for accepting device fd in " Yi Liu
2023-04-04 21:23 ` Alex Williamson
2023-04-05 9:32 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 09/12] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
2023-04-05 9:36 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device Yi Liu
2023-04-05 11:48 ` Eric Auger
2023-04-21 7:06 ` Liu, Yi L
2023-04-01 14:44 ` [PATCH v3 11/12] iommufd: Define IOMMUFD_INVALID_ID in uapi Yi Liu
2023-04-04 21:00 ` Alex Williamson
2023-04-05 9:31 ` Liu, Yi L
2023-04-05 15:13 ` Alex Williamson
2023-04-05 15:17 ` Liu, Yi L
2023-04-05 11:46 ` Eric Auger
2023-04-01 14:44 ` [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO Yi Liu
2023-04-03 9:25 ` Liu, Yi L
2023-04-03 15:01 ` Alex Williamson
2023-04-03 15:22 ` Liu, Yi L
2023-04-03 15:32 ` Alex Williamson
2023-04-03 16:12 ` Jason Gunthorpe
2023-04-07 10:09 ` Liu, Yi L
2023-04-07 12:03 ` Alex Williamson
2023-04-07 13:24 ` Liu, Yi L
2023-04-07 13:51 ` Alex Williamson
2023-04-07 14:04 ` Liu, Yi L
2023-04-07 15:14 ` Alex Williamson
2023-04-07 15:47 ` Liu, Yi L
2023-04-07 21:07 ` Alex Williamson
2023-04-08 5:07 ` Liu, Yi L
2023-04-08 14:20 ` Alex Williamson
2023-04-09 11:58 ` Yi Liu
2023-04-09 13:29 ` Alex Williamson
2023-04-10 8:48 ` Liu, Yi L
2023-04-10 14:41 ` Alex Williamson
2023-04-10 15:18 ` Liu, Yi L
2023-04-10 15:23 ` Alex Williamson
2023-04-11 13:34 ` Jason Gunthorpe
2023-04-11 13:33 ` Jason Gunthorpe
2023-04-11 6:16 ` Liu, Yi L
2023-04-04 22:20 ` Alex Williamson
2023-04-05 12:19 ` Eric Auger
2023-04-05 14:04 ` Liu, Yi L
2023-04-05 16:25 ` Alex Williamson
2023-04-05 16:37 ` Jason Gunthorpe
2023-04-05 16:52 ` Alex Williamson
2023-04-05 17:23 ` Jason Gunthorpe
2023-04-05 18:56 ` Alex Williamson
2023-04-05 19:18 ` Alex Williamson
2023-04-05 19:21 ` Jason Gunthorpe
2023-04-05 19:49 ` Alex Williamson
2023-04-05 23:22 ` Jason Gunthorpe
2023-04-06 10:02 ` Liu, Yi L
2023-04-06 17:53 ` Alex Williamson
2023-04-07 10:09 ` Liu, Yi L
2023-04-11 13:24 ` Jason Gunthorpe
[not found] ` <20230411095417.240bac39.alex.williamson@redhat.com>
[not found] ` <20230411111117.0766ad52.alex.williamson@redhat.com>
2023-04-11 18:40 ` Jason Gunthorpe [this message]
2023-04-11 21:58 ` Alex Williamson
2023-04-12 0:01 ` Jason Gunthorpe
2023-04-12 7:27 ` Tian, Kevin
2023-04-12 15:05 ` Jason Gunthorpe
2023-04-12 17:01 ` Alex Williamson
2023-04-13 2:57 ` Tian, Kevin
2023-04-12 10:09 ` Liu, Yi L
2023-04-12 16:54 ` Alex Williamson
2023-04-12 16:50 ` Alex Williamson
2023-04-12 20:06 ` Jason Gunthorpe
2023-04-13 8:25 ` Tian, Kevin
2023-04-13 11:50 ` Jason Gunthorpe
2023-04-13 14:35 ` Liu, Yi L
2023-04-13 14:41 ` Jason Gunthorpe
2023-04-13 18:07 ` Alex Williamson
2023-04-14 9:11 ` Tian, Kevin
2023-04-14 11:38 ` Liu, Yi L
2023-04-14 17:10 ` Alex Williamson
2023-04-17 4:20 ` Liu, Yi L
2023-04-17 19:01 ` Alex Williamson
2023-04-17 19:31 ` Jason Gunthorpe
2023-04-17 20:06 ` Alex Williamson
2023-04-18 3:24 ` Tian, Kevin
2023-04-18 4:10 ` Alex Williamson
2023-04-18 5:02 ` Tian, Kevin
2023-04-18 12:59 ` Jason Gunthorpe
2023-04-18 16:44 ` Alex Williamson
2023-04-18 10:34 ` Liu, Yi L
2023-04-18 16:49 ` Alex Williamson
2023-04-18 12:57 ` Jason Gunthorpe
2023-04-18 18:39 ` Alex Williamson
2023-04-20 12:10 ` Liu, Yi L
2023-04-20 14:08 ` Alex Williamson
2023-04-21 22:35 ` Jason Gunthorpe
2023-04-23 14:46 ` Liu, Yi L
2023-04-26 7:22 ` Liu, Yi L
2023-04-26 13:20 ` Alex Williamson
2023-04-26 15:08 ` Liu, Yi L
2023-04-14 16:34 ` Alex Williamson
2023-04-17 13:39 ` Jason Gunthorpe
2023-04-18 1:28 ` Tian, Kevin
2023-04-18 10:23 ` Liu, Yi L
2023-04-18 13:02 ` Jason Gunthorpe
2023-04-23 10:28 ` Liu, Yi L
2023-04-24 17:38 ` Jason Gunthorpe
2023-04-17 14:05 ` Jason Gunthorpe
2023-04-12 7:14 ` Tian, Kevin
2023-04-06 6:34 ` Liu, Yi L
2023-04-06 17:07 ` Alex Williamson
2023-04-05 17:58 ` Eric Auger
2023-04-06 5:31 ` Liu, Yi L
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZDWph7g0hcbJHU1B@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=chao.p.peng@linux.intel.com \
--cc=cohuck@redhat.com \
--cc=eric.auger@redhat.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=jasowang@redhat.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=lulu@redhat.com \
--cc=mjrosato@linux.ibm.com \
--cc=nicolinc@nvidia.com \
--cc=peterx@redhat.com \
--cc=robin.murphy@arm.com \
--cc=shameerali.kolothum.thodi@huawei.com \
--cc=suravee.suthikulpanit@amd.com \
--cc=terrence.xu@intel.com \
--cc=xudong.hao@intel.com \
--cc=yan.y.zhao@intel.com \
--cc=yanting.jiang@intel.com \
--cc=yi.l.liu@intel.com \
--cc=yi.y.sun@linux.intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox