From: Alex Williamson <alex.williamson@redhat.com>
To: "Liu, Yi L" <yi.l.liu@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"cohuck@redhat.com" <cohuck@redhat.com>,
"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
"yi.y.sun@linux.intel.com" <yi.y.sun@linux.intel.com>,
"peterx@redhat.com" <peterx@redhat.com>,
"jasowang@redhat.com" <jasowang@redhat.com>,
"shameerali.kolothum.thodi@huawei.com"
<shameerali.kolothum.thodi@huawei.com>,
"lulu@redhat.com" <lulu@redhat.com>,
"suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
"intel-gvt-dev@lists.freedesktop.org"
<intel-gvt-dev@lists.freedesktop.org>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
"Hao, Xudong" <xudong.hao@intel.com>,
"Zhao, Yan Y" <yan.y.zhao@intel.com>,
"Xu, Terrence" <terrence.xu@intel.com>,
"Jiang, Yanting" <yanting.jiang@intel.com>,
"Duan, Zhenzhong" <zhenzhong.duan@intel.com>
Subject: Re: [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
Date: Wed, 12 Apr 2023 10:54:37 -0600
Message-ID: <20230412105437.13897845.alex.williamson@redhat.com>
In-Reply-To: <DS0PR11MB7529E75A0868B338F5AFD014C39B9@DS0PR11MB7529.namprd11.prod.outlook.com>
On Wed, 12 Apr 2023 10:09:32 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, April 12, 2023 8:01 AM
> >
> > On Tue, Apr 11, 2023 at 03:58:27PM -0600, Alex Williamson wrote:
> >
> > > > Management tools already need to understand dev_set if they want to
> > > > offer reliable reset support to the VMs. Same as today.
> > >
> > > I don't think that's true. Our primary hot-reset use case is GPUs and
> > > subordinate functions, where the isolation and reset scope are often
> > > sufficiently similar to make hot-reset possible, regardless whether
> > > all the functions are assigned to a VM. I don't think you'll find any
> > > management tools that take reset scope into account otherwise.
> >
> > When I think of "reliable reset support" I think of the management
> > tool offering a checkbox that says "ensure PCI function reset
> > availability" and if checked it will not launch the VM without a
> > working reset.
> >
> > If the user configures a set of VFIO devices and then hopes they get
> > working reset, that is fine, but doesn't require any reporting of
> > reset groups, or iommu groups to the management layer to work.
> >
> > > > > As I understand the proposal, QEMU now gets to attempt to
> > > > > claim ownership of the dev_set, so it opportunistically extends its
> > > > > ownership and may block other users from the affected devices.
> > > >
> > > > We can decide the policy for the kernel to accept a claim. I suggested
> > > > below "same as today" - it must hold all the groups within the
> > > > iommufd_ctx.
> > >
> > > It must hold all the groups [that the user doesn't know about because
> > > it's not a formal part of the cdev API] within the iommufd_ctx?
> >
> > You keep going back to this, but I maintain userspace doesn't
> > care. qemu is given a list of VFIO devices to use, all it wants to
> > know is if it is allowed to use reset or not. Why should it need to
> > know groups and group_ids to get that binary signal out of the kernel?
> >
> > > > The simplest option for no-iommu is to require it to pass in every
> > > > device fd to the reset ioctl.
> > >
> > > Which ironically is exactly how it ends up working today, each no-iommu
> > > device has a fake IOMMU group, so every affected device (group) needs
> > > to be provided.
> >
> > Sure, that is probably the way forward for no-iommu. Not that anyone
> > uses it..
> >
> > The kicker is we don't force the user to generate a de-duplicated list
> > of device FDs, one per group, just because.
> > of devices FDs, one per group, just because.
> >
> > > > I want to re-focus on the basics of what cdev is supposed to be doing,
> > > > because several of the ideas you suggested seem against this direction:
> > > >
> > > > - cdev does not have, and cannot rely on vfio_groups. We enforce this
> > > > by compiling all the vfio_group infrastructure out. iommu_groups
> > > > continue to exist.
> > > >
> > > > So converting a cdev to a vfio_group is not an allowed operation.
> > >
> > > My only statements in this respect were towards the notion that IOMMU
> > > groups continue to exist. I'm well aware of the desire to deprecate
> > > and remove vfio groups.
> >
> > Yes
> >
> > > > - no-iommu should not have iommu_groups. We enforce this by compiling
> > > > out all the no-iommu vfio_group infrastructure.
> > >
> > > This is not logically inferred from the above if IOMMU groups continue
> > > to exist and continue to be a basis for describing DMA ownership as
> > > well as "reset groups"
> >
> > It is not meant to flow out of the above, it is a separate statement. I
> > want the iommu_group mechanism to stop being abused outside the iommu
> > core code. The only thing that should be creating groups is an
> > attached iommu driver operating under ops->device_group().
> >
> > VFIO needed this to support mdev and no-iommu. We already have mdev
> > free of iommu_groups, I would like no-iommu to also be free of it too,
> > we are very close.
> >
> > That would leave POWER as the only abuser of the
> > iommu_group_add_device() API, and it is only doing it because it
> > hasn't got a proper iommu driver implementation yet. It turns out
> > their abuse is mislocked and maybe racy to boot :(
> >
> > > > - cdev APIs should ideally not require the user to know the group_id,
> > > > we should try hard to design APIs to avoid this.
> > >
> > > This is a nuance, group_id vs group, where it's been previously
> > > discussed that users will need to continue to know the boundaries of a
> > > group for the purpose of DMA isolation and potentially IOAS
> > > independence should cdev/iommufd choose to tackle those topics.
> >
> > Yes, group_id is a value we have no specific use for and would require
> > userspace to keep separate track of. I'd prefer to rely on dev_id as
> > much as possible instead.
> >
> > > What is the actual proposal here?
> >
> > I don't know anymore, you don't seem to like this direction either...
> >
> > > You've said that hot-reset works if the iommufd_ctx has
> > > representation from each affected group, the INFO ioctl remains as
> > > it is, which suggests that it's reporting group ID and BDF, yet only
> > > sysfs tells the user the relation between a vfio cdev and a group
> > > and we're trying to enable a pass-by-fd model for cdev where the
> > > user has no reference to a sysfs node for the device. Show me how
> > > these pieces fit together.
> >
> > I prefer the version where INFO2 returns the dev_id, but info can work
> > if we do the BDF cap like you suggested to Yi
> >
> > > OTOH, if we say IOMMU groups continue to exist [agreed], every vfio
> > > device has an IOMMU group
> >
> > I don't desire every VFIO device to have an iommu_group. I want VFIO
> > devices with real IOMMU drivers to have an iommu_group. mdev and
> > no-iommu should not. I don't want to add them back into the design
> > just so INFO has a value to return.
> >
> > I'd rather give no-iommu a dummy dev_id in iommufd_ctx than give it an
> > iommu_group...
> >
> > I see this problem as a few basic requirements from a qemu-like
> > application:
> >
> > 1) Does the configuration I was given support reset right now?
> > 2) Will the configuration I was given support reset for the duration
> > of my execution?
> > 3) What groups of the devices I already have open does the reset
> > affect?
> > 4) For debugging, report to the user the full list of devices in the
> > reset group, in a way that relates back to sysfs.
> > 5) A way to trigger a reset on a group of devices
> >
> > #1/#2 is the API I suggested here. Ask the kernel if the current
> > configuration works, and ask it to keep it working.
> >
> > #3 is either INFO and a CAP for BDF or INFO2 reporting dev_id
> >
> > #4 is either INFO and print the BDFs or INFO2 reporting the struct
> > vfio_device IDR # (eg /sys/class/vfio/vfioXXX/).
>
> I hope we can have a clear statement on the _INFO or INFO2 usage.
> Today, per QEMU's implementation, the output of _INFO is used to:
>
> 1) do a self-check to see if all the affected groups are opened by the
> current user before it can invoke hot-reset.
> 2) figure out the devices that are already opened by the user. Such a
> device may already be in use, so QEMU needs to save its state before
> the hot-reset and restore it afterwards.
>
> Seems like we are relaxing the self-check as it may be done by locking
> the reset group. Is that right?
I hope not. Locking the reset group suggests the user is able to
extend their ownership. IMO we should not allow that. Thanks,
Alex
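
For readers following the thread, the self-check in Yi's point 1) above
amounts to a set-membership test: every group that the hot-reset affects
must already be owned by the caller, and the caller's ownership is never
extended to make the check pass. A minimal sketch in plain C, assuming a
simplified, hypothetical struct affected_dev that only mirrors the fields
mentioned in the thread (group ID and BDF). The names hot_reset_allowed()
and group_is_owned() are illustrative and are not kernel or QEMU functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device entry reported by the INFO ioctl. */
struct affected_dev {
	uint32_t group_id;	/* IOMMU group the affected device belongs to */
	uint16_t segment;	/* PCI segment (domain) */
	uint8_t bus;		/* PCI bus number */
	uint8_t devfn;		/* PCI device/function number */
};

/* Return true if group_id appears in the caller's list of owned groups. */
static bool group_is_owned(uint32_t group_id,
			   const uint32_t *owned, size_t n_owned)
{
	for (size_t i = 0; i < n_owned; i++) {
		if (owned[i] == group_id)
			return true;
	}
	return false;
}

/*
 * Self-check before requesting a hot-reset: every affected device must sit
 * in a group the caller already owns. Nothing here claims new ownership;
 * a missing group simply means the reset is not allowed.
 */
bool hot_reset_allowed(const struct affected_dev *devs, size_t n_devs,
		       const uint32_t *owned, size_t n_owned)
{
	for (size_t i = 0; i < n_devs; i++) {
		if (!group_is_owned(devs[i].group_id, owned, n_owned))
			return false;
	}
	return true;
}
```

With affected devices in groups 7 and 9, a caller that owns both groups
passes the check, while a caller that owns only group 7 does not.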