From: Jason Gunthorpe <jgg@nvidia.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
"Eric Auger" <eric.auger@redhat.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"Rodel, Jorg" <jroedel@suse.de>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
"Chaitanya Kulkarni" <chaitanyak@nvidia.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Daniel Jordan" <daniel.m.jordan@oracle.com>,
"David Gibson" <david@gibson.dropbear.id.au>,
"Eric Farman" <farman@linux.ibm.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"Jason Wang" <jasowang@redhat.com>,
"Jean-Philippe Brucker" <jean-philippe@linaro.org>,
"Martins, Joao" <joao.m.martins@oracle.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Matthew Rosato" <mjrosato@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Nicolin Chen" <nicolinc@nvidia.com>,
"Niklas Schnelle" <schnelle@linux.ibm.com>,
"Shameerali Kolothum Thodi"
<shameerali.kolothum.thodi@huawei.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"Keqian Zhu" <zhukeqian1@huawei.com>,
"libvir-list@redhat.com" <libvir-list@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Laine Stump" <laine@redhat.com>
Subject: Re: [PATCH RFC v2 00/13] IOMMUFD Generic interface
Date: Wed, 12 Oct 2022 09:32:50 -0300
Message-ID: <Y0az8pNrA9jOA79k@nvidia.com>
In-Reply-To: <634a8f2f-a025-6c74-7e5f-f3d99448dd4a@oracle.com>

On Tue, Oct 11, 2022 at 04:30:58PM -0400, Steven Sistare wrote:
> On 10/11/2022 8:30 AM, Jason Gunthorpe wrote:
> > On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote:
> >>> Do we have a solution to this?
> >>>
> >>> If not I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR
> >>>
> >>> Aside from the approach to use the FD, another idea is to just use
> >>> fork.
> >>>
> >>> qemu would do something like
> >>>
> >>> .. stop all container ioctl activity ..
> >>> fork()
> >>>     ioctl(CHANGE_MM) // switch all maps to this mm
> >>>     .. signal parent..
> >>>     .. wait parent..
> >>>     exit(0)
> >>> .. wait child ..
> >>> exec()
> >>> ioctl(CHANGE_MM) // switch all maps to this mm
> >>> ..signal child..
> >>> waitpid(childpid)
> >>>
> >>> This way the kernel is never left without a page provider for the
> >>> maps; the dummy mm_struct belonging to the fork will serve that role
> >>> for the gap.
> >>>
> >>> And the above is only required if we have mdevs, so we could imagine
> >>> userspace optimizing it away for, e.g., vfio-pci-only cases.
> >>>
> >>> It is not as efficient as using an FD backing, but this is super easy
> >>> to implement in the kernel.
> >>
> >> I propose to avoid deadlock for mediated devices as follows. Currently, an
> >> mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR
> >> is asserted.
> >>
> >> * In vfio_wait, I will maintain a list of waiters, each list element
> >> consisting of (task, mdev, close_flag=false).
> >>
> >> * When the vfio device descriptor is closed, vfio_device_fops_release
> >> will notify the vfio_iommu driver, which will find the mdev on the waiters
> >> list, set elem->close_flag=true, and call wake_up_process for the task.
> >
> > This alone is not sufficient: the mdev driver can continue to
> > establish new mappings until its close_device function
> > returns. Killing only existing mappings is racy.
> >
> > I think you are focusing on the one issue I pointed at. As I said, I'm
> > sure there are more ways than just close to abuse this functionality
> > to deadlock the kernel.
> >
> > I continue to prefer we remove it completely and do something more
> > robust. I suggested two options.
>
> It's not racy. New pin requests also land in vfio_wait if any vaddrs have
> been invalidated in any vfio_dma in the iommu. See:
>
> vfio_iommu_type1_pin_pages()
>     if (iommu->vaddr_invalid_count)
>         vfio_find_dma_valid()
>             vfio_wait()

I mean you can't do a one-shot wakeup of only the existing waiters, and
you can't corrupt the container to wake up waiters for other devices,
so I don't see how this can be made to work safely.
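To make the one-shot objection concrete, here is a toy userspace model (not kernel code) of the waiter list proposed in the quoted message: each entry records (task, mdev, close_flag), the pin path queues a waiter whenever vaddr_invalid_count is nonzero, and a close notification flags only the entries present at that moment. The names struct waiter, struct waiter_list, waiter_add and notify_close are hypothetical and do not exist in the vfio sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the vfio_iommu_type1 code. */
#define MAX_WAITERS 16

struct waiter {
	int task;         /* id of the sleeping task */
	int mdev;         /* mdev the pin request came from */
	bool close_flag;  /* set by a close notification */
};

struct waiter_list {
	struct waiter w[MAX_WAITERS];
	size_t n;
	int vaddr_invalid_count; /* nonzero while some vaddr is invalidated */
};

/* Pin path: while any vaddr is invalid, the caller queues and sleeps.
 * Returns the new entry, or NULL if the caller would not wait
 * (or the toy list is full). */
struct waiter *waiter_add(struct waiter_list *l, int task, int mdev)
{
	if (l->vaddr_invalid_count == 0 || l->n == MAX_WAITERS)
		return NULL;
	l->w[l->n] = (struct waiter){ .task = task, .mdev = mdev,
				      .close_flag = false };
	return &l->w[l->n++];
}

/* One-shot close notification: flags only the waiters queued right now
 * for this mdev and returns how many were "woken". */
size_t notify_close(struct waiter_list *l, int mdev)
{
	size_t woken = 0;

	for (size_t i = 0; i < l->n; i++) {
		if (l->w[i].mdev == mdev && !l->w[i].close_flag) {
			l->w[i].close_flag = true;
			woken++;
		}
	}
	return woken;
}
```

Under these assumptions, a pin request that the mdev driver issues after notify_close() but before its close_device returns is queued with close_flag still false and would go back to sleep, which is the race described above.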

It also doesn't solve any flow that doesn't trigger a file close, like a
process thread stuck on the wait in the kernel, e.g., because a
trapped MMIO triggered an access.

So it doesn't seem like a workable direction to me.

> However, I will investigate saving a reference to the file object in
> the vfio_dma (for mappings backed by a file) and using that to
> translate IOVAs.

It is certainly the best flow, but it may be difficult. E.g., the memfd
work for KVM to do something similar is quite involved.
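A minimal userspace sketch of the file-backed idea in the quoted proposal, assuming each mapping records the backing fd and a file offset instead of relying on a vaddr: an IOVA is resolved to a file offset and read through the fd, so the lookup keeps working even when the process's vaddr is stale. struct fd_map, iova_to_offset and iova_read are hypothetical names; a kernel implementation would presumably hold a struct file reference in vfio_dma and pin pages through it, which is far more involved than this.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: an IOVA range backed by (fd, offset), no vaddr. */
struct fd_map {
	uint64_t iova;
	uint64_t length;
	int fd;        /* e.g. a memfd holding guest RAM */
	off_t offset;  /* file offset corresponding to iova */
};

/* Translate an IOVA to a file offset; returns -1 if it is not mapped. */
off_t iova_to_offset(const struct fd_map *maps, size_t n, uint64_t iova,
		     int *fd)
{
	for (size_t i = 0; i < n; i++) {
		if (iova >= maps[i].iova && iova - maps[i].iova < maps[i].length) {
			*fd = maps[i].fd;
			return maps[i].offset + (off_t)(iova - maps[i].iova);
		}
	}
	return -1;
}

/* Read through the file instead of through a (possibly stale) vaddr. */
ssize_t iova_read(const struct fd_map *maps, size_t n, uint64_t iova,
		  void *buf, size_t len)
{
	int fd;
	off_t off = iova_to_offset(maps, n, iova, &fd);

	if (off < 0)
		return -1;
	return pread(fd, buf, len, off);
}
```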

> I think that will be easier to use than fork/CHANGE_MM/exec, and may
> even be easier to use than VFIO_DMA_UNMAP_FLAG_VADDR. To be
> continued.

Yes, certainly easier to use. I suggested CHANGE_MM because the kernel
implementation is very easy; I could probably send you something to
test with iommufd in a few hours of effort.
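A rough userspace sketch of the handshake ordering in the fork/CHANGE_MM flow quoted earlier, assuming pipes for the signal/wait steps. CHANGE_MM is only a proposal in this thread, so change_mm() below is a hypothetical stub that merely counts calls; the exec() step is marked by a comment rather than performed, and a real implementation would have to carry the pipe fd and the child pid across exec.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for ioctl(container, CHANGE_MM); it only counts
 * how many times this process claimed the maps. */
static int change_mm_calls;

static void change_mm(void)
{
	change_mm_calls++;
}

/* Returns 0 once the maps are back with the (post-exec) parent and the
 * child has exited cleanly, -1 on any failure. */
int handoff_maps_across_exec(void)
{
	int to_parent[2], to_child[2];
	char token = 'x';
	int status, ret = -1;
	pid_t child;

	/* Caller is assumed to have stopped all container ioctl activity. */
	if (pipe(to_parent) < 0)
		return -1;
	if (pipe(to_child) < 0) {
		close(to_parent[0]);
		close(to_parent[1]);
		return -1;
	}

	child = fork();
	if (child == 0) {
		close(to_parent[0]);
		close(to_child[1]);
		/* Child: its copy of the mm backs the maps during the gap. */
		change_mm();
		if (write(to_parent[1], &token, 1) != 1)   /* signal parent */
			_exit(1);
		if (read(to_child[0], &token, 1) != 1)     /* wait parent */
			_exit(1);
		_exit(0);
	}

	/* Parent keeps only the read end of to_parent and write end of to_child. */
	close(to_parent[1]);
	close(to_child[0]);

	if (child > 0 && read(to_parent[0], &token, 1) == 1) { /* wait child */
		/* exec() would happen here; the new image must inherit
		 * to_child[1] (no O_CLOEXEC) and learn the child's pid. */
		change_mm();                                 /* reclaim maps */
		if (write(to_child[1], &token, 1) == 1)      /* signal child */
			ret = 0;
	}

	/* Closing our ends also unblocks the child on error paths. */
	close(to_parent[0]);
	close(to_child[1]);

	if (child > 0 && (waitpid(child, &status, 0) != child ||
			  !WIFEXITED(status) || WEXITSTATUS(status) != 0))
		ret = -1;
	return ret;
}
```

The ordering is the point: the child claims the maps before the parent's mm goes away and exits only after the new image has reclaimed them, so there is always a page provider. The child's increment of change_mm_calls happens in its own copy of the counter, so the parent sees exactly one call.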

Anyhow, this conversation has convinced me there is no way to
fix VFIO_DMA_UNMAP_FLAG_VADDR. I'll send a patch reverting it,
basically because it is a security bug.

Jason