From: Jason Gunthorpe <jgg@nvidia.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Lu Baolu <baolu.lu@linux.intel.com>,
Chaitanya Kulkarni <chaitanyak@nvidia.com>,
Cornelia Huck <cohuck@redhat.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Eric Auger <eric.auger@redhat.com>,
iommu@lists.linux-foundation.org,
Jason Wang <jasowang@redhat.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Joao Martins <joao.m.martins@oracle.com>,
Kevin Tian <kevin.tian@intel.com>,
kvm@vger.kernel.org, Matthew Rosato <mjrosato@linux.ibm.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
Yi Liu <yi.l.liu@intel.com>, Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility
Date: Thu, 5 May 2022 16:07:28 -0300 [thread overview]
Message-ID: <20220505190728.GV49344@nvidia.com> (raw)
In-Reply-To: <Ym+IfTvdD2zS6j4G@yekko>
On Mon, May 02, 2022 at 05:30:05PM +1000, David Gibson wrote:
> > It is a bit more CPU work since maps in the lower range would have to
> > be copied over, but conceptually the model matches the HW nesting.
>
> Ah.. ok. IIUC what you're saying is that the kernel-side IOASes have
> fixed windows, but we fake dynamic windows in the userspace
> implementation by flipping the devices over to a new IOAS with the new
> windows. Is that right?
Yes
> Where exactly would the windows be specified? My understanding was
> that when creating a back-end specific IOAS, that would typically be
> for the case where you're using a user / guest managed IO pagetable,
> with the backend specifying the format for that. In the ppc case we'd
> need to specify the windows, but we'd still need the IOAS_MAP/UNMAP
> operations to manage the mappings. The PAPR vIOMMU is
> paravirtualized, so all updates come via hypercalls, so there's no
> user/guest managed data structure.
When the iommu_domain is created I want to have an
iommu-driver-specific struct, so PPC can customize its iommu_domain
however it likes.
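Roughly the shape I have in mind, as a user-space sketch only - every name here is invented for illustration and is not real kernel API. The generic creation path takes an opaque blob that only the driver interprets; for PPC that blob could carry the window parameters:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical PPC-specific parameters accepted at domain creation. */
struct ppc_domain_data {
	uint64_t window_base;	/* base IOVA of the DMA window */
	uint64_t window_size;	/* size of the DMA window */
	uint32_t page_shift;	/* IOPTE page size, log2 */
};

/* Stand-in for an iommu_domain with a single aperture. */
struct fake_domain {
	uint64_t aperture_start;
	uint64_t aperture_end;
};

/*
 * Generic path: the core passes the opaque driver_data through
 * untouched; only the driver knows its layout.
 */
struct fake_domain *domain_alloc(const void *driver_data, size_t len)
{
	struct fake_domain *dom = calloc(1, sizeof(*dom));

	if (!dom)
		return NULL;
	if (driver_data && len == sizeof(struct ppc_domain_data)) {
		const struct ppc_domain_data *pd = driver_data;

		dom->aperture_start = pd->window_base;
		dom->aperture_end = pd->window_base + pd->window_size - 1;
	}
	return dom;
}
```

The point is only that the core stays generic while the driver gets a typed channel for its own creation parameters.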
> That should work from the point of view of the userspace and guest
> side interfaces. It might be fiddly from the point of view of the
> back end. The ppc iommu doesn't really have the notion of
> configurable domains - instead the address spaces are the hardware or
> firmware fixed PEs, so they have a fixed set of devices. At the bare
> metal level it's possible to sort of do domains by making the actual
> pagetable pointers for several PEs point to a common place.
I'm not sure I understand this - a domain is just a storage container
for an IO page table; if the HW has IOPTEs then it should be able to
have a domain?
Making page table pointers point to a common IOPTE tree is exactly
what iommu_domains are for - why is that "sort of" for ppc?
> However, in the future, nested KVM under PowerVM is likely to be the
> norm. In that situation the L1 as well as the L2 only has the
> paravirtualized interfaces, which don't have any notion of domains,
> only PEs. All updates take place via hypercalls which explicitly
> specify a PE (strictly speaking they take a "Logical IO Bus Number"
> (LIOBN), but those generally map one to one with PEs), so it can't use
> shared pointer tricks either.
How do the paravirtualized interfaces deal with the page table? Do
they call a map/unmap hypercall instead of providing guest IOPTEs?
Assuming yes, I'd expect that:
The iommu_domain for nested PPC is just a log of the map/unmap
hypervisor calls to make. Whenever a new PE is attached to that domain
it gets the logged maps replayed to set it up, and when a PE is
detached the log is used to unmap everything.
It is not perfectly memory efficient - and we could perhaps talk about
an API modification to allow re-use of the iommufd data structure
somehow, but I think this is a good logical starting point.
The PE would have to be modeled as an iommu_group.
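A toy user-space model of that replay idea - all names invented, nothing here is real kernel or PAPR API. The domain keeps a log of every map; attaching a PE walks the log and issues the (stand-in) map hypercall for each entry:

```c
#include <stdint.h>
#include <stdlib.h>

struct map_entry {
	uint64_t iova, paddr, len;
	struct map_entry *next;
};

/* A domain modeled purely as the log of maps issued against it. */
struct logged_domain {
	struct map_entry *log;
};

/* Stand-in for the per-PE/LIOBN map hypercall. */
typedef void (*hcall_map_fn)(uint32_t liobn, uint64_t iova,
			     uint64_t paddr, uint64_t len);

int domain_map(struct logged_domain *dom, uint64_t iova,
	       uint64_t paddr, uint64_t len)
{
	struct map_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->iova = iova;
	e->paddr = paddr;
	e->len = len;
	e->next = dom->log;
	dom->log = e;
	return 0;
}

/* Attaching a new PE replays the whole log; returns entries replayed. */
int domain_attach_pe(struct logged_domain *dom, uint32_t liobn,
		     hcall_map_fn map)
{
	int n = 0;

	for (struct map_entry *e = dom->log; e; e = e->next) {
		map(liobn, e->iova, e->paddr, e->len);
		n++;
	}
	return n;
}

/* Trivial counting callback, for demonstration. */
static int replay_count;
static void count_map(uint32_t liobn, uint64_t iova, uint64_t paddr,
		      uint64_t len)
{
	(void)liobn; (void)iova; (void)paddr; (void)len;
	replay_count++;
}
```

Detach would walk the same log issuing the unmap hypercall; the memory cost is one log entry per live mapping, which is the inefficiency mentioned above.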
> So, here's an alternative set of interfaces that should work for ppc,
> maybe you can tell me whether they also work for x86 and others:
Fundamentally PPC has to fit into the standard iommu framework of
groups and domains; we can talk about modifications, but drifting too
far away is a big problem.
> * Each domain/IOAS has a concept of one or more IOVA windows, which
> each have a base address, size, pagesize (granularity) and optionally
> other flags/attributes.
> * This has some bearing on hardware capabilities, but is
> primarily a software notion
iommu_domain has the aperture; PPC will require extending this to a
list of apertures, since it is currently only one window.
Once a domain is created and attached to a group the aperture should
be immutable.
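What that extension looks like in miniature - a sketch, not the real struct iommu_domain_geometry, and the names are made up. The single (start, end) pair becomes a list of windows, and a map request is legal only if it falls entirely inside one of them:

```c
#include <stdbool.h>
#include <stdint.h>

struct iova_window {
	uint64_t start, end;	/* inclusive */
};

/*
 * Illustrative multi-window geometry, e.g. the PPC 32-bit window plus
 * the high 64-bit window.
 */
struct domain_geometry {
	const struct iova_window *windows;
	unsigned int nr_windows;
};

/* A range is allowed only if it fits entirely inside one window. */
bool iova_range_allowed(const struct domain_geometry *g,
			uint64_t iova, uint64_t len)
{
	/* Reject empty and address-space-wrapping ranges. */
	if (!len || iova + len - 1 < iova)
		return false;

	for (unsigned int i = 0; i < g->nr_windows; i++) {
		if (iova >= g->windows[i].start &&
		    iova + len - 1 <= g->windows[i].end)
			return true;
	}
	return false;
}
```

With immutable windows this check can gate every map/copy operation up front, which is the software enforcement David describes below.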
> * MAP/UNMAP operations are only permitted within an existing IOVA
> window (and with addresses aligned to the window's pagesize)
> * This is enforced by software whether or not it is required by
> the underlying hardware
> * Likewise IOAS_COPY operations are only permitted if the source and
> destination windows have compatible attributes
Already done; the domain's aperture restricts all the iommufd operations
> * A newly created kernel-managed IOAS has *no* IOVA windows
Already done, the iommufd IOAS has no iommu_domains inside it at
creation time.
> * A CREATE_WINDOW operation is added
> * This takes a size, pagesize/granularity, optional base address
> and optional additional attributes
> * If any of the specified attributes are incompatible with the
> underlying hardware, the operation fails
The iommu layer has nothing called a window. The closest thing is a
domain.
I really don't want to try to make a new iommu layer object that is so
unique and special to PPC - we have to figure out how to fit PPC into
the iommu_domain model with reasonable extensions.
> > > > Maybe every device gets a copy of the error notification?
> > >
> > > Alas, it's harder than that. One of the things that can happen on an
> > > EEH fault is that the entire PE gets suspended (blocking both DMA and
> > > MMIO, IIRC) until the proper recovery steps are taken.
> >
> > I think qemu would have to de-duplicate the duplicated device
> > notifications and then it can go from a device notifiation to the
> > device's iommu_group to the IOAS to the vPE?
>
> It's not about the notifications.
The only thing the kernel can do is relay a notification that something
happened to a PE. The kernel gets an event on a per-PE basis; I would
like it to replicate that event to all the devices and push it through
the VFIO device FD.
qemu will then de-duplicate the replicas and recover exactly the same
event the kernel saw, delivered at exactly the same time.
If instead you want one event per PE, then all that changes in the
kernel is that one event is generated instead of N, and qemu doesn't
have to throw away the duplicates.
With either method qemu still gets the same PE centric event,
delivered at the same time, with the same races.
This is not a general mechanism; it is a PPC-specific way to
communicate a PPC-specific, PE-centric event to userspace. I just
prefer it in VFIO instead of iommufd.
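The qemu-side de-duplication is trivial either way. A toy model, with invented names and an assumed kernel-assigned per-PE sequence number carried in each notification - none of this is existing VFIO ABI:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PES 64

/* Assumed payload of each replicated device notification. */
struct pe_event {
	uint32_t pe;	/* which PE the event concerns */
	uint64_t seq;	/* hypothetical per-PE sequence number, from 1 */
};

static uint64_t last_seen_seq[MAX_PES];	/* 0 = nothing seen yet */

/*
 * Returns true only for the first copy of a given (pe, seq) event;
 * the N-1 replicas arriving on the other device FDs are dropped.
 */
bool pe_event_is_new(const struct pe_event *ev)
{
	if (ev->pe >= MAX_PES || ev->seq == 0)
		return false;
	if (ev->seq <= last_seen_seq[ev->pe])
		return false;
	last_seen_seq[ev->pe] = ev->seq;
	return true;
}
```

Whether the kernel sends one event or N, the consumer ends up with exactly one PE-centric event per incident.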
Jason