From: Jason Gunthorpe <jgg@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>,
Nicolin Chen <nicolinc@nvidia.com>,
peter.maydell@linaro.org, qemu-devel@nongnu.org,
qemu-arm@nongnu.org, yi.l.liu@intel.com, kevin.tian@intel.com,
Jason Wang <jasowang@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: Multiple vIOMMU instance support in QEMU?
Date: Thu, 18 May 2023 17:19:20 -0300
Message-ID: <ZGaISJUBFdMu+nxo@nvidia.com>
In-Reply-To: <ZGaAVAI9u4K4vy1/@x1n>
On Thu, May 18, 2023 at 03:45:24PM -0400, Peter Xu wrote:
> On Thu, May 18, 2023 at 11:56:46AM -0300, Jason Gunthorpe wrote:
> > On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:
> >
> > > What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> > > not suffice, but that's a totally new area to me because I've never used
> > > more than one IOMMU even on bare metal (excluding the case where I'm aware
> > > that e.g. a GPU could have its own IOMMU-like DMA translator).
> >
> > Even x86 systems are multi-iommu, one iommu per physical CPU socket.
>
> I tried to look at a 2-node system on hand and I indeed got two dmars:
>
> [ 4.444788] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
> [ 4.459673] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
>
> Though the devices do not seem to be evenly spread between them. E.g.,
> most of the devices on this host are attached to dmar1, while only two
> devices are attached to dmar0:
Yeah, I expect it has to do with physical topology. PCIe devices
physically connected to each socket should use the socket local iommu
and the socket local caches.
I.e. it would be foolish to take an IO in socket A, forward it to
socket B to perform IOMMU translation, then forward it back to socket A
to land in memory.
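As a quick way to see that split on a given host, here is a minimal
sketch that walks the sysfs iommu class, assuming the usual Linux
layout under /sys/class/iommu/<unit>/devices/ (unit names are whatever
the platform registers, e.g. dmar0/dmar1 on VT-d):

#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *class_dir = opendir("/sys/class/iommu");
    struct dirent *unit;

    if (!class_dir) {
        perror("/sys/class/iommu");
        return 1;
    }
    while ((unit = readdir(class_dir))) {
        char path[512];
        DIR *dev_dir;
        struct dirent *dev;

        if (unit->d_name[0] == '.')
            continue;
        /* e.g. /sys/class/iommu/dmar0/devices on a VT-d host */
        snprintf(path, sizeof(path), "/sys/class/iommu/%s/devices",
                 unit->d_name);
        dev_dir = opendir(path);
        if (!dev_dir)
            continue;
        printf("%s:\n", unit->d_name);
        while ((dev = readdir(dev_dir)))
            if (dev->d_name[0] != '.')
                printf("  %s\n", dev->d_name);
        closedir(dev_dir);
    }
    closedir(class_dir);
    return 0;
}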
> > I'm not sure how they model this though - Kevin, do you know? Do we get
> > multiple iommu instances in Linux or is all the broadcasting of
> > invalidates and sharing of tables hidden?
> >
> > > What's the system layout of your multi-vIOMMU world? Is there still a
> > > central vIOMMU, or can multiple vIOMMUs run fully in parallel, so that
> > > e.g. we can have DEV1,DEV2 under vIOMMU1 and DEV3,DEV4 under vIOMMU2?
> >
> > Just like physical, each viommu is parallel and independent. Each has
> > its own caches, ASIDs, DIDs/etc and thus invalidation domains.
> >
> > The separated caches are the motivating reason to do this, as something
> > like vCMDQ is a direct command channel for invalidations to only the
> > caches of a single IOMMU block.
>
> From a cache invalidation POV, shouldn't the ideal granule be per-device
> (like dev-iotlb in VT-d? No idea for ARM)?
There are many caches and different cache tag schemes in an iommu. All
of them are local to the IOMMU block.
Consider a case where we have a single vDID but the devices using that
DID are spread across two physical IOMMUs. When the VM asks to
invalidate the vDID, the system has to generate two physical pDID
invalidations.
This can't be done without a software mediation layer in the VMM.
The better solution is to make the pDID and vDID 1:1 so the VM itself
replicates the invalidations. The VM has better knowledge of when
replication is needed so it is overall more efficient.
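As a purely conceptual sketch of the mediation that a 1:1 vDID:pDID
mapping avoids (the structures and the piommu_invalidate_did() helper
below are hypothetical, not any real kernel or VMM interface):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct pdid_binding {
    unsigned int piommu;  /* which physical IOMMU block */
    uint32_t     pdid;    /* domain ID used on that block */
};

struct vdid_map {
    uint32_t            vdid;
    size_t              nr_bindings;
    struct pdid_binding bindings[4];
};

/* Stand-in for submitting one invalidation command to one IOMMU. */
static void piommu_invalidate_did(unsigned int piommu, uint32_t pdid)
{
    printf("invalidate DID %u on physical IOMMU %u\n", pdid, piommu);
}

/* The guest invalidated vdid; fan it out per physical IOMMU. */
static void mediate_vdid_invalidation(const struct vdid_map *map)
{
    for (size_t i = 0; i < map->nr_bindings; i++)
        piommu_invalidate_did(map->bindings[i].piommu,
                              map->bindings[i].pdid);
}

With 1:1 vDID:pDID there is nothing to fan out; the guest already knows
which IOMMU's caches it is touching and issues exactly the
invalidations needed.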
> I see that Intel is already copied here (at least Yi and Kevin) so I assume
> there's already some kind of synchronization between multi-vIOMMU and the
> recent work on the Intel side, which is definitely nice and can avoid
> conflicting work.
I actually don't know that. Intel sees multiple DMAR blocks in SW and
has kernel-level replication of invalidations. Intel doesn't have a HW
fast path yet, so they can rely on mediation to fix it. Thus I expect
there is no HW replication of invalidations here. Kevin?
Remember, the VFIO API hides all of this: when you change the VFIO
container it automatically generates all the required invalidations in
the kernel.
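For reference, a minimal sketch of that classic type1 flow from
userspace, assuming the container fd has already been set up through
the usual group/container dance; note there is no explicit
invalidation call anywhere:

#include <linux/vfio.h>
#include <stdint.h>
#include <sys/ioctl.h>

int dma_map_then_unmap(int container_fd, void *buf,
                       uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)buf,
        .iova  = iova,
        .size  = size,
    };
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = size,
    };

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
        return -1;
    /*
     * No invalidation ioctl here: the kernel flushes whatever
     * physical IOMMU TLBs are involved as part of the unmap.
     */
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}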
I also heard AMD has a HW fast path and is also multi-IOMMU, but I don't
really know the details.
Jason