From: Marc Zyngier <maz@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
tglx@linutronix.de, bhelgaas@google.com,
alex.williamson@redhat.com, jgg@nvidia.com, leonro@nvidia.com,
shameerali.kolothum.thodi@huawei.com, dlemoal@kernel.org,
kevin.tian@intel.com, smostafa@google.com,
andriy.shevchenko@linux.intel.com, reinette.chatre@intel.com,
eric.auger@redhat.com, ddutile@redhat.com, yebin10@huawei.com,
brauner@kernel.org, apatel@ventanamicro.com,
shivamurthy.shastri@linutronix.de, anna-maria@linutronix.de,
nipun.gupta@amd.com, marek.vasut+renesas@mailbox.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
Date: Mon, 11 Nov 2024 14:14:15 +0000
Message-ID: <86pln1zwlk.wl-maz@kernel.org>
In-Reply-To: <a63e7c3b-ce96-47a5-b462-d5de3a2edb56@arm.com>

On Mon, 11 Nov 2024 13:09:20 +0000,
Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > On ARM GIC systems and others, the target address of the MSI is translated
> > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> > IOMMU is disabled, the MSI address is programmed to the physical location
> > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> > page is behind the IOMMU, so the MSI address is programmed to an allocated
> > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> > When a 2-stage translation is enabled, IOVA will be still used to program
> > the MSI address, though the mappings will be in two stages:
> > IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
> > (IPA stands for Intermediate Physical Address).
> >
> > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> > IOVA is dynamically allocated from the top of the IOVA space. If attached
> > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> >
> > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> > of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> > page is fixed and known by kernel. However, with virtual machine enabling
> > a nested IOMMU translation (2-stage), a guest kernel directly controls the
> > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> > kernel can't know that guest-level IOVA to program the MSI address.
> >
> > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > guest kernel and relay it to the GIC driver in the host kernel, to program
> > the correct MSI IOVA. And this requires a new ioctl via VFIO.
>
> Once VFIO has that information from userspace, though, do we really
> need the whole complicated dance to push it right down into the
> irqchip layer just so it can be passed back up again? AFAICS
> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
> explicitly rewrites MSI-X vectors, so it seems like it should be
> pretty straightforward to override the message address in general at
> that level, without the lower layers having to be aware at all, no?

+1.

I would like to avoid polluting each and every interrupt controller
with usage-specific knowledge (they usually are brain-damaged enough).
We already have an indirection into the IOMMU subsystem and it
shouldn't be a big deal to intercept the message for all
implementations at this level.

I also wonder how to handle the case of braindead^Wwonderful platforms
where ITS transactions are not translated by the SMMU. Somehow, VFIO
should be made aware of this situation.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.