From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
To: Nicolin Chen <nicolinc@nvidia.com>,
"will@kernel.org" <will@kernel.org>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"jgg@nvidia.com" <jgg@nvidia.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"maz@kernel.org" <maz@kernel.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>
Cc: "joro@8bytes.org" <joro@8bytes.org>,
"shuah@kernel.org" <shuah@kernel.org>,
"reinette.chatre@intel.com" <reinette.chatre@intel.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"yebin (H)" <yebin10@huawei.com>,
"apatel@ventanamicro.com" <apatel@ventanamicro.com>,
"shivamurthy.shastri@linutronix.de"
<shivamurthy.shastri@linutronix.de>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"anna-maria@linutronix.de" <anna-maria@linutronix.de>,
"yury.norov@gmail.com" <yury.norov@gmail.com>,
"nipun.gupta@amd.com" <nipun.gupta@amd.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kselftest@vger.kernel.org"
<linux-kselftest@vger.kernel.org>,
"patches@lists.linux.dev" <patches@lists.linux.dev>,
"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
"mdf@kernel.org" <mdf@kernel.org>,
"mshavit@google.com" <mshavit@google.com>,
"smostafa@google.com" <smostafa@google.com>,
"ddutile@redhat.com" <ddutile@redhat.com>
Subject: RE: [PATCH RFCv2 00/13] iommu: Add MSI mapping support with nested SMMU
Date: Thu, 23 Jan 2025 09:06:49 +0000 [thread overview]
Message-ID: <4946ea266bdc4b1e8796dee1b228bd8f@huawei.com> (raw)
In-Reply-To: <cover.1736550979.git.nicolinc@nvidia.com>
Hi Nicolin,
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, January 11, 2025 3:32 AM
> To: will@kernel.org; robin.murphy@arm.com; jgg@nvidia.com;
> kevin.tian@intel.com; tglx@linutronix.de; maz@kernel.org;
> alex.williamson@redhat.com
> Cc: joro@8bytes.org; shuah@kernel.org; reinette.chatre@intel.com;
> eric.auger@redhat.com; yebin (H) <yebin10@huawei.com>;
> apatel@ventanamicro.com; shivamurthy.shastri@linutronix.de;
> bhelgaas@google.com; anna-maria@linutronix.de; yury.norov@gmail.com;
> nipun.gupta@amd.com; iommu@lists.linux.dev;
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> kvm@vger.kernel.org; linux-kselftest@vger.kernel.org;
> patches@lists.linux.dev; jean-philippe@linaro.org; mdf@kernel.org;
> mshavit@google.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; smostafa@google.com;
> ddutile@redhat.com
> Subject: [PATCH RFCv2 00/13] iommu: Add MSI mapping support with
> nested SMMU
>
> [ Background ]
> On ARM GIC systems and others, the target address of the MSI is
> translated by the IOMMU. For GIC, the MSI address page is called the
> "ITS" page. When the IOMMU is disabled, the MSI address is programmed
> to the physical location of the GIC ITS page (e.g. 0x20200000). When
> the IOMMU is enabled, the ITS page is behind the IOMMU, so the MSI
> address is programmed to an allocated IO virtual address (a.k.a.
> IOVA), e.g. 0xFFFF0000, which must be mapped to the physical ITS page:
>     IOVA (0xFFFF0000) ===> PA (0x20200000)
> When 2-stage translation is enabled, an IOVA is still used to program
> the MSI address, though the mapping is now in two stages:
>     IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
> (IPA stands for Intermediate Physical Address).
>
> If the device that generates the MSI is attached to an
> IOMMU_DOMAIN_DMA, the IOVA is dynamically allocated from the top of
> the IOVA space. If it is attached to an IOMMU_DOMAIN_UNMANAGED (e.g.
> a VFIO passthrough device), the IOVA is fixed to an MSI window
> reported by the IOMMU driver via IOMMU_RESV_SW_MSI, which is
> hardwired to MSI_IOVA_BASE (IOVA == 0x8000000) for ARM IOMMUs.
>
> So far, this IOMMU_RESV_SW_MSI works well, as the kernel is entirely
> in charge of the IOMMU translation (1-stage translation) and the IOVA
> for the ITS page is fixed and known by the kernel. However, with a
> virtual machine enabling nested IOMMU translation (2-stage), the
> guest kernel directly controls the stage-1 translation with an
> IOMMU_DOMAIN_DMA, mapping a vITS page (at an IPA 0x80900000) onto its
> own IOVA space (e.g. 0xEEEE0000). The host kernel then has no way to
> know that guest-level IOVA to program into the MSI address.
>
> There have been two approaches to solve this problem:
> 1. Create an identity mapping in stage 1. The VMM could insert a few
>    RMRs (Reserved Memory Regions) in the guest's IORT. The guest
>    kernel would then fetch these RMR entries from the IORT and create
>    an IOMMU_RESV_DIRECT region per IOMMU group for a direct mapping.
>    Eventually, the mappings would look like:
>        IOVA (0x8000000) ===> IPA (0x8000000) ===> PA (0x20200000)
>    This requires an IOMMUFD ioctl for the kernel and VMM to agree on
>    the IPA.
> 2. Forward the guest-level MSI IOVA captured by the VMM to the
>    host-level GIC driver, to program the correct MSI IOVA. Forward
>    the VMM-defined vITS page location (IPA) to the kernel for the
>    stage-2 mapping. Eventually:
>        IOVA (0xFFFF0000) ===> IPA (0x80900000) ===> PA (0x20200000)
>    This requires a VFIO ioctl (for the IOVA) and an IOMMUFD ioctl
>    (for the IPA).
>
> Worth mentioning that when Eric Auger was working on the same topic
> with the VFIO iommu uAPI, he tried approach (2) first and then
> switched to approach (1), as suggested by Jean-Philippe to reduce
> complexity.
>
> Approach (1) basically feels like the existing VFIO passthrough that
> has a 1-stage mapping for the unmanaged domain, only shifting the MSI
> mapping from stage 1 (guest-has-no-iommu case) to stage 2
> (guest-has-iommu case). So it could reuse the existing
> IOMMU_RESV_SW_MSI piece, by sharing the same idea of "VMM leaving
> everything to the kernel".
>
> Approach (2) is an ideal solution, yet it requires additional effort
> for the kernel to be aware of the stage-1 gIOVA(s) and stage-2 IPAs
> for the vITS page(s), which demands that the VMM closely cooperate.
> * It also brings some complicated use cases to the table, where the
>   host or/and guest system(s) has/have multiple ITS pages.
I have done some basic sanity tests with this series and the QEMU branches
you provided, on HiSilicon hardware. Basic device assignment works fine. I
will rebase my QEMU smmuv3-accel branch on top of this and do some more
tests.

One point of confusion I have about the text above: do we still plan to
support approach (1) (using RMRs in QEMU), or are you mentioning it here
only because it remains possible to make use of it? From previous
discussions, I think the argument was to adopt a more dedicated MSI
pass-through model, which I believe is approach (2) here. Could you please
confirm?

Thanks,
Shameer