From: Jason Gunthorpe <jgg@nvidia.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Xu Yilun <yilun.xu@linux.intel.com>,
Christoph Hellwig <hch@lst.de>,
Leon Romanovsky <leonro@nvidia.com>,
kvm@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
sumit.semwal@linaro.org, pbonzini@redhat.com, seanjc@google.com,
alex.williamson@redhat.com, vivek.kasireddy@intel.com,
dan.j.williams@intel.com, aik@amd.com, yilun.xu@intel.com,
linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
lukas@wunner.de, yan.y.zhao@intel.com, leon@kernel.org,
baolu.lu@linux.intel.com, zhenzhong.duan@intel.com,
tao1.su@intel.com
Subject: Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI
Date: Thu, 23 Jan 2025 11:02:12 -0400 [thread overview]
Message-ID: <20250123150212.GR5556@nvidia.com> (raw)
In-Reply-To: <97db03be-df86-440d-be4a-082f94934ddf@amd.com>
On Thu, Jan 23, 2025 at 03:35:21PM +0100, Christian König wrote:
> Sending it as text mail once more.
>
> Am 23.01.25 um 15:32 schrieb Christian König:
> > Am 23.01.25 um 14:59 schrieb Jason Gunthorpe:
> > > On Wed, Jan 22, 2025 at 03:59:11PM +0100, Christian König wrote:
> > > > > > For example we have cases where multiple devices are in the same IOMMU domain
> > > > > > and re-use their DMA address mappings.
> > > > > IMHO this is just another flavour of "private" address flow between
> > > > > two cooperating drivers.
> > > > Well that's the point. The importer is not cooperating here.
> > > If the private address relies on a shared iommu_domain controlled by
> > > the driver, then yes, the importer MUST be cooperating. For instance,
> > > if you send the same private address into RDMA it will explode because
> > > it doesn't have any notion of shared iommu_domain mappings, and it
> > > certainly doesn't set up any such shared domains.
> >
> > Hui? Why the heck should a driver own its iommu domain?
I don't know, you are the one saying the drivers have special shared
iommu_domains so DMA BUF needs some special design to accommodate it.
I'm aware that DRM drivers do directly call into the iommu subsystem
and do directly manage their own IOVA. I assumed this is what you were
talking about. See below.
> > The domain is owned and assigned by the PCI subsystem under Linux.
That domain is *exclusively* owned by the DMA API and is only accessed
via maps created by DMA API calls.
If you are using the DMA API correctly then all of this is abstracted
and none of it matters to you. There is no concept of "shared domains"
in the DMA API.
You call the DMA API, you get a dma_addr_t that is valid for a
*single* device, you program it in HW. That is all. There is no reason
to dig deeper than this.
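To make that concrete, here is a minimal sketch of that flow (the
function and buffer names are made up for illustration, error unwinding
trimmed):

  #include <linux/dma-mapping.h>

  static int example_map_and_program(struct device *dev, void *buf,
                                     size_t len)
  {
          dma_addr_t dma;

          /* The DMA API hands back an address valid only for this
           * device; whether an IOMMU, swiotlb or a direct mapping sits
           * behind it is hidden from the driver. */
          dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
          if (dma_mapping_error(dev, dma))
                  return -ENOMEM;

          /* ... program 'dma' into this device's HW descriptors ... */

          dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
          return 0;
  }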
> > > > The importer doesn't have the slightest idea that he is sharing its DMA
> > > > addresses with the exporter.
> > > Of course it does. The importer driver would have had to explicitly
> > > set this up! The normal kernel behavior is that all drivers get
> > > private iommu_domains controlled by the DMA API. If your driver is
> > > doing something else *it did it deliberately*.
> >
> > As far as I know that is simply not correct. Currently IOMMU
> > domains/groups are usually shared between devices.
No, the opposite. The iommu subsystem tries to maximally isolate
devices up to the HW limit.
On server platforms every device is expected to get its own iommu domain.
> > Especially multi-function devices get only a single IOMMU domain.
Only if the PCI HW doesn't support ACS.
This is all DMA API internal details you shouldn't even be talking
about at the DMA BUF level. It is all hidden and simply does not
matter to DMA BUF at all.
> > > The new iommu architecture has the probing driver disable the DMA API
> > > and can then manipulate its iommu domain however it likes, safely. Ie
> > > the probing driver is aware and participating in disabling the DMA
> > > API.
> >
> > Why the heck should we do this?
> >
> > That drivers manage all of that on their own sounds like a massive step
> > in the wrong direction.
I am talking about DRM drivers that HAVE to manage their own iommu
domains, for some reason I don't know. eg:
drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: tdev->iommu.domain = iommu_domain_alloc(&platform_bus_type);
drivers/gpu/drm/msm/msm_iommu.c: domain = iommu_paging_domain_alloc(dev);
drivers/gpu/drm/rockchip/rockchip_drm_drv.c: private->domain = iommu_paging_domain_alloc(private->iommu_dev);
drivers/gpu/drm/tegra/drm.c: tegra->domain = iommu_paging_domain_alloc(dma_dev);
drivers/gpu/host1x/dev.c: host->domain = iommu_paging_domain_alloc(host->dev);
Normal simple drivers should never be calling these functions!
If you are calling these functions you are not using the DMA API, and,
yes, some cases like tegra n1x are actively sharing these special
domains across multiple devices and drivers.
If you want to pass an IOVA in one of these special driver-created
domains then it would be some private address in DMABUF that only
works on drivers that have understood they attached to these manually
created domains. No DMA API involvement here.
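For reference, the rough shape of what those drivers end up doing is
something like the sketch below; the names are illustrative, the IOVA
allocation policy is whatever the driver invents for itself, and the
exact allocation call varies between the older and newer iommu APIs:

  #include <linux/iommu.h>

  static struct iommu_domain *example_private_domain(struct device *dev,
                                                     phys_addr_t paddr,
                                                     unsigned long iova,
                                                     size_t len)
  {
          struct iommu_domain *domain;

          /* Driver-owned paging domain, the DMA API is not involved */
          domain = iommu_paging_domain_alloc(dev);
          if (IS_ERR(domain))
                  return domain;

          if (iommu_attach_device(domain, dev))
                  goto err_free;

          /* The driver picks the IOVA itself and maps it directly */
          if (iommu_map(domain, iova, paddr, len,
                        IOMMU_READ | IOMMU_WRITE, GFP_KERNEL))
                  goto err_detach;

          /* The resulting IOVA only means something to devices attached
           * to this particular domain - the "private address" case. */
          return domain;

  err_detach:
          iommu_detach_device(domain, dev);
  err_free:
          iommu_domain_free(domain);
          return ERR_PTR(-ENODEV);
  }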
> > > > I still strongly think that the exporter should talk with the DMA API to
> > > > setup the access path for the importer and *not* the importer directly.
> > > It is contrary to the design of the new API which wants to co-optimize
> > > mapping and HW setup together as one unit.
> >
> > Yeah and I'm really questioning this design goal. That sounds like
> > totally going in the wrong direction just because of the RDMA
> > drivers.
Actually it is storage that motivates this. It is just pointless to
allocate a dma_addr_t list in the fast path when you don't need
it. You can stream the dma_addr_t directly into HW structures that are
necessary and already allocated.
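To illustrate the difference (the HW descriptor and the emit helper
below are hypothetical, just to show the shape of the fast path):

  #include <linux/scatterlist.h>

  /* Hypothetical HW scatter/gather entry, already allocated by the
   * driver as part of its command/queue structures. */
  struct example_hw_sge {
          __le64 addr;
          __le32 len;
          __le32 rsvd;
  };

  /* Today: the exporter returns an sg_table, i.e. an allocated list of
   * dma_addr_t/len pairs, which the importer then copies into HW. */
  static void example_copy_from_sgt(struct example_hw_sge *sge,
                                    struct sg_table *sgt)
  {
          struct scatterlist *sg;
          int i;

          for_each_sgtable_dma_sg(sgt, sg, i) {
                  sge[i].addr = cpu_to_le64(sg_dma_address(sg));
                  sge[i].len = cpu_to_le32(sg_dma_len(sg));
          }
  }

  /* What the fast path wants instead: the mapping step hands each DMA
   * range to a consumer like this, so the address lands directly in the
   * HW structure with no intermediate dma_addr_t list at all. */
  static void example_emit(struct example_hw_sge *sge, unsigned int i,
                           dma_addr_t addr, size_t len)
  {
          sge[i].addr = cpu_to_le64(addr);
          sge[i].len = cpu_to_le32(len);
  }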
> > > For instance in RDMA we want to hint and control the way the IOMMU
> > > mapping works in the DMA API to optimize the RDMA HW side. I can't do
> > > those optimizations if I'm not in control of the mapping.
> >
> > Why? What is the technical background here?
dma-iommu.c chooses an IOVA alignment based on its own reasoning that
is not always compatible with the HW. The HW can optimize if the IOVA
alignment meets certain restrictions. Much like page tables in a GPU.
> > > The same is probably true on the GPU side too, you want IOVAs that
> > > have tidy alignment with your PTE structure, but only the importer
> > > understands its own HW to make the correct hints to the DMA API.
> >
> > Yeah but then express those as requirements to the DMA API and not move
> > all the important decisions into the driver where they are implemented
> > over and over again and potentially broken half the time.
It would be in the DMA API, just the per-mapping portion of the API.
Same as multipath, ATS, and more. These are all per-mapping
decisions of the executing HW, not global decisions or something like
that.
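The existing per-mapping knob in the DMA API is the attrs argument, so
the sort of thing I mean would plausibly look like the sketch below.
DMA_ATTR_EXAMPLE_IOVA_ALIGNED is made up purely to show the shape, no
such flag exists today:

  #include <linux/dma-mapping.h>

  /* Hypothetical placeholder, defined as a no-op so the sketch compiles */
  #define DMA_ATTR_EXAMPLE_IOVA_ALIGNED   0

  static dma_addr_t example_map_with_hints(struct device *dev, void *buf,
                                           size_t len)
  {
          /* Per-mapping decisions ride along with the map call itself;
           * DMA_ATTR_SKIP_CPU_SYNC is an existing example of such a
           * per-mapping attribute. */
          return dma_map_single_attrs(dev, buf, len, DMA_TO_DEVICE,
                                      DMA_ATTR_SKIP_CPU_SYNC |
                                      DMA_ATTR_EXAMPLE_IOVA_ALIGNED);
  }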
Jason