From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Leon Romanovsky" <leon@kernel.org>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, iommu@lists.linux.dev,
	"Jens Axboe" <axboe@kernel.dk>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"Will Deacon" <will@kernel.org>
Subject: Re: [PATCH 10/10] vfio/pci: Add dma-buf export support for MMIO regions
Date: Wed, 30 Jul 2025 15:49:45 +0100	[thread overview]
Message-ID: <6c5fb9f0-c608-4e19-8c60-5d8cef3efbdf@arm.com> (raw)
In-Reply-To: <20250729201351.GA82395@nvidia.com>

On 2025-07-29 9:13 pm, Jason Gunthorpe wrote:
> On Tue, Jul 29, 2025 at 08:44:21PM +0100, Robin Murphy wrote:
> 
>> In this case with just one single
>> contiguous mapping, it is clearly objectively worse to have to bounce in and
>> out of the IOMMU layer 3 separate times and store a dma_map_state,
> 
> The non-contiguous mappings are coming back; they were in earlier drafts
> of this. Regardless, the point is to show how to use the general API
> that we would want to bring into the DRM drivers that don't have
> contiguity even though VFIO is a bit special.
> 
>> Oh yeah, and mapping MMIO with regular memory attributes (IOMMU_CACHE)
>> rather than appropriate ones (IOMMU_MMIO), as this will end up doing, isn't
>> guaranteed not to end badly either (e.g. if the system interconnect ends up
>> merging consecutive write bursts and exceeding the target root port's MPS).
> 
> Yes, I recently noticed this too; it should be fixed.
> 
> But so we are all on the same page, a lot of the PCI P2P systems are
> set up so P2P does not transit through the iommu. It either takes the
> ACS path through a switch or it uses ATS and takes a different ACS
> path through a switch. It only transits through the iommu in
> misconfigured systems or in the rarer case of P2P between root ports.

For non-ATS (and ATS Untranslated traffic), my understanding is that we 
rely on ACS upstream redirect to send transactions all the way up to the 
root port for translation (and without that then they are indeed pure 
bus addresses, take the pci_p2pdma_bus_addr_map() case, and the rest of 
this is all irrelevant). In Arm system terms, simpler root ports may 
well have to run that traffic out to an external SMMU TBU, at which 
point any P2P would loop back externally through the memory space window 
in the system interconnect PA space, as opposed to DTI-ATS root 
complexes that effectively implement their own internal translation 
agent on the PCIe side. Thus on some systems, even P2P behind a single 
root port may end up looking functionally the same as the cross-RP case, 
but in general cross-RP *is* something that people seem to care about as 
well. We're seeing more and more systems where each slot has its own RP 
as a separate segment, rather than giant root complexes with a host 
bridge and everyone on one big happy root bus together.
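
To put the two routing outcomes above into code terms, here is a minimal
sketch of what a consumer of that map-type decision ends up doing. The
pci_p2pdma_map_type() and pci_p2pdma_bus_addr_map() signatures below are
assumed for illustration (the helpers come from this series), not quoted
from the patches:

static int p2p_map_one(struct p2pdma_provider *provider, struct device *dev,
		       phys_addr_t phys, size_t len, dma_addr_t *out)
{
	switch (pci_p2pdma_map_type(provider, dev)) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* ACS redirect off: TLPs stay below the switch/RP and
		 * never reach an IOMMU; use the raw PCI bus address. */
		*out = pci_p2pdma_bus_addr_map(provider, phys);
		return 0;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* Traffic transits the host bridge (and any IOMMU/TBU),
		 * so it needs a real DMA mapping of the MMIO address with
		 * MMIO attributes - i.e. what dma_map_resource() does. */
		*out = dma_map_resource(dev, phys, len, DMA_BIDIRECTIONAL, 0);
		return dma_mapping_error(dev, *out) ? -EIO : 0;
	default:
		/* PCI_P2PDMA_MAP_NOT_SUPPORTED / _UNKNOWN */
		return -EOPNOTSUPP;
	}
}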

>> And again, if the IOMMU is in bypass (the idea of P2P with vfio-noiommu simply
>> isn't worth entertaining)
> 
> Not quite. DMABUF is sort of upside down.
> 
> For example if we are exporting a DMABUF from VFIO and importing it to
> RDMA then RDMA will call VFIO to make an attachment and the above VFIO
> code will perform the DMA map to the RDMA struct device. DMABUF
> returns a dma mapped scatterlist back to the RDMA driver.
> 
> The above dma_map_phys(rdma_dev,...) can be in bypass because the rdma
> device can legitimately be in bypass, or not have a iommu, or
> whatever.

I understand how dma-buf works - obviously DMA mapping for the VFIO 
device itself while it's not even attached to its default domain would 
be silly. I mean that any system that has 64-bit coherent PCIe behind an 
IOMMU such that this VFIO exporter could exist, is realistically going 
to have the same (or equivalent) IOMMU in front of any potential 
importers as well. *Especially* if you expect the normal case for P2P to 
be within a single hierarchy. Thus I was simply commenting that 
IOMMU_DOMAIN_IDENTITY is the *only* realistic reason to actually expect 
to interact with dma-direct here.
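
For anyone following along, the "upside down" flow is just the standard
importer-side sequence, with error handling trimmed - the exporter gets
handed the *importer's* struct device and does its mapping against that:

	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dmabuf, rdma_dev);
	if (IS_ERR(attach))
		return ERR_CAST(attach);

	/*
	 * VFIO's ->map_dma_buf() callback runs here and maps its BAR for
	 * rdma_dev, so identity/bypass vs. translating is entirely a
	 * property of rdma_dev's IOMMU setup, not of VFIO's own device.
	 */
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);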

But of course, if it's not dma-direct because we're on POWER with TCE, 
rather than VFIO Type1 implying an iommu-dma/dma-direct arch, then who 
knows? I imagine the complete absence of any mention means this hasn't 
been tried, or possibly even considered?

>> AFAICS you're *depending* on this call being an effective no-op, and thus
>> only demonstrating that the dma_map_phys() idea is still entirely
>> unnecessary.
> 
> It should not be a full no-op, and it should be closer to
> dma_map_resource() to avoid the MMIO issues.

I don't get what you mean by "not be a full no-op", can you clarify 
exactly what you think it should be doing? Even if it's just the 
dma_capable() mask check equivalent to dma_direct_map_resource(), you 
don't actually want that here either - in that case you'd want to fail 
the entire attachment to begin with since it can never work.
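
For reference, that dma_capable() check is essentially the entirety of
what dma_direct_map_resource() does today (lightly trimmed from
kernel/dma/direct.c) - and even this much is unwanted here, because
failing at map time is already too late:

dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	dma_addr_t dma_addr = paddr;

	/* No IOMMU, no SWIOTLB: just check the address is reachable. */
	if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
		WARN_ON_ONCE(1);
		return DMA_MAPPING_ERROR;
	}
	return dma_addr;
}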

> It should be failing for cases where it is not supported (i.e.
> swiotlb=force), it should still be calling the legacy dma_ops, and it
> should be undoing any CC mangling with the address. (also the
> pci_p2pdma_bus_addr_map() needs to deal with any CC issues too)

Um, my whole point is that the "legacy DMA ops" cannot be called, 
because they still assume page-backed memory, so at best are guaranteed 
to fail; any "CC mangling" assumed for memory is most likely wrong for 
MMIO, and there simply is no "deal with" at this point.
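
Concretely, the indirect call the legacy path would have to make is this
one from struct dma_map_ops - there is simply no struct page to pass for
a BAR that was never in the kernel direct map:

	dma_addr_t (*map_page)(struct device *dev, struct page *page,
			       unsigned long offset, size_t size,
			       enum dma_data_direction dir,
			       unsigned long attrs);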

A device BAR is simply not under control of the trusted hypervisor the 
same way memory is; whatever (I/G)PA it is at must already be the 
correct address, if the aliasing scheme even applies at all. Sticking to 
Arm CCA terminology for example, if a device in shared state tries to 
import a BAR from a device in locked/private state, there is no notion 
of touching the shared alias and hoping it somehow magically works (at 
best it might throw the exporting device into TDISP error state 
terminally); that attachment simply cannot be allowed. If a shared 
resource exists in the shared IPA space to begin with, dma_to_phys() 
will do the wrong thing, and even phys_to_dma() would technically not 
walk dma_range_map correctly, because both assume "phys" represents 
kernel memory. However it's also all moot since any attempt at any 
combination will fail anyway due to SWIOTLB being forced by 
is_realm_world().
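
To spell that out, both dma-direct helpers bake the "phys is kernel RAM"
assumption in directly (trimmed from include/linux/dma-direct.h):

static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	/*
	 * __sme_set() ORs in the memory-encryption address bit where one
	 * exists - only meaningful if paddr really is RAM with a
	 * shared/encrypted alias, not another device's BAR.
	 */
	return __sme_set(phys_to_dma_unencrypted(dev, paddr));
}

static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dma_addr)
{
	phys_addr_t paddr;

	/* dma_range_map describes where *memory* sits in DMA space... */
	if (dev->dma_range_map)
		paddr = translate_dma_to_phys(dev, dma_addr);
	else
		paddr = dma_addr;

	/* ...and this strips an encryption bit that a BAR never had. */
	return __sme_clr(paddr);
}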

(OK, I admit "crash" wasn't strictly the right word to use there - I 
keep forgetting that some of the P2P scatterlist support in dma-direct 
ended up affecting the map_page path too, even though that was never 
really the functional intent - but hey, the overall result of failing to 
work as expected is the same.)

Thanks,
Robin.


