All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Yonatan Maman <ymaman@nvidia.com>,
	kherbst@redhat.com, lyude@redhat.com, dakr@redhat.com,
	airlied@gmail.com, simona@ffwll.ch, leon@kernel.org,
	 jglisse@redhat.com, akpm@linux-foundation.org,
	GalShalom@nvidia.com,  dri-devel@lists.freedesktop.org,
	nouveau@lists.freedesktop.org,  linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-mm@kvack.org,
		linux-tegra@vger.kernel.org
Subject: Re: [RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
Date: Tue, 28 Jan 2025 17:32:23 +0100	[thread overview]
Message-ID: <b78d32e13811ef1fa57b0535749c811f2afb4dcd.camel@linux.intel.com> (raw)
In-Reply-To: <20250128151610.GC1524382@ziepe.ca>

On Tue, 2025-01-28 at 11:16 -0400, Jason Gunthorpe wrote:
> On Tue, Jan 28, 2025 at 03:48:54PM +0100, Thomas Hellström wrote:
> > On Tue, 2025-01-28 at 09:20 -0400, Jason Gunthorpe wrote:
> > > On Tue, Jan 28, 2025 at 09:51:52AM +0100, Thomas Hellström wrote:
> > > 
> > > > How would the pgmap device know whether P2P is actually
> > > > possible
> > > > without knowing the client device, (like calling
> > > > pci_p2pdma_distance)
> > > > and also if looking into access control, whether it is allowed?
> > > 
> > > The DMA API will do this, this happens after this patch is put on
> > > top
> > > of Leon's DMA API patches. The mapping operation will fail and it
> > > will
> > > likely be fatal to whatever is going on.
> > >  
> > > get_dma_pfn_for_device() returns a new PFN, but that is not a DMA
> > > mapped address, it is just a PFN that has another struct page
> > > under
> > > it.
> > > 
> > > There is an implicit assumption here that P2P will work and we
> > > don't
> > > need a 3rd case to handle non-working P2P..
> > 
> > OK. We will have the case where we want pfnmaps with driver-private
> > fast interconnects to return "interconnect possible, don't migrate"
> > whereas possibly other gpus and other devices would return
> > "interconnect unsuitable, do migrate", so (as I understand it)
> > something requiring a more flexible interface than this.
> 
> I'm not sure this doesn't handle that case?
> 
> Here we are talking about having DEVICE_PRIVATE struct page
> mappings. On a GPU this should represent GPU local memory that is
> non-coherent with the CPU, and not mapped into the CPU.
> 
> This series supports three case:
> 
>  1) pgmap->owner == range->dev_private_owner
>     This is "driver private fast interconnect" in this case HMM
> should
>     immediately return the page. The calling driver understands the
>     private parts of the pgmap and computes the private interconnect
>     address.
> 
>     This requires organizing your driver so that all private
>     interconnect has the same pgmap->owner.

Yes, although that makes this map static, since pgmap->owner has to be
set at pgmap creation time. and we were during initial discussions
looking at something dynamic here. However I think we can probably do
with a per-driver owner for now and get back if that's not sufficient.

> 
>  2) The page is DEVICE_PRIVATE and get_dma_pfn_for_device() exists.
>     The exporting driver has the option to return a P2P struct page
>     that can be used for PCI P2P without any migration. In a PCI GPU
>     context this means the GPU has mapped its local memory to a PCI
>     address. The assumption is that P2P always works and so this
>     address can be DMA'd from.

So do I understand it correctly, that the driver then needs to set up
one device_private struct page and one pcie_p2p struct page for each
page of device memory participating in this way?

> 
>  3) Migrate back to CPU memory - then eveything works.
> 
> Is that not enough? Where do you want something different?
> 
> > > > but leaves any dma- mapping or pfn mangling to be done after
> > > > the
> > > > call to hmm_range_fault(), since hmm_range_fault() really only
> > > > needs
> > > > to know whether it has to migrate to system or not.
> > > 
> > > See above, this is already the case..
> > 
> > Well what I meant was at hmm_range_fault() time only consider
> > whether
> > to migrate or not. Afterwards at dma-mapping time you'd expose the
> > alternative pfns that could be used for dma-mapping.
> 
> That sounds like you are talking about multipath, we are not really
> ready to tackle general multipath yet at the DMA API level, IMHO.
> 
> If you are just talking about your private multi-path, then that is
> already handled..

No, the issue I'm having with this is really why would
hmm_range_fault() need the new pfn when it could easily be obtained
from the device-private pfn by the hmm_range_fault() caller? The only
thing hmm_range_fault() needs to know is, again, whether to migrate or
not. But I guess if the plan is to have hmm_range_fault() call
pci_p2pdma_distance() on it, and we don't want the exporter to do that,
it makes sense.

> 
> > We were actually looking at a solution where the pagemap implements
> > something along
> > 
> > bool devmem_allowed(pagemap, client); //for hmm_range_fault
> > 
> > plus dma_map() and dma_unmap() methods.
> 
> This sounds like dmabuf philosophy, and I don't think we should go in
> this direction. The hmm caller should always be responsible for dma
> mapping and we need to improve the DMA API to make this work better,
> not build side hacks like this.
> 
> You can read my feelings and reasoning on this topic within this huge
> thread:
> 
> https://lore.kernel.org/dri-devel/20250108132358.GP5556@nvidia.com/
> 
> > In this way you'd don't need to expose special p2p dma pages and
> > the
> 
> Removing the "special p2p dma pages" has to be done by improving the
> DMA API to understand how to map phsyical addresses without struct
> page. We are working toward this, slowly.

> pgmap->ops->dma_map/unmap() ideas just repeat the DMABUF mistake
> of mis-using the DMA API for P2P cases. Today you cannot correctly
> DMA
> map P2P memory without the struct page.

Yeah, I don't want to drag hmm into that discussion, although
admittedly the idea of pgmap->ops->dma_map/unmap mimics the dma-buf
behaviour.

So anyway what we'll do is to try to use an interconnect-common owner
for now and revisit the problem if that's not sufficient so we can come
up with an acceptable solution.


/Thomas


  reply	other threads:[~2025-01-28 16:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-01 10:36 [RFC 0/5] GPU Direct RDMA (P2P DMA) for Device Private Pages Yonatan Maman
2024-12-01 10:36 ` [RFC 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages Yonatan Maman
2025-01-28  8:51   ` Thomas Hellström
2025-01-28 13:20     ` Jason Gunthorpe
2025-01-28 14:48       ` Thomas Hellström
2025-01-28 15:16         ` Jason Gunthorpe
2025-01-28 16:32           ` Thomas Hellström [this message]
2025-01-28 17:21             ` Jason Gunthorpe
2025-01-29 13:38               ` Simona Vetter
2025-01-29 13:47                 ` Jason Gunthorpe
2025-01-29 17:09                   ` Thomas Hellström
2025-01-30 10:50                   ` Simona Vetter
2025-01-30 13:23                     ` Jason Gunthorpe
2025-01-30 16:09                       ` Simona Vetter
2025-01-30 17:42                         ` Jason Gunthorpe
2025-01-31 16:59                           ` Simona Vetter
2025-02-03 15:08                             ` Jason Gunthorpe
2025-02-04  9:32                               ` Thomas Hellström
2025-02-04 13:26                                 ` Jason Gunthorpe
2025-02-04 14:29                                   ` Thomas Hellström
2025-02-04 19:16                                     ` Jason Gunthorpe
2025-02-04 22:01                                       ` Thomas Hellström
2024-12-01 10:36 ` [RFC 2/5] nouveau/dmem: HMM P2P DMA for private dev pages Yonatan Maman
2024-12-01 10:36 ` [RFC 3/5] IB/core: P2P DMA for device private pages Yonatan Maman
2024-12-01 10:36 ` [RFC 4/5] RDMA/mlx5: Add fallback for P2P DMA errors Yonatan Maman
2024-12-01 10:36 ` [RFC 5/5] RDMA/mlx5: Enabling ATS for ODP memory Yonatan Maman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b78d32e13811ef1fa57b0535749c811f2afb4dcd.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=GalShalom@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dakr@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=kherbst@redhat.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=simona@ffwll.ch \
    --cc=ymaman@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.