From: David Hildenbrand <david@redhat.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Yonatan Maman" <ymaman@nvidia.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Ben Skeggs" <bskeggs@nvidia.com>,
	"Michael Guralnik" <michaelgur@nvidia.com>,
	"Or Har-Toov" <ohartoov@nvidia.com>,
	"Daisuke Matsuda" <dskmtsd@gmail.com>,
	"Shay Drory" <shayd@nvidia.com>,
	linux-mm@kvack.org, linux-rdma@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, "Gal Shalom" <GalShalom@nvidia.com>
Subject: Re: [PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
Date: Fri, 25 Jul 2025 11:51:54 +0200	[thread overview]
Message-ID: <e64ee7c6-5113-4180-94e8-2fd7e711d5e2@redhat.com> (raw)
In-Reply-To: <i5ya3n7bhhufpczprtp2ndg7bxtykoyjtsfae6dfdqk2rfz6ix@nzwnhqfwh6rq>

On 25.07.25 02:31, Alistair Popple wrote:
> On Thu, Jul 24, 2025 at 10:52:54AM +0200, David Hildenbrand wrote:
>> On 23.07.25 06:10, Alistair Popple wrote:
>>> On Wed, Jul 23, 2025 at 12:51:42AM -0300, Jason Gunthorpe wrote:
>>>> On Tue, Jul 22, 2025 at 10:49:10AM +1000, Alistair Popple wrote:
>>>>>> So what is it?
>>>>>
>>>>> IMHO a hack, because obviously we shouldn't require real physical addresses for
>>>>> something the CPU can't actually address anyway, and this causes real
>>>>> problems.
>>>>
>>>> IMHO what DEVICE PRIVATE really boils down to is a way to have swap
>>>> entries that point to some kind of opaque driver-managed memory.
>>>>
>>>> We have a lot of assumptions all over about pfn/phys-to-page
>>>> relationships, so anything that has a struct page also has to come with
>>>> a fake PFN today...
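>>>>
>>>> (For reference, roughly what today's scheme in include/linux/swapops.h
>>>> looks like: the device-private entry's offset *is* the page's PFN.)
>>>>
>>>> 	/* create: the entry carries the real PFN */
>>>> 	entry = make_readable_device_private_entry(page_to_pfn(page));
>>>>
>>>> 	/* consume: the offset converts straight back to a page */
>>>> 	if (is_device_private_entry(entry))
>>>> 		page = pfn_swap_entry_to_page(entry);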
>>>
>>> Hmm ... maybe. To get that PFN, though, we have to come from either a special
>>> swap entry, which we already have special cases for, or a struct page (which
>>> is a device private page) that we mostly have to handle specially anyway. I'm
>>> not sure there are too many places that can sensibly handle a fake PFN
>>> without somehow already knowing it is a device-private PFN.
>>>
>>>>> (e.g. it doesn't actually work on anything other than x86_64). There's no reason
>>>>> the "PFN" we store in device-private entries couldn't instead just be an index
>>>>> into some data structure holding pointers to the struct pages. So instead of
>>>>> using pfn_to_page()/page_to_pfn() we would use device_private_index_to_page()
>>>>> and page_to_device_private_index().
>>>>
>>>> It could work, but any of the pfn conversions would have to be tracked
>>>> down... Could be troublesome.
>>>
>>> I looked at this a while back and I'm reasonably optimistic that this is doable
>>> because we already have to treat these specially everywhere anyway.
>> What would that look like?
>>
>> E.g., we have code like
>>
>> if (is_device_private_entry(entry)) {
>> 	page = pfn_swap_entry_to_page(entry);
>> 	folio = page_folio(page);
>>
>> 	...
>> 	folio_get(folio);
>> 	...
>> }
>>
>> We could easily stop allowing pfn_swap_entry_to_page(), turning these into
>> non-pfn swap entries.
>>
>> Would it then be something like
>>
>> if (is_device_private_entry(entry)) {
>> 	page = device_private_entry_to_page(entry);
>> 	
>> 	...
>> }
>>
>> Whereby device_private_entry_to_page() obtains the "struct page" not via the
>> PFN but some other magical (index) value?
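>>
>> Purely illustrative, and only a sketch: one way such a lookup could work
>> is if the entry's offset becomes an allocated index into a global xarray
>> of device-private pages (device_private_entry_to_page() and
>> dev_private_pages are made-up names here):
>>
>> static DEFINE_XARRAY_ALLOC(dev_private_pages);
>>
>> static inline struct page *device_private_entry_to_page(swp_entry_t entry)
>> {
>> 	/* the swap offset is now an opaque index, not a PFN */
>> 	return xa_load(&dev_private_pages, swp_offset(entry));
>> }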
> 
> Exactly. The observation being that when you convert a PTE from a swap entry
> to a page we already know it is a device private entry, so we can go look up
> the struct page with special magic (e.g. an index into some other array or
> data structure).
> 
> And if you have a struct page you already know it's a device private page, so
> if you need to create the swap entry you can look up the magic index using
> some alternate function.
> 
> The only issue would be if there were generic code paths that somehow have a
> raw pfn obtained from neither a page-table walk nor a struct page. My
> assumption (yet to be proven/tested) is that these paths don't exist.
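> 
> Hypothetically, the other direction could allocate that index up front,
> e.g. when the device-private pages are set up at memremap_pages() time,
> and build the swap entry from it (again, the xarray and the idea of
> reusing make_readable_device_private_entry() with an index are just a
> sketch):
> 
> 	u32 index;
> 	int err;
> 
> 	/* register the page once, getting back an opaque index */
> 	err = xa_alloc(&dev_private_pages, &index, page, xa_limit_32b,
> 		       GFP_KERNEL);
> 
> 	/* later: the device-private entry encodes the index, not a PFN */
> 	entry = make_readable_device_private_entry(index);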

I guess memory compaction and friends don't apply to ZONE_DEVICE, and
even memory_failure() handling goes down a separate path.

-- 
Cheers,

David / dhildenb




Thread overview: 37+ messages
2025-07-18 11:51 [PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages *** Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages Yonatan Maman
2025-07-18 14:17   ` Matthew Wilcox
2025-07-18 14:44     ` Jason Gunthorpe
2025-07-21  0:11       ` Alistair Popple
2025-07-21 13:23       ` Matthew Wilcox
2025-07-22  0:49         ` Alistair Popple
2025-07-23  3:51           ` Jason Gunthorpe
2025-07-23  4:10             ` Alistair Popple
2025-07-24  8:52               ` David Hildenbrand
2025-07-25  0:31                 ` Alistair Popple
2025-07-25  9:51                   ` David Hildenbrand [this message]
2025-08-01 16:40                   ` Jason Gunthorpe
2025-08-01 16:50                     ` David Hildenbrand
2025-08-01 16:57                       ` Jason Gunthorpe
2025-08-04  1:51                         ` Alistair Popple
2025-08-05 14:09                           ` Jason Gunthorpe
2025-08-04  7:48                         ` David Hildenbrand
2025-07-21  6:59   ` Christoph Hellwig
2025-07-22  5:42     ` Yonatan Maman
2025-08-01 16:52     ` Jason Gunthorpe
2025-07-18 11:51 ` [PATCH v2 2/5] nouveau/dmem: HMM P2P DMA for private dev pages Yonatan Maman
2025-07-21  7:00   ` Christoph Hellwig
2025-07-22  5:23     ` Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 3/5] IB/core: P2P DMA for device private pages Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 4/5] RDMA/mlx5: Enable P2P DMA with fallback mechanism Yonatan Maman
2025-07-21  7:03   ` Christoph Hellwig
2025-07-23  3:55     ` Jason Gunthorpe
2025-07-24  7:30       ` Christoph Hellwig
2025-08-01 16:46         ` Jason Gunthorpe
2025-07-18 11:51 ` [PATCH v2 5/5] RDMA/mlx5: Enabling ATS for ODP memory Yonatan Maman
2025-07-20 10:30 ` [PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages *** Leon Romanovsky
2025-07-20 21:03   ` Yonatan Maman
2025-07-21  6:49     ` Leon Romanovsky
2025-07-23  4:03       ` Jason Gunthorpe
2025-07-23  8:44         ` Leon Romanovsky
2025-07-21  6:54 ` Christoph Hellwig
