From: David Hildenbrand <david@redhat.com>
To: Alistair Popple <apopple@nvidia.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
"Matthew Wilcox" <willy@infradead.org>,
"Yonatan Maman" <ymaman@nvidia.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Ben Skeggs" <bskeggs@nvidia.com>,
"Michael Guralnik" <michaelgur@nvidia.com>,
"Or Har-Toov" <ohartoov@nvidia.com>,
"Daisuke Matsuda" <dskmtsd@gmail.com>,
"Shay Drory" <shayd@nvidia.com>,
linux-mm@kvack.org, linux-rdma@vger.kernel.org,
dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
linux-kernel@vger.kernel.org, "Gal Shalom" <GalShalom@nvidia.com>
Subject: Re: [PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages
Date: Fri, 25 Jul 2025 11:51:54 +0200
Message-ID: <e64ee7c6-5113-4180-94e8-2fd7e711d5e2@redhat.com>
In-Reply-To: <i5ya3n7bhhufpczprtp2ndg7bxtykoyjtsfae6dfdqk2rfz6ix@nzwnhqfwh6rq>

On 25.07.25 02:31, Alistair Popple wrote:
> On Thu, Jul 24, 2025 at 10:52:54AM +0200, David Hildenbrand wrote:
>> On 23.07.25 06:10, Alistair Popple wrote:
>>> On Wed, Jul 23, 2025 at 12:51:42AM -0300, Jason Gunthorpe wrote:
>>>> On Tue, Jul 22, 2025 at 10:49:10AM +1000, Alistair Popple wrote:
>>>>>> So what is it?
>>>>>
>>>>> IMHO a hack, because obviously we shouldn't require real physical addresses for
>>>>> something the CPU can't actually address anyway and this causes real
>>>>> problems
>>>>
>>>> IMHO what DEVICE PRIVATE really boils down to is a way to have swap
>>>> entries that point to some kind of opaque driver managed memory.
>>>>
>>>> We have a lot of assumptions all over about pfn/phys-to-page
>>>> relationships, so anything that has a struct page also has to come
>>>> with a fake PFN today.
>>>
>>> Hmm ... maybe. To get that PFN, though, we have to come from either a
>>> special swap entry, which we already have special cases for, or a
>>> struct page (which is a device-private page), which we mostly have to
>>> handle specially anyway. I'm not sure there are many places that can
>>> sensibly handle a fake PFN without somehow already knowing it is a
>>> device-private PFN.
>>>
>>>>> (eg. it doesn't actually work on anything other than x86_64). There's no reason
>>>>> the "PFN" we store in device-private entries couldn't instead just be an index
>>>>> into some data structure holding pointers to the struct pages. So instead of
>>>>> using pfn_to_page()/page_to_pfn() we would use device_private_index_to_page()
>>>>> and page_to_device_private_index().
>>>>
>>>> It could work, but any of the pfn conversions would have to be
>>>> tracked down. Could be troublesome.
>>>
>>> I looked at this a while back and I'm reasonably optimistic that this is doable
>>> because we already have to treat these specially everywhere anyway.
>> What would that look like?
>>
>> E.g., we have code like
>>
>> if (is_device_private_entry(entry)) {
>>         page = pfn_swap_entry_to_page(entry);
>>         folio = page_folio(page);
>>
>>         ...
>>         folio_get(folio);
>>         ...
>> }
>>
>> We could easily stop allowing pfn_swap_entry_to_page(), turning these into
>> non-pfn swap entries.
>>
>> Would it then be something like
>>
>> if (is_device_private_entry(entry)) {
>>         page = device_private_entry_to_page(entry);
>>
>>         ...
>> }
>>
>> Whereby device_private_entry_to_page() obtains the "struct page" not via
>> the PFN but via some other magical (index) value?
>
> Exactly. The observation is that when you convert a PTE from a swap
> entry to a page, we already know it is a device-private entry, so we
> can go look up the struct page with special magic (e.g., an index into
> some other array or data structure).
>
> And if you have a struct page, you already know it's a device-private
> page, so if you need to create the swap entry you can look up the magic
> index using some alternate function.
>
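
For the creation side, a purely hypothetical sketch of what that could
look like (page_to_device_private_index() is the made-up helper named
above; the make_*_device_private_entry() helpers already exist, and
callers currently feed them a PFN as the swap offset):

/*
 * Hypothetical sketch only: today this would effectively be
 * make_writable_device_private_entry(page_to_pfn(page)); the idea is
 * to store an opaque index instead of a PFN in the swap offset.
 */
static swp_entry_t device_private_swp_entry(struct page *page, bool writable)
{
        /* made-up lookup, e.g. an index into a driver/pgmap-owned table */
        unsigned long idx = page_to_device_private_index(page);

        return writable ? make_writable_device_private_entry(idx) :
                          make_readable_device_private_entry(idx);
}
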
> The only issue would be if there were generic code paths that somehow
> have a raw pfn obtained from neither a page-table walk nor a struct
> page. My assumption (yet to be proven/tested) is that these paths
> don't exist.
I guess memory compaction and friends don't apply to ZONE_DEVICE, and
even memory_failure() handling takes a separate path.
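
And on the lookup side, again just a sketch using the made-up names from
above (device_private_index_to_page() / device_private_entry_to_page()
don't exist; swp_offset() does):

/* hypothetical lookup helpers, mirroring the naming used above */
struct page *device_private_index_to_page(unsigned long idx);
unsigned long page_to_device_private_index(struct page *page);

static inline struct page *device_private_entry_to_page(swp_entry_t entry)
{
        /*
         * The swap offset would carry an opaque index into a
         * driver/pgmap-owned table rather than a PFN.
         */
        return device_private_index_to_page(swp_offset(entry));
}

Which matches Jason's point above: the remaining work is mostly tracking
down the places that still assume the offset is a PFN.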
--
Cheers,
David / dhildenb
Thread overview: 37+ messages
2025-07-18 11:51 [PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages *** Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 1/5] mm/hmm: HMM API to enable P2P DMA for device private pages Yonatan Maman
2025-07-18 14:17 ` Matthew Wilcox
2025-07-18 14:44 ` Jason Gunthorpe
2025-07-21 0:11 ` Alistair Popple
2025-07-21 13:23 ` Matthew Wilcox
2025-07-22 0:49 ` Alistair Popple
2025-07-23 3:51 ` Jason Gunthorpe
2025-07-23 4:10 ` Alistair Popple
2025-07-24 8:52 ` David Hildenbrand
2025-07-25 0:31 ` Alistair Popple
2025-07-25 9:51 ` David Hildenbrand [this message]
2025-08-01 16:40 ` Jason Gunthorpe
2025-08-01 16:50 ` David Hildenbrand
2025-08-01 16:57 ` Jason Gunthorpe
2025-08-04 1:51 ` Alistair Popple
2025-08-05 14:09 ` Jason Gunthorpe
2025-08-04 7:48 ` David Hildenbrand
2025-07-21 6:59 ` Christoph Hellwig
2025-07-22 5:42 ` Yonatan Maman
2025-08-01 16:52 ` Jason Gunthorpe
2025-07-18 11:51 ` [PATCH v2 2/5] nouveau/dmem: HMM P2P DMA for private dev pages Yonatan Maman
2025-07-21 7:00 ` Christoph Hellwig
2025-07-22 5:23 ` Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 3/5] IB/core: P2P DMA for device private pages Yonatan Maman
2025-07-18 11:51 ` [PATCH v2 4/5] RDMA/mlx5: Enable P2P DMA with fallback mechanism Yonatan Maman
2025-07-21 7:03 ` Christoph Hellwig
2025-07-23 3:55 ` Jason Gunthorpe
2025-07-24 7:30 ` Christoph Hellwig
2025-08-01 16:46 ` Jason Gunthorpe
2025-07-18 11:51 ` [PATCH v2 5/5] RDMA/mlx5: Enabling ATS for ODP memory Yonatan Maman
2025-07-20 10:30 ` [PATCH v2 0/5] *** GPU Direct RDMA (P2P DMA) for Device Private Pages *** Leon Romanovsky
2025-07-20 21:03 ` Yonatan Maman
2025-07-21 6:49 ` Leon Romanovsky
2025-07-23 4:03 ` Jason Gunthorpe
2025-07-23 8:44 ` Leon Romanovsky
2025-07-21 6:54 ` Christoph Hellwig