From: Matthew Brost <matthew.brost@intel.com>
To: "Mika Penttilä" <mpenttil@redhat.com>
Cc: Francois Dugast <francois.dugast@intel.com>, <airlied@gmail.com>,
<akpm@linux-foundation.org>, <apopple@nvidia.com>,
<baohua@kernel.org>, <baolin.wang@linux.alibaba.com>,
<dakr@kernel.org>, <david@redhat.com>, <donettom@linux.ibm.com>,
<jane.chu@oracle.com>, <jglisse@redhat.com>, <kherbst@redhat.com>,
<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<lyude@redhat.com>, <peterx@redhat.com>, <ryan.roberts@arm.com>,
<shuah@kernel.org>, <simona@ffwll.ch>,
<wangkefeng.wang@huawei.com>, <willy@infradead.org>,
<ziy@nvidia.com>, Balbir Singh <balbirs@nvidia.com>,
<jgg@nvidia.com>, Leon Romanovsky <leonro@nvidia.com>
Subject: Re: [PATCH] mm/hmm: Do not fault in device private pages owned by the caller
Date: Wed, 23 Jul 2025 22:57:46 -0700
Message-ID: <aIHLWnjzKWma1NLC@lstrano-desk.jf.intel.com>
In-Reply-To: <dad71615-0eba-4a8d-abfc-979fb815511c@redhat.com>

On Thu, Jul 24, 2025 at 08:46:11AM +0300, Mika Penttilä wrote:
>
> On 7/24/25 08:02, Matthew Brost wrote:
> > On Thu, Jul 24, 2025 at 10:25:11AM +1000, Balbir Singh wrote:
> >> On 7/23/25 05:34, Francois Dugast wrote:
> >>> When the PMD swap entry is device private and owned by the caller,
> >>> skip the range faulting and instead just set the correct HMM PFNs.
> >>> This is similar to the logic for PTEs in hmm_vma_handle_pte().
> >>>
> >>> For now, each hmm_pfns[i] entry is populated as it is currently done
> >>> in hmm_vma_handle_pmd() but this might not be necessary. A follow-up
> >>> optimization could be to make use of the order and skip populating
> >>> subsequent PFNs.
> >> I think we should test and remove these now
> >>
> > +Jason, Leon – perhaps either of you can provide insight into why
> > hmm_vma_handle_pmd fully populates the HMM PFNs when a higher-order page
> > is found.
> >
> > If we can be assured that changing this won’t break other parts of the
> > kernel, I agree it should be removed. A snippet of documentation should
> > also be added indicating that when higher-order PFNs are found,
> > subsequent PFNs within the range will remain unpopulated. I can verify
> > that GPU SVM works just fine without these PFNs being populated.
>
> afaics the device can consume the range as smaller pages also, and some
> hmm users depend on that.
>
Sure, but I think that should be fixed in the device code. If a
large-order PFN is found, the subsequent PFNs can clearly be inferred
from it. This is a micro-optimization here, but devices or callers that
can handle large-order entries properly shouldn't force hacky, less
optimal behavior onto core code. If anything does rely on the current
behavior, we should fix that caller and ensure correctness.
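
As a rough illustration, here is a minimal caller-side sketch (not part
of this patch, and consume_pfn() is just a placeholder for whatever the
driver does with each page) of how the PFNs covered by a higher-order
entry could be derived from the first populated slot via
hmm_pfn_to_map_order(), without core code filling in every hmm_pfns[]
slot:

#include <linux/hmm.h>
#include <linux/mm.h>

/* Placeholder for whatever the driver does with each PFN. */
static void consume_pfn(unsigned long pfn, bool writable);

static void consume_range(struct hmm_range *range)
{
	unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
	unsigned long i = 0;

	while (i < npages) {
		unsigned long entry = range->hmm_pfns[i];
		unsigned long pfn, nr, j;

		if (!(entry & HMM_PFN_VALID)) {
			i++;
			continue;
		}

		/*
		 * hmm_pfn_to_map_order() says how many base pages this
		 * entry covers; PFNs within the folio are contiguous, so
		 * the later ones follow directly from the first slot.
		 */
		nr = 1UL << hmm_pfn_to_map_order(entry);
		pfn = page_to_pfn(hmm_pfn_to_page(entry));

		for (j = 0; j < nr && i + j < npages; j++)
			consume_pfn(pfn + j, entry & HMM_PFN_WRITE);

		i += nr;
	}
}

That keeps the per-base-page bookkeeping in the callers that actually
want it, instead of having core mm populate every slot unconditionally.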
Matt
>
> > Matt
>
>
> --Mika
>
>
> >
> >>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> >>> ---
> >>> mm/hmm.c | 25 +++++++++++++++++++++++++
> >>> 1 file changed, 25 insertions(+)
> >>>
> >>> diff --git a/mm/hmm.c b/mm/hmm.c
> >>> index f2415b4b2cdd..63ec1b18a656 100644
> >>> --- a/mm/hmm.c
> >>> +++ b/mm/hmm.c
> >>> @@ -355,6 +355,31 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >>> }
> >>>
> >>> if (!pmd_present(pmd)) {
> >>> + swp_entry_t entry = pmd_to_swp_entry(pmd);
> >>> +
> >>> + /*
> >>> + * Don't fault in device private pages owned by the caller,
> >>> + * just report the PFNs.
> >>> + */
> >>> + if (is_device_private_entry(entry) &&
> >>> + pfn_swap_entry_folio(entry)->pgmap->owner ==
> >>> + range->dev_private_owner) {
> >>> + unsigned long cpu_flags = HMM_PFN_VALID |
> >>> + hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
> >>> + unsigned long pfn = swp_offset_pfn(entry);
> >>> + unsigned long i;
> >>> +
> >>> + if (is_writable_device_private_entry(entry))
> >>> + cpu_flags |= HMM_PFN_WRITE;
> >>> +
> >>> + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
> >>> + hmm_pfns[i] &= HMM_PFN_INOUT_FLAGS;
> >>> + hmm_pfns[i] |= pfn | cpu_flags;
> >>> + }
> >>> +
> >> As discussed, can we remove these.
> >>
> >>> + return 0;
> >>> + }
> >> All of this be under CONFIG_ARCH_ENABLE_THP_MIGRATION
> >>
> >>> +
> >>> if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0))
> >>> return -EFAULT;
> >>> return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
> >>
> >>
> >> Balbir Singh
>