From: Lance Yang <lance.yang@linux.dev>
To: Balbir Singh <balbirs@nvidia.com>
Cc: linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	"David Hildenbrand" <david@redhat.com>, "Zi Yan" <ziy@nvidia.com>,
	"Joshua Hahn" <joshua.hahnjy@gmail.com>,
	"Rakie Kim" <rakie.kim@sk.com>,
	"Byungchul Park" <byungchul@sk.com>,
	"Gregory Price" <gourry@gourry.net>,
	"Ying Huang" <ying.huang@linux.alibaba.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Nico Pache" <npache@redhat.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Dev Jain" <dev.jain@arm.com>, "Barry Song" <baohua@kernel.org>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Mika Penttilä" <mpenttil@redhat.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>
Subject: Re: [v7 03/16] mm/huge_memory: add device-private THP support to PMD operations
Date: Mon, 13 Oct 2025 09:48:35 +0800
Message-ID: <83492e9c-3f17-42e5-8897-9c0ed5aa76e7@linux.dev>
In-Reply-To: <1b311458-957a-4f0d-b7f9-51e75bbabd55@nvidia.com>



On 2025/10/13 08:01, Balbir Singh wrote:
> On 10/13/25 02:46, Lance Yang wrote:
>> On Wed, Oct 1, 2025 at 4:20 PM Balbir Singh <balbirs@nvidia.com> wrote:
>>>
>>> Extend core huge page management functions to handle device-private THP
>>> entries.  This enables proper handling of large device-private folios in
>>> fundamental MM operations.
>>>
>>> The following functions have been updated:
>>>
>>> - copy_huge_pmd(): Handle device-private entries during fork/clone
>>> - zap_huge_pmd(): Properly free device-private THP during munmap
>>> - change_huge_pmd(): Support protection changes on device-private THP
>>> - __pte_offset_map(): Add device-private entry awareness
>>>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
>>> Cc: Rakie Kim <rakie.kim@sk.com>
>>> Cc: Byungchul Park <byungchul@sk.com>
>>> Cc: Gregory Price <gourry@gourry.net>
>>> Cc: Ying Huang <ying.huang@linux.alibaba.com>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
>>> Cc: Nico Pache <npache@redhat.com>
>>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>>> Cc: Dev Jain <dev.jain@arm.com>
>>> Cc: Barry Song <baohua@kernel.org>
>>> Cc: Lyude Paul <lyude@redhat.com>
>>> Cc: Danilo Krummrich <dakr@kernel.org>
>>> Cc: David Airlie <airlied@gmail.com>
>>> Cc: Simona Vetter <simona@ffwll.ch>
>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: Mika Penttilä <mpenttil@redhat.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Francois Dugast <francois.dugast@intel.com>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>> ---
>>>   include/linux/swapops.h | 32 +++++++++++++++++++++++
>>>   mm/huge_memory.c        | 56 ++++++++++++++++++++++++++++++++++-------
>>>   mm/pgtable-generic.c    |  2 +-
>>>   3 files changed, 80 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
>>> index 64ea151a7ae3..2687928a8146 100644
>>> --- a/include/linux/swapops.h
>>> +++ b/include/linux/swapops.h
>>> @@ -594,10 +594,42 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
>>>   }
>>>   #endif  /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>
>>> +#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>> +
>>> +/**
>>> + * is_pmd_device_private_entry() - Check if PMD contains a device private swap entry
>>> + * @pmd: The PMD to check
>>> + *
>>> + * Returns true if the PMD contains a swap entry that represents a device private
>>> + * page mapping. This is used for zone device private pages that have been
>>> + * swapped out but still need special handling during various memory management
>>> + * operations.
>>> + *
>>> + * Return: 1 if PMD contains device private entry, 0 otherwise
>>> + */
>>> +static inline int is_pmd_device_private_entry(pmd_t pmd)
>>> +{
>>> +       return is_swap_pmd(pmd) && is_device_private_entry(pmd_to_swp_entry(pmd));
>>> +}
>>> +
>>> +#else /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>> +
>>> +static inline int is_pmd_device_private_entry(pmd_t pmd)
>>> +{
>>> +       return 0;
>>> +}
>>> +
>>> +#endif /* CONFIG_ZONE_DEVICE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>> +
>>>   static inline int non_swap_entry(swp_entry_t entry)
>>>   {
>>>          return swp_type(entry) >= MAX_SWAPFILES;
>>>   }
>>>
>>> +static inline int is_pmd_non_present_folio_entry(pmd_t pmd)
>>> +{
>>> +       return is_pmd_migration_entry(pmd) || is_pmd_device_private_entry(pmd);
>>> +}
>>> +
>>>   #endif /* CONFIG_MMU */
>>>   #endif /* _LINUX_SWAPOPS_H */
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1b81680b4225..8e0a1747762d 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1703,17 +1703,45 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>>          if (unlikely(is_swap_pmd(pmd))) {
>>>                  swp_entry_t entry = pmd_to_swp_entry(pmd);
>>>
>>> -               VM_BUG_ON(!is_pmd_migration_entry(pmd));
>>> -               if (!is_readable_migration_entry(entry)) {
>>> -                       entry = make_readable_migration_entry(
>>> -                                                       swp_offset(entry));
>>> +               VM_WARN_ON(!is_pmd_non_present_folio_entry(pmd));
>>> +
>>> +               if (is_writable_migration_entry(entry) ||
>>> +                   is_readable_exclusive_migration_entry(entry)) {
>>> +                       entry = make_readable_migration_entry(swp_offset(entry));
>>>                          pmd = swp_entry_to_pmd(entry);
>>>                          if (pmd_swp_soft_dirty(*src_pmd))
>>>                                  pmd = pmd_swp_mksoft_dirty(pmd);
>>>                          if (pmd_swp_uffd_wp(*src_pmd))
>>>                                  pmd = pmd_swp_mkuffd_wp(pmd);
>>>                          set_pmd_at(src_mm, addr, src_pmd, pmd);
>>> +               } else if (is_device_private_entry(entry)) {
>>> +                       /*
>>> +                        * For device private entries, since there are no
>>> +                        * read exclusive entries, writable = !readable
>>> +                        */
>>> +                       if (is_writable_device_private_entry(entry)) {
>>> +                               entry = make_readable_device_private_entry(swp_offset(entry));
>>> +                               pmd = swp_entry_to_pmd(entry);
>>> +
>>> +                               if (pmd_swp_soft_dirty(*src_pmd))
>>> +                                       pmd = pmd_swp_mksoft_dirty(pmd);
>>> +                               if (pmd_swp_uffd_wp(*src_pmd))
>>> +                                       pmd = pmd_swp_mkuffd_wp(pmd);
>>> +                               set_pmd_at(src_mm, addr, src_pmd, pmd);
>>> +                       }
>>> +
>>> +                       src_folio = pfn_swap_entry_folio(entry);
>>> +                       VM_WARN_ON(!folio_test_large(src_folio));
>>> +
>>> +                       folio_get(src_folio);
>>> +                       /*
>>> +                        * folio_try_dup_anon_rmap_pmd does not fail for
>>> +                        * device private entries.
>>> +                        */
>>> +                       folio_try_dup_anon_rmap_pmd(src_folio, &src_folio->page,
>>> +                                                       dst_vma, src_vma);
>>>                  }
>>> +
>>>                  add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>>                  mm_inc_nr_ptes(dst_mm);
>>>                  pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>>> @@ -2211,15 +2239,16 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>                          folio_remove_rmap_pmd(folio, page, vma);
>>>                          WARN_ON_ONCE(folio_mapcount(folio) < 0);
>>>                          VM_BUG_ON_PAGE(!PageHead(page), page);
>>> -               } else if (thp_migration_supported()) {
>>> +               } else if (is_pmd_non_present_folio_entry(orig_pmd)) {
>>>                          swp_entry_t entry;
>>>
>>> -                       VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
>>>                          entry = pmd_to_swp_entry(orig_pmd);
>>>                          folio = pfn_swap_entry_folio(entry);
>>>                          flush_needed = 0;
>>> -               } else
>>> -                       WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
>>> +
>>> +                       if (!thp_migration_supported())
>>> +                               WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
>>> +               }
>>>
>>>                  if (folio_test_anon(folio)) {
>>>                          zap_deposited_table(tlb->mm, pmd);
>>> @@ -2239,6 +2268,12 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>                                  folio_mark_accessed(folio);
>>>                  }
>>>
>>> +               if (folio_is_device_private(folio)) {
>>> +                       folio_remove_rmap_pmd(folio, &folio->page, vma);
>>> +                       WARN_ON_ONCE(folio_mapcount(folio) < 0);
>>> +                       folio_put(folio);
>>> +               }
>>
>> IIUC, a device-private THP is always anonymous, right? Would it make sense
>> to move this folio_is_device_private() block inside the folio_test_anon()
>> check above?
>>
> Yes, they are; there is a discussion of file-backed mappings at
> https://lwn.net/Articles/1016124/. I don't see a benefit from moving it, do you?

Ah, I see. Never mind :)

Cheers,
Lance
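
For readers following the copy_huge_pmd() hunk quoted above, here is a
minimal userspace sketch of the fork-time downgrade it performs. It is not
kernel code: the model_* names, the enum, and the struct layout are all
hypothetical stand-ins for the real swp_entry_t encoding, and rmap,
refcounting, and the soft-dirty/uffd-wp bits are left out. It only shows
that writable and readable-exclusive migration entries become readable
migration entries, and that writable device-private entries become
readable device-private entries, with the offset preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry kinds; the kernel's real swap types are encoded differently. */
enum model_type {
	MODEL_MIGRATION_READ,
	MODEL_MIGRATION_READ_EXCLUSIVE,
	MODEL_MIGRATION_WRITE,
	MODEL_DEVICE_PRIVATE_READ,
	MODEL_DEVICE_PRIVATE_WRITE,
};

/* Hypothetical layout: an entry kind plus a page frame offset. */
struct model_entry {
	enum model_type type;
	uint64_t offset;
};

static inline bool model_is_migration(struct model_entry e)
{
	return e.type == MODEL_MIGRATION_READ ||
	       e.type == MODEL_MIGRATION_READ_EXCLUSIVE ||
	       e.type == MODEL_MIGRATION_WRITE;
}

static inline bool model_is_device_private(struct model_entry e)
{
	return e.type == MODEL_DEVICE_PRIVATE_READ ||
	       e.type == MODEL_DEVICE_PRIVATE_WRITE;
}

/* Same shape as is_pmd_non_present_folio_entry(): either kind qualifies. */
static inline bool model_is_non_present_folio_entry(struct model_entry e)
{
	return model_is_migration(e) || model_is_device_private(e);
}

/*
 * Return the entry the parent keeps after fork. *changed reports whether
 * the parent's entry had to be rewritten (the set_pmd_at() case in the patch).
 */
static inline struct model_entry model_fork_downgrade(struct model_entry e,
						      bool *changed)
{
	*changed = true;
	switch (e.type) {
	case MODEL_MIGRATION_WRITE:
	case MODEL_MIGRATION_READ_EXCLUSIVE:
		e.type = MODEL_MIGRATION_READ;
		break;
	case MODEL_DEVICE_PRIVATE_WRITE:
		/* No read-exclusive device-private kind, so writable == !readable. */
		e.type = MODEL_DEVICE_PRIVATE_READ;
		break;
	default:
		/* Already readable: nothing to rewrite. */
		*changed = false;
		break;
	}
	return e;
}
```

In this model, a caller would pass the source entry in, store the returned
entry back only when *changed is true, and then do the device-private
bookkeeping (folio_get() and folio_try_dup_anon_rmap_pmd() in the patch)
separately.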


